Addresses are matched to reference geographies using the methods listed below,
which are tried in order of decreasing precision.
Each method generates matched s2 cell identifiers differently
and is recorded in the match_method column of the returned tibble:
ref_addr
: reference s2 cell from a direct match to a reference address
tiger_range
: centroid of street-matched TIGER address ranges containing the street number
tiger_street
: centroid of street-matched TIGER address ranges closest to the street number
none
: unmatched using all previous approaches; return missing s2 cell identifier
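The fallback order described above can be sketched in base R. This is only an illustration of the "first method that succeeds wins" logic, not the package's implementation: the three input vectors and their placeholder values are hypothetical stand-ins for whatever each method would return, with NA meaning the method found no match.

```r
# Hypothetical per-method results for four addresses (illustrative only);
# NA means that method found no match for the address.
ref_addr_hit     <- c("a1", NA,   NA,   NA)
tiger_range_hit  <- c("b1", "b2", NA,   NA)
tiger_street_hit <- c("c1", "c2", "c3", NA)

# Columns are ordered from most to least precise method
candidates <- cbind(
  ref_addr     = ref_addr_hit,
  tiger_range  = tiger_range_hit,
  tiger_street = tiger_street_hit
)

# For each address, index of the first method with a non-missing result
first_hit <- apply(candidates, 1, function(row) which(!is.na(row))[1])

# Pick that method's result; rows with no hit stay NA
s2 <- candidates[cbind(seq_len(nrow(candidates)), first_hit)]

# Record which method produced the result, using "none" when all failed
match_method <- factor(
  ifelse(is.na(first_hit), "none", colnames(candidates)[first_hit]),
  levels = c("ref_addr", "tiger_range", "tiger_street", "none")
)

data.frame(s2 = s2, match_method = match_method)
```

Ordering the columns by precision means a less precise method is consulted only when every more precise one returned nothing.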
Arguments
- x
an addr vector (or character vector of address strings) to geocode
- ref_addr
an addr vector to search for matches in
- ref_s2
an s2_cell vector of locations for each ref_addr
- county
character county identifier for TIGER street range files to search for matches in
- year
character year for TIGER street range files to search for matches in
Value
a tibble with columns: addr contains x converted to an addr vector,
s2 contains the resulting geocoded s2 cells as an s2_cell vector,
match_method is a factor with levels described above
Details
Performance was compared to the DeGAUSS geocoder (see /inst/compare_geocoding_to_degauss.R) using
real-world addresses in voter_addresses().
Match success rates were similar, but DeGAUSS matched about 5 percentage points more of the addresses
(98.3% versus 93.2%). These differences are sensitive to the match criteria considered for DeGAUSS
(here, precision of 'range' and score > 0.7, or precision of 'street' and score > 0.55):
addr_matched | degauss_matched | n | perc |
TRUE | TRUE | 224714 | 92.8% |
FALSE | TRUE | 13407 | 5.5% |
FALSE | FALSE | 2993 | 1.2% |
TRUE | FALSE | 1019 | 0.4% |
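As a quick check, the percentages and the overall match rate of each geocoder can be recomputed from the counts in the table. The sketch below uses only base R; the data frame `cmp` and the variable names are introduced here for illustration and are not part of the package.

```r
# Counts copied from the comparison table
cmp <- data.frame(
  addr_matched    = c(TRUE, FALSE, FALSE, TRUE),
  degauss_matched = c(TRUE, TRUE, FALSE, FALSE),
  n               = c(224714, 13407, 2993, 1019)
)

# Share of all addresses falling in each row of the table
cmp$perc <- round(100 * cmp$n / sum(cmp$n), 1)

# Overall match rate for each geocoder, in percent
addr_rate    <- round(100 * sum(cmp$n[cmp$addr_matched]) / sum(cmp$n), 1)
degauss_rate <- round(100 * sum(cmp$n[cmp$degauss_matched]) / sum(cmp$n), 1)

cmp
c(addr = addr_rate, degauss = degauss_rate)
```

The gap between the two overall rates is where the roughly 5 percentage point difference comes from.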
Among those that were geocoded by both, 97.7% were geocoded to the same census tract, and 96.6% to the same block group:
ct_agree | bg_agree | n | s2_dist_ptiles (5th, 25th, 50th, 75th, 95th) | perc |
TRUE | TRUE | 217179 | 14.7, 24.3, 39, 68.9, 153.6 | 96.6% |
FALSE | FALSE | 4805 | 21.6, 39.2, 158.9, 5577.9, 16998.8 | 2.1% |
TRUE | FALSE | 2730 | 19.6, 28.6, 41.2, 94.8, 571.8 | 1.2% |
Examples
set.seed(1)
addr_match_geocode(sample(voter_addresses(), 10),
  ref_addr = codec::cincy_addr_geo()$cagis_address,
  ref_s2 = codec::cincy_addr_geo()$cagis_s2
)
#> Warning: NAs introduced by coercion
#> # A tibble: 10 × 3
#> addr s2 match_method
#> <addr> <dbl> <fct>
#> 1 9475 Zola Court Harrison OH 45030 -6.13e-269 ref_addr
#> 2 6420 Fair Oaks Avenue Cincinnati OH 45237 -6.17e-269 ref_addr
#> 3 12092 3rd Avenue Cincinnati OH 45249 -6.19e-269 ref_addr
#> 4 9749 Culpepper Court Cincinnati OH 45231 -6.17e-269 ref_addr
#> 5 7123 Eastlawn Drive Cincinnati OH 45237 -6.17e-269 ref_addr
#> 6 4320 Saint Dominic Drive Cincinnati OH 45238 -6.43e-269 ref_addr
#> 7 234 Stetson Street Cincinnati OH 45219 -6.70e-269 ref_addr
#> 8 1899 Langdon Farm Road Cincinnati OH 45237 -6.70e-269 ref_addr
#> 9 2428 Fairview Avenue Cincinnati OH 45219 -6.70e-269 ref_addr
#> 10 3684 Reemelin Road Cincinnati OH 45211 -6.74e-269 ref_addr