
Addresses are matched to reference geographies using the methods listed below, tried in order of decreasing precision. Each method derives the matched s2 cell identifier differently, and the method used is recorded in the match_method column of the returned tibble:

  1. ref_addr: reference s2 cell from direct match to reference address

  2. tiger_range: centroid of street-matched TIGER address ranges containing the street number

  3. tiger_street: centroid of street-matched TIGER address ranges closest to the street number

  4. none: unmatched using all previous approaches; return missing s2 cell identifier
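
The fallback order above can be illustrated with a minimal base R sketch. It is not the package's implementation: the candidates data frame and its cell values are invented for illustration, and the real function works on addr and s2_cell vectors rather than character strings.

```r
# Hypothetical per-method results for three addresses; NA means "no match".
# The cell labels are made up and are not real s2 cell identifiers.
candidates <- data.frame(
  ref_addr     = c("cell_a", NA, NA),
  tiger_range  = c("cell_b", "cell_c", NA),
  tiger_street = c("cell_d", "cell_e", NA),
  stringsAsFactors = FALSE
)

# Methods in order of decreasing precision
methods <- c("ref_addr", "tiger_range", "tiger_street")

s2 <- rep(NA_character_, nrow(candidates))
match_method <- rep("none", nrow(candidates))

# Walk from least to most precise so that a more precise match
# overwrites any less precise one found earlier in the loop.
for (m in rev(methods)) {
  hit <- !is.na(candidates[[m]])
  s2[hit] <- candidates[[m]][hit]
  match_method[hit] <- m
}

match_method <- factor(match_method, levels = c(methods, "none"))
data.frame(s2, match_method)
```

Here the first address keeps its ref_addr result, the second falls back to tiger_range, and the third is recorded as none with a missing s2 value.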

Usage

addr_match_geocode(x, ref_addr, ref_s2, county = "39061", year = "2022")

Arguments

x

an addr vector (or character vector of address strings) to geocode

ref_addr

an addr vector to search for matches in

ref_s2

an s2_cell vector of locations for each ref_addr

county

character county identifier for TIGER street range files to search for matches in

year

character year for TIGER street range files to search for matches in

Value

a tibble with columns: addr contains x converted to an addr vector, s2 contains the resulting geocoded s2 cells as an s2_cell vector, and match_method is a factor with the levels described above
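
As a rough sketch of how the returned columns might be summarized, the base R code below uses a mock data frame in place of real output. The addresses, s2 labels, and the factor level order are assumptions made for illustration only.

```r
# Mock stand-in for the tibble returned by addr_match_geocode();
# the addresses and s2 values here are invented for illustration.
res <- data.frame(
  addr = c("1 Main Street", "2 Oak Avenue", "3 Elm Road", "4 Pine Court"),
  s2 = c("s2_a", "s2_b", NA, "s2_c"),
  stringsAsFactors = FALSE
)
# Level order is assumed to follow the order of the methods listed above
res$match_method <- factor(
  c("ref_addr", "tiger_range", "none", "ref_addr"),
  levels = c("ref_addr", "tiger_range", "tiger_street", "none")
)

# Count of addresses per match method (unused levels are kept at zero)
method_counts <- table(res$match_method)

# Share of addresses that received any s2 cell
matched_share <- mean(res$match_method != "none")

# Addresses left unmatched by every method
unmatched <- res$addr[res$match_method == "none"]
```

Because match_method is a factor, table() reports every level, including methods that matched no addresses.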

Details

Performance was compared to the DeGAUSS geocoder (see /inst/compare_geocoding_to_degauss.R) using real-world addresses from voter_addresses(). Match success rates were similar, but DeGAUSS matched about 5 percentage points more of the addresses. These differences are sensitive to the match criteria chosen for DeGAUSS (here, precision of 'range' and score > 0.7, or precision of 'street' and score > 0.55):

addr_matched  degauss_matched       n   perc
TRUE          TRUE             224714  92.8%
FALSE         TRUE              13407   5.5%
FALSE         FALSE              2993   1.2%
TRUE          FALSE              1019   0.4%
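
The DeGAUSS match criterion and the overall match rates implied by these counts can be sketched in base R. This is not the comparison script itself: the toy degauss data frame is invented, and the precision and score column names are assumed to follow the description in the text. The counts are copied from the table above.

```r
# Toy DeGAUSS-style output; values are invented, and the `precision` and
# `score` column names are assumptions based on the criteria in the text.
degauss <- data.frame(
  precision = c("range", "range", "street", "street", "zip"),
  score = c(0.85, 0.60, 0.58, 0.50, 0.90),
  stringsAsFactors = FALSE
)

# A result counts as matched only if it meets one of the two criteria
degauss$matched <- with(
  degauss,
  (precision == "range" & score > 0.7) |
    (precision == "street" & score > 0.55)
)

# Counts copied from the comparison table above
counts <- data.frame(
  addr_matched = c(TRUE, FALSE, FALSE, TRUE),
  degauss_matched = c(TRUE, TRUE, FALSE, FALSE),
  n = c(224714, 13407, 2993, 1019)
)

total <- sum(counts$n)
addr_rate <- sum(counts$n[counts$addr_matched]) / total
degauss_rate <- sum(counts$n[counts$degauss_matched]) / total

# Overall match rates in percent
round(100 * c(addr = addr_rate, degauss = degauss_rate), 1)
#>    addr degauss 
#>    93.2    98.3 
```

Loosening or tightening the score thresholds changes which DeGAUSS results count as matched, which is why the gap between the two rates depends on the criteria chosen.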

Among the addresses geocoded by both methods, 97.7% were geocoded to the same census tract, and 96.6% to the same block group:

ct_agree  bg_agree       n  s2_dist_ptiles (5th, 25th, 50th, 75th, 95th)   perc
TRUE      TRUE      217179  14.7, 24.3, 39, 68.9, 153.6                   96.6%
FALSE     FALSE       4805  21.6, 39.2, 158.9, 5577.9, 16998.8             2.1%
TRUE      FALSE       2730  19.6, 28.6, 41.2, 94.8, 571.8                  1.2%

Examples

set.seed(1)
addr_match_geocode(sample(voter_addresses(), 10),
                   codec::cincy_addr_geo()$cagis_address,
                   ref_s2 = codec::cincy_addr_geo()$cagis_s2)
#> Warning: NAs introduced by coercion
#> # A tibble: 10 × 3
#>                                            addr         s2 match_method
#>                                          <addr>      <dbl> <fct>       
#>  1            9475 Zola Court Harrison OH 45030 -6.13e-269 ref_addr    
#>  2    6420 Fair Oaks Avenue Cincinnati OH 45237 -6.17e-269 ref_addr    
#>  3         12092 3rd Avenue Cincinnati OH 45249 -6.19e-269 ref_addr    
#>  4     9749 Culpepper Court Cincinnati OH 45231 -6.17e-269 ref_addr    
#>  5      7123 Eastlawn Drive Cincinnati OH 45237 -6.17e-269 ref_addr    
#>  6 4320 Saint Dominic Drive Cincinnati OH 45238 -6.43e-269 ref_addr    
#>  7       234 Stetson Street Cincinnati OH 45219 -6.70e-269 ref_addr    
#>  8   1899 Langdon Farm Road Cincinnati OH 45237 -6.70e-269 ref_addr    
#>  9     2428 Fairview Avenue Cincinnati OH 45219 -6.70e-269 ref_addr    
#> 10       3684 Reemelin Road Cincinnati OH 45211 -6.74e-269 ref_addr