geocode() geocodes addr vectors using Census TIGER address
features (see ?taf) by:
- searching for a matching street (see ?match_addr_street) within the same ZIP code, also searching similar ZIP codes for a matching street if necessary
- using the address number to select the best address feature range and side of the street (even/odd), breaking ties on smallest width and spread
- linearly interpolating a geographic point along the best range line based on the actual and potential range of address numbers
- offsetting the interpolated point from the range line perpendicularly
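The range selection, interpolation, and offset steps above can be illustrated with a minimal base R sketch. It is hypothetical and not the package's implementation. The helper names `select_range()`, `interpolate_point()`, and `offset_point()` are made up. Coordinates are assumed to be planar and in meters rather than longitude/latitude, each range line is assumed to be a straight two-point segment, interpolation uses only the actual from/to address numbers, and ties are broken on range width alone (the "spread" criterion is not modeled).

```r
# Hypothetical sketch only: helper names and data layout are illustrative.
# Coordinates are assumed planar (meters), not longitude/latitude.

# Pick the range containing `number` with matching parity (even/odd),
# breaking ties on the smallest numeric width of the range.
select_range <- function(number, ranges) {
  lo <- pmin(ranges$from, ranges$to)
  hi <- pmax(ranges$from, ranges$to)
  ok <- number >= lo & number <= hi & ranges$from %% 2 == number %% 2
  if (!any(ok)) return(NA_integer_)
  candidates <- which(ok)
  candidates[which.min((hi - lo)[ok])]
}

# Linearly interpolate a point along the segment start -> end using the
# position of `number` within the from..to address range.
interpolate_point <- function(number, from, to, start, end) {
  frac <- if (to == from) 0.5 else (number - from) / (to - from)
  start + frac * (end - start)
}

# Move `point` by `offset` meters perpendicular to the segment, to the left
# or right when traveling from start to end (assumed side convention).
offset_point <- function(point, start, end, offset, side = c("left", "right")) {
  side <- match.arg(side)
  unit <- (end - start) / sqrt(sum((end - start)^2))
  normal <- c(-unit[2], unit[1]) # left-hand perpendicular
  if (side == "right") normal <- -normal
  point + offset * normal
}

# Example: candidate ranges sharing one segment from (0, 0) to (100, 0)
ranges <- data.frame(
  from = c(100, 101, 101),
  to = c(198, 199, 149),
  side = c("left", "right", "right"),
  stringsAsFactors = FALSE
)
start <- c(0, 0)
end <- c(100, 0)

i <- select_range(127, ranges) # rows 2 and 3 both fit; row 3 is narrower
pt <- interpolate_point(127, ranges$from[i], ranges$to[i], start, end)
geocode_pt <- offset_point(pt, start, end, offset = 10, side = ranges$side[i])
geocode_pt
```

Here 127 is odd, so the even range in row 1 is skipped, and the narrower of the two odd ranges wins the tie before the point is placed 10 meters to the right of the line.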
Only matched input addresses return non-missing matched ZIP code and street
values. Addresses with missing or unmatched ZIP codes return missing matched
ZIP code, street, geography, and s2 cell values. If all ranges on the matched
ZIP code and street exclude the address number, only the geography and s2 cell
values are NA.
Usage
geocode(
x,
name_phonetic_dist = 1L,
name_fuzzy_dist = 2L,
match_street_type = c("exact", "compatible", "ignore"),
match_street_directional = c("exact", "swap", "ignore"),
zip_variants = TRUE,
zip_variant = c("minus1", "plus1", "sub5", "sub4", "swap"),
year = as.character(2025:2011),
version = "v1",
taf_install = TRUE,
taf_redownload = FALSE,
offset = 10L,
progress = interactive()
)
geocode_zip(
x,
offset = 10L,
name_phonetic_dist = 1L,
name_fuzzy_dist = 2L,
match_street_type = c("exact", "compatible", "ignore"),
match_street_directional = c("exact", "swap", "ignore"),
zip_variants = TRUE,
zip_variant = c("minus1", "plus1", "sub5", "sub4", "swap"),
year = as.character(2025:2011),
version = "v1",
taf_install = TRUE,
taf_redownload = FALSE,
progress_callback = NULL,
taf_check = TRUE
)
Arguments
- x
an addr vector (?as_addr)
- name_phonetic_dist
integer; maximum optimal string alignment distance between phonetic_street_key() of x and y to consider a possible match
- name_fuzzy_dist
integer; maximum optimal string alignment distance between @name of x and y to consider a possible match
- match_street_type
character; how to compare street pretype and posttype when selecting street candidates. "exact" requires pretype to match pretype and posttype to match posttype; "compatible" treats blank type fields as unknown but rejects candidates when known type information conflicts; "ignore" does not use street type fields when selecting candidates.
- match_street_directional
character; how to compare street predirectional and postdirectional when selecting street candidates. "exact" requires predirectional to match predirectional and postdirectional to match postdirectional; "swap" also permits predirectional to match postdirectional and postdirectional to match predirectional; "ignore" does not use street directional fields when selecting candidates.
- zip_variants
logical; fuzzy match to common variants of x in y?
- zip_variant
character vector; ZIP code variant types to use when zip_variants is TRUE; see ?zipcode_variant
integer, length one; vintage of TIGER addrfeat (address feature) files
- version
character, length one; major version of the package and taf dataset schema
- taf_install
logical; install missing county TAF files needed for input ZIP codes and selected ZIP code variants before geocoding? If FALSE, geocoding proceeds with installed files only and warns when needed county files are missing.
- taf_redownload
logical; re-download cached TIGER ZIP files when installing missing TAF counties?
- offset
number of meters to offset the geocoded point perpendicularly from the street line
- progress
logical; show a ZIP-code progress bar while geocoding?
- progress_callback
optional callback used internally by geocode() to update progress after ZIP-code reference data is loaded
- taf_check
logical; check for missing TAF counties? Used internally by geocode() after checking once for the full input vector.
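The string-distance and street-component arguments above can be illustrated with a small base R sketch. `osa_dist()` is a textbook optimal string alignment (OSA) distance: edit distance with adjacent transpositions, where no substring is edited more than once. `type_match()` and `directional_match()` are hypothetical helpers that encode one reading of the "exact", "compatible", "swap", and "ignore" rules for a single field or field pair. None of these names are part of the package API.

```r
# Optimal string alignment (OSA) distance: insertions, deletions,
# substitutions, and transpositions of adjacent characters, cost 1 each.
osa_dist <- function(a, b) {
  a <- strsplit(a, "")[[1]]
  b <- strsplit(b, "")[[1]]
  n <- length(a)
  m <- length(b)
  d <- matrix(0L, n + 1, m + 1) # d[i + 1, j + 1]: distance of a[1:i] vs b[1:j]
  d[, 1] <- 0:n
  d[1, ] <- 0:m
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- if (a[i] == b[j]) 0L else 1L
      d[i + 1, j + 1] <- min(d[i, j + 1] + 1L, d[i + 1, j] + 1L, d[i, j] + cost)
      if (i > 1 && j > 1 && a[i] == b[j - 1] && a[i - 1] == b[j]) {
        d[i + 1, j + 1] <- min(d[i + 1, j + 1], d[i - 1, j - 1] + 1L)
      }
    }
  }
  d[n + 1, m + 1]
}

# Hypothetical per-field street type rule (pretype or posttype).
type_match <- function(x, y, mode = c("exact", "compatible", "ignore")) {
  mode <- match.arg(mode)
  switch(mode,
    exact = x == y,
    compatible = x == "" || y == "" || x == y, # blank means unknown
    ignore = TRUE
  )
}

# Hypothetical directional rule over (predirectional, postdirectional) pairs.
directional_match <- function(x_pre, x_post, y_pre, y_post,
                              mode = c("exact", "swap", "ignore")) {
  mode <- match.arg(mode)
  exact <- x_pre == y_pre && x_post == y_post
  switch(mode,
    exact = exact,
    swap = exact || (x_pre == y_post && x_post == y_pre),
    ignore = TRUE
  )
}

osa_dist("Kreis", "Kries") # one adjacent transposition
osa_dist("ca", "abc") # OSA cannot edit the transposed pair again
type_match("Ave", "", "compatible")
directional_match("N", "", "", "N", "swap")
```

The `"ca"`/`"abc"` pair shows where OSA differs from unrestricted Damerau-Levenshtein distance: OSA gives 3 rather than 2.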
Value
A tibble with columns addr (the input addr vector),
matched_zipcode (character vector), matched_street (addr_street
vector), matched_geography (s2_geography point vector), and s2_cell
(s2_cell vector).
Details
geocode_zip() is the workhorse function and operates on an addr vector
whose elements all share the same ZIP code. geocode() handles an addr vector
with multiple ZIP codes by grouping its elements by ZIP code and, by default,
processing the groups serially.
Grouping an addr vector by ZIP code yourself and applying geocode_zip() to
each group gives lower-level control (e.g., over parallel processing).
If the mirai package is installed and mirai daemons have already been
configured by the caller, geocode() uses them for ZIP-code-level
parallel processing. Otherwise it falls back to sequential processing.
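The group-by-ZIP strategy described above can be sketched in base R. `fake_geocode_zip()` and `geocode_by_zip()` are hypothetical: the former is a stand-in for geocode_zip() that just echoes its inputs. ZIP codes are passed as a separate character vector because this sketch does not assume any accessor on addr vectors. The steps shown are splitting input positions by ZIP code, applying a per-ZIP function, and restoring the original input order.

```r
# Hypothetical stand-in for geocode_zip(): one output row per input address.
fake_geocode_zip <- function(addr, zip) {
  data.frame(addr = addr, matched_zipcode = zip, stringsAsFactors = FALSE)
}

# Group input positions by ZIP code, geocode each group, then restore order.
# `apply_fun` can be swapped for a parallel map with the same interface.
geocode_by_zip <- function(addr, zip, fun = fake_geocode_zip, apply_fun = lapply) {
  idx <- split(seq_along(addr), zip)
  parts <- apply_fun(names(idx), function(z) fun(addr[idx[[z]]], z))
  out <- do.call(rbind, parts)
  # rows are grouped by ZIP here; reorder them back to input positions
  out <- out[order(unlist(idx, use.names = FALSE)), , drop = FALSE]
  rownames(out) <- NULL
  out
}

addr <- c("1 A St", "2 B St", "3 C St")
zip <- c("45238", "45205", "45238")
res <- geocode_by_zip(addr, zip)
res
```

Passing `parallel::mclapply` as `apply_fun` (on Unix-alikes) would run the groups in parallel. With this package itself, geocode() already performs this grouping and uses mirai daemons when the caller has configured them.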
geocode() and geocode_zip() both download and install TIGER address
features by county (?taf_install) as needed, based on the ZIP codes of the
input addr vector (and possibly their ZIP code variants). TAF install checks
run before any TAF ZIP files are read, so parallel geocoding workers do not
try to download county files at the same time.
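The "check before reading" design can be sketched as a single up-front pass that collects every county needed by the input ZIP codes (including any ZIP code variants), so installs happen once, serially, before any workers start. The lookup, the identifiers, and `missing_counties()` below are made up for illustration. The package's own ZIP-to-county lookup and install functions are not reproduced here.

```r
# Made-up lookup from ZIP code (or ZIP variant) to the counties it touches.
zip_to_counties <- list(
  zip_a = c("county_1", "county_2"),
  zip_b = "county_1",
  zip_c = "county_3"
)

# Hypothetical helper: collect counties needed by all input ZIPs and
# return those not yet installed.
missing_counties <- function(zips, lookup, installed) {
  needed <- unique(unlist(lookup[unique(zips)], use.names = FALSE))
  setdiff(needed, installed)
}

to_install <- missing_counties(
  zips = c("zip_b", "zip_a", "zip_b"),
  lookup = zip_to_counties,
  installed = "county_1"
)
to_install
# Install `to_install` here, once; workers then only read installed files.
```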
Examples
x <- as_addr(voter_addresses()[1:100])
# for example purposes, only install one county
Sys.setenv("R_USER_DATA_DIR" = tempfile())
taf_install("39061", "2025")
# and geocode without installing other counties
gcd <- geocode(x, taf_install = FALSE)
#> Warning: TAF files are missing for 23 county/counties needed for geocoding; proceeding with installed files only because taf_install = FALSE. Missing counties: 18161, 39017, 39135, 39113, 39165, 21067, 39025, 18025, .... Affected ZIPs: 45003 (plus1 from 45002), 45003 (sub5 from 45002), 45004 (sub5 from 45002), 45005 (sub5 from 45002), 45042 (sub4 from 45002), 45062 (sub4 from 45002), 45032 (sub4 from 45002), 40502 (swap from 45002), ....
# installing a single county is only for example purposes and usually not required; normally just call:
if (FALSE) { # \dontrun{
gcd <- geocode(x)
} # }
gcd
#> # A tibble: 100 × 5
#> addr matched_zipcode matched_street matched_geography s2_cell
#> <addr> <chr> <addr_str> <s2_geography> <s2cel>
#> 1 3359 QUEEN CITY Ave… 45238 Queen City Ave POINT (-84.61106… 8841ca…
#> 2 1040 KREIS Ln CINCI… 45205 Kreis Ln POINT (-84.58899… 8841b6…
#> 3 9960 DALY Rd CINCIN… 45231 Daly Rd POINT (-84.52779… 88404b…
#> 4 413 VOLKERT Pl CINC… 45219 Volkert Pl POINT (-84.52570… 8841b4…
#> 5 8519 LINDERWOOD Ln … 45255 Linderwood Ln POINT (-84.31725… 8841a9…
#> 6 6361 BEECHMONT Ave … 45230 Beechmont Ave POINT (-84.38246… 8841ae…
#> 7 10466 ADVENTURE Ln … 45242 Adventure Ln POINT (-84.35959… 884053…
#> 8 3156 LOOKOUT Cir CI… 45208 Lookout Cir POINT (-84.42829… 8841ad…
#> 9 310 WYOMING Ave CIN… 45215 Wyoming Ave POINT (-84.46793… 88404d…
#> 10 118 SPRINGFIELD Pik… 45215 Springfield P… POINT (-84.47321… 88404d…
#> # ℹ 90 more rows
table(geocode_stage(gcd))
#>
#> none street_variant street range_variant range
#> 9 1 2 0 88
geocode_table(gcd)
#> # A tibble: 100 × 5
#> addr geocode_stage matched_zipcode matched_street s2_cell
#> <chr> <chr> <chr> <chr> <chr>
#> 1 3359 QUEEN CITY Ave CIN… range 45238 Queen City Ave 8841ca…
#> 2 1040 KREIS Ln CINCINNAT… range 45205 Kreis Ln 8841b6…
#> 3 9960 DALY Rd CINCINNATI… range 45231 Daly Rd 88404b…
#> 4 413 VOLKERT Pl CINCINNA… range 45219 Volkert Pl 8841b4…
#> 5 8519 LINDERWOOD Ln CINC… range 45255 Linderwood Ln 8841a9…
#> 6 6361 BEECHMONT Ave CINC… range 45230 Beechmont Ave 8841ae…
#> 7 10466 ADVENTURE Ln CINC… range 45242 Adventure Ln 884053…
#> 8 3156 LOOKOUT Cir CINCIN… range 45208 Lookout Cir 8841ad…
#> 9 310 WYOMING Ave CINCINN… range 45215 Wyoming Ave 88404d…
#> 10 118 SPRINGFIELD Pike CI… range 45215 Springfield P… 88404d…
#> # ℹ 90 more rows
leaflet::leaflet(wk::wk_coords(gcd$matched_geography)) |>
leaflet::addTiles() |>
leaflet::addCircleMarkers(lng = ~x, lat = ~y, label = ~feature_id)
# use mirai for parallel processing
if (FALSE) { # \dontrun{
mirai::daemons(2)
geocode(x)
mirai::daemons(0)
} # }
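To turn the geocode_stage() counts in the examples into shares, prop.table() can be applied to the table, e.g. `prop.table(table(geocode_stage(gcd)))`. The self-contained version below hard-codes the counts printed above for the 100 example addresses. It also assumes, based on the description of missing values, that only the "range" and "range_variant" stages carry a point geography. That mapping is an inference, not documented behavior.

```r
# Counts copied from the table(geocode_stage(gcd)) output in the examples.
stage_counts <- c(
  none = 9, street_variant = 1, street = 2,
  range_variant = 0, range = 88
)

# Percentage of addresses at each geocoding stage.
stage_pct <- 100 * prop.table(stage_counts)
stage_pct

# Assumption: only range-level stages yield a point geography.
with_point <- sum(stage_counts[c("range", "range_variant")]) / sum(stage_counts)
with_point
```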