geocode() geocodes addr vectors using Census TIGER address
features (see ?taf) by:
- searching for a matching street (see ?match_addr_street) within the same ZIP code, also searching similar ZIP codes for a matching street if necessary
- using the address number to select the best address feature range and side of the street (even/odd), breaking ties on smallest width and spread
- linearly interpolating a geographic point along the best range line based on the actual and potential range of address numbers
- offsetting the interpolated point from the range line perpendicularly
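The interpolation and offset steps above can be sketched in base R. This is a simplified illustration, not the package's implementation: `interpolate_offset()` and its arguments are hypothetical names, the range line is reduced to a single straight segment of nonzero length, coordinates are assumed to be planar and in meters, and only the plain range endpoints are used (the package description also mentions the "potential" range, which is not modeled here).

```r
# Hypothetical helper, not part of the package: place an address number along
# a straight segment and push the point off the line by `offset` meters.
# Assumes planar coordinates (meters) and a segment of nonzero length.
interpolate_offset <- function(number, from, to, start, end,
                               offset = 10, side = c("left", "right")) {
  side <- match.arg(side)
  # fraction of the way along the range; a one-number range maps to the midpoint
  frac <- if (to == from) 0.5 else (number - from) / (to - from)
  # clamp so numbers outside the range stay on the segment
  frac <- min(max(frac, 0), 1)
  # interpolated point on the segment
  d <- end - start
  pt <- start + frac * d
  # unit normal pointing to the left of the start -> end direction
  normal <- c(-d[2], d[1]) / sqrt(sum(d^2))
  if (side == "right") normal <- -normal
  pt + offset * normal
}

# address 150 on a range 100-200 along a 100 m segment on the x axis
interpolate_offset(150, from = 100, to = 200, start = c(0, 0), end = c(100, 0))
# returns c(50, 10)
```

Passing `side = "right"` flips the offset to the other side of the segment, which is one way to picture the even/odd side selection.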
Only matched input addresses return non-missing matched ZIP code and street
values. Missing or unmatched ZIP codes return missing matched ZIP code,
street, geography, and s2 cell values. If all ranges on the matched ZIP code
and street exclude the address number, only the geography and s2 cell values
return NA.
Usage
geocode(
x,
name_phonetic_dist = 1L,
name_fuzzy_dist = 2L,
match_street_predirectional = TRUE,
match_street_posttype = TRUE,
match_street_pretype = TRUE,
match_street_postdirectional = FALSE,
zip_variants = TRUE,
zip_variant = c("minus1", "plus1", "sub5", "sub4", "swap"),
year = as.character(2025:2011),
version = "v1",
taf_install = TRUE,
taf_redownload = FALSE,
offset = 10L,
progress = interactive()
)
geocode_zip(
x,
offset = 10L,
name_phonetic_dist = 1L,
name_fuzzy_dist = 2L,
match_street_predirectional = TRUE,
match_street_posttype = TRUE,
match_street_pretype = TRUE,
match_street_postdirectional = FALSE,
zip_variants = TRUE,
zip_variant = c("minus1", "plus1", "sub5", "sub4", "swap"),
year = as.character(2025:2011),
version = "v1",
taf_install = TRUE,
taf_redownload = FALSE,
progress_callback = NULL,
taf_check = TRUE
)
Arguments
- x
an addr vector (?as_addr)
- name_phonetic_dist
integer; maximum optimized string alignment distance between phonetic_street_key() of x and y to consider a possible match
- name_fuzzy_dist
integer; maximum optimized string alignment distance between @name of x and y to consider a possible match
- match_street_predirectional
logical; require street predirectional to match when selecting street candidates?
- match_street_posttype
logical; require street posttype to match when selecting street candidates?
- match_street_pretype
logical; require street pretype to match when selecting street candidates?
- match_street_postdirectional
logical; require street postdirectional to match when selecting street candidates?
- zip_variants
logical; fuzzy match to common variants of x in y?
- zip_variant
character vector; zipcode variant types to use when zip_variants is TRUE; see ?zipcode_variant
- year
integer, length one; vintage of TIGER addrfeat (address feature) files
- version
character, length one; major version of the package and taf dataset schema
- taf_install
logical; install missing county TAF files needed for input ZIP codes and selected ZIP code variants before geocoding? If FALSE, geocoding proceeds with installed files only and warns when needed county files are missing.
- taf_redownload
logical; re-download cached TIGER ZIP files when installing missing TAF counties?
- offset
number of meters to offset geocode from street line
- progress
logical; show a ZIP-code progress bar while geocoding?
- progress_callback
optional callback used internally by geocode() to update progress after ZIP-code reference data is loaded
- taf_check
logical; check for missing TAF counties? Used internally by geocode() after checking once for the full input vector.
Value
A tibble with columns addr (the input addr vector),
matched_zipcode (character vector), matched_street (addr_street
vector), matched_geography (s2_geography point vector), and s2_cell
(s2_cell vector).
Details
geocode_zip() is the workhorse function and operates on addr vectors
that share a single ZIP code. geocode() handles an addr vector with
multiple ZIP codes by grouping the addresses by ZIP code and processing
the groups serially by default.
At a lower level, grouping addr vectors by ZIP code and applying
geocode_zip() facilitates more control (e.g., parallel processing).
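A minimal sketch of the group-and-apply pattern described above, written in base R so it runs without the package. `apply_by_zip()` is a hypothetical helper, `zip` is assumed to be a character vector of ZIP codes parallel to `x` (how to extract it from an addr vector is not shown here), and `fun` stands in for geocode_zip(). It also assumes `x` supports `[` subsetting.

```r
# Hypothetical wrapper, not part of the package: call `fun` once per ZIP code
# group and return a list of results named by ZIP code.
# Note: split() drops positions where `zip` is NA.
apply_by_zip <- function(x, zip, fun, ...) {
  idx <- split(seq_along(zip), zip)
  lapply(idx, function(i) fun(x[i], ...))
}

# stand-in data and function so the sketch is self-contained
x <- c("a", "b", "c", "d")
zip <- c("45220", "45229", "45220", "45229")
res <- apply_by_zip(x, zip, fun = toupper)
names(res)
```

With the package loaded, `fun` would presumably be geocode_zip(), and the `lapply()` call is the place to substitute a parallel map if more control is needed.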
If the mirai package is installed and mirai daemons have already been
configured by the caller, geocode() uses them for ZIP-code-level
parallel processing. Otherwise it falls back to sequential processing.
geocode() and geocode_zip() both download and install TIGER address
features by county (?taf_install) as needed based on the input addr ZIP
codes (and possibly ZIP code variants).
Examples
x <- as_addr(voter_addresses()[1:100])
# for example purposes, only install one county
Sys.setenv("R_USER_DATA_DIR" = tempfile())
taf_install("39061", "2025")
#> Warning: structured street name info not found for 1216 ranges
# and geocode without installing other counties
gcd <- geocode(x, taf_install = FALSE)
#> Warning: TAF files are missing for 23 county/counties needed for geocoding; proceeding with installed files only because taf_install = FALSE. Missing counties: 18161, 39017, 39135, 39113, 39165, 21067, 39025, 18025, .... Affected ZIPs: 45003 (plus1 from 45002), 45003 (sub5 from 45002), 45004 (sub5 from 45002), 45005 (sub5 from 45002), 45042 (sub4 from 45002), 45062 (sub4 from 45002), 45032 (sub4 from 45002), 40502 (swap from 45002), ....
# this is only for example purposes and usually not required; e.g.
if (FALSE) { # \dontrun{
gcd <- geocode(x)
} # }
leaflet::leaflet(wk::wk_coords(gcd$matched_geography)) |>
leaflet::addTiles() |>
leaflet::addCircleMarkers(lng = ~x, lat = ~y, label = ~feature_id)
# use mirai for parallel processing
if (FALSE) { # \dontrun{
mirai::daemons(2)
geocode(x)
mirai::daemons(0)
} # }