A single addr in y is chosen for each addr in x. Matching is staged to
reduce the search space: ZIP codes are matched first, street names are then
matched within each matched ZIP code, and street numbers are finally matched
within each matched street and ZIP code combination. If more than one
candidate addr remains in y after these stages, the first candidate in y
is returned.
Missing or empty address components that cannot be matched at any stage are
left missing in the returned addr() values. Rows with a matched ZIP code
but no street match return an addr with only @place@zipcode filled; rows
with matched ZIP code and street but no number match also return the matched
@street.
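The staged narrowing described above can be sketched in base R on a toy reference table. This is an illustration only, not the package's implementation: `staged_match()` and the `zip`, `street`, and `number` columns are hypothetical stand-ins for addr fields, and matching is exact at every stage rather than fuzzy.

```r
# Toy reference table; columns are hypothetical stand-ins for addr fields.
ref <- data.frame(
  zip = c("45220", "45220", "45215"),
  street = c("WOOLPER", "WOOLPER", "SPRINGFIELD"),
  number = c("173", "175", "10623"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: exact staged matching for one query address.
staged_match <- function(zip, street, number, ref) {
  out <- list(
    zip = NA_character_, street = NA_character_,
    number = NA_character_, stage = "none"
  )
  # Stage 1: ZIP code. which() drops NA comparisons.
  cand <- ref[which(ref$zip == zip), , drop = FALSE]
  if (nrow(cand) == 0L) return(out)
  out$zip <- zip
  out$stage <- "zip"
  # Stage 2: street name within the matched ZIP code.
  cand <- cand[which(cand$street == street), , drop = FALSE]
  if (nrow(cand) == 0L) return(out)
  out$street <- street
  out$stage <- "street"
  # Stage 3: street number within the matched street and ZIP code.
  cand <- cand[which(cand$number == number), , drop = FALSE]
  if (nrow(cand) == 0L) return(out)
  out$number <- cand$number[1L]  # first remaining candidate is returned
  out$stage <- "number"
  out
}

full <- staged_match("45220", "WOOLPER", "173", ref)
partial <- staged_match("45220", "WOOLPER", "999", ref)
zip_only <- staged_match("45220", "OAK", "10", ref)
```

Here `partial` keeps the matched ZIP code and street but leaves the number missing, and `zip_only` keeps only the ZIP code, mirroring the partial results described above.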
addr_match() accepts raw reference data and prepares it internally, which
is the right default for one-off matching jobs. addr_match_prepare()
becomes useful when the same reference y will be reused across multiple
calls to addr_match(), because it caches the deduplicated reference
addresses and ZIP/street/number candidate lookups once instead of rebuilding
them on every call.
Preparing y once avoids recomputing unique(y), ZIP-code groups, and
exact street/number candidate lookups each time you call addr_match()
with the same reference addresses. For a single end-to-end match, preparing
y explicitly does not remove that work; it only moves it outside
addr_match().
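The prepare-once idea can be illustrated with a minimal base R sketch. It is not how addr_match_prepare() is implemented: `prepare_ref()` and `lookup_zip()` are hypothetical helpers, and the index here covers only deduplication and ZIP-code grouping.

```r
# Toy reference table with one duplicated row; columns are hypothetical.
ref <- data.frame(
  zip = c("45220", "45220", "45220", "45215"),
  street = c("WOOLPER", "WOOLPER", "WOOLPER", "SPRINGFIELD"),
  number = c("173", "173", "175", "10623"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: do the expensive work once and keep the result.
prepare_ref <- function(ref) {
  ref <- unique(ref)                                # deduplicate reference rows
  by_zip <- split(seq_len(nrow(ref)), ref$zip)      # row positions per ZIP code
  list(ref = ref, by_zip = by_zip)
}

# Hypothetical helper: reuse the cached ZIP groups for each lookup.
lookup_zip <- function(zip, index) {
  rows <- index$by_zip[[zip]]                       # NULL when the ZIP is absent
  index$ref[rows, , drop = FALSE]
}

index <- prepare_ref(ref)                # built once
hits_a <- lookup_zip("45220", index)     # reused across calls
hits_b <- lookup_zip("45215", index)
miss <- lookup_zip("99999", index)
```

Building `index` costs the same whether it happens inside or outside the lookup; the saving only appears when several lookups share one index.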
Usage
addr_match(
  x,
  y,
  zip_variants = TRUE,
  name_phonetic_dist = 2L,
  name_fuzzy_dist = 1L,
  number_fuzzy_dist = 1L,
  match_street_predirectional = TRUE,
  match_street_posttype = TRUE,
  match_street_pretype = TRUE,
  match_street_postdirectional = FALSE,
  progress = interactive()
)
addr_match_prepare(y, progress = interactive())
Arguments
- x
addr vector to match
- y
addr vector to match against, or a prepared
addr_match_index created by addr_match_prepare()
- zip_variants
logical; fuzzy match to common ZIP code variants of x in y? (e.g., changing the 4th or 5th digit)
- name_phonetic_dist
integer; maximum optimized string alignment distance between
phonetic_street_key() of x and y to consider a possible match
- name_fuzzy_dist
integer; maximum optimized string alignment distance between
@name of x and y to consider a possible match
- number_fuzzy_dist
integer; maximum optimized string alignment distance between
@number of x and y to consider a possible match
- match_street_predirectional
logical; require street predirectional to match when selecting street candidates?
- match_street_posttype
logical; require street posttype to match when selecting street candidates?
- match_street_pretype
logical; require street pretype to match when selecting street candidates?
- match_street_postdirectional
logical; require street postdirectional to match when selecting street candidates?
- progress
logical; show reference-preparation timing and a progress bar while preparing raw
y or processing matched ZIP groups?
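The distance arguments above are thresholds on optimized string alignment (OSA) distance, which counts insertions, deletions, substitutions, and swaps of adjacent characters. A minimal base R sketch of that distance follows; `osa_distance()` is a hypothetical helper written for illustration and is not necessarily the routine the package calls.

```r
# Hypothetical helper: optimized string alignment distance between two strings.
osa_distance <- function(a, b) {
  x <- strsplit(a, "", fixed = TRUE)[[1]]
  y <- strsplit(b, "", fixed = TRUE)[[1]]
  n <- length(x)
  m <- length(y)
  # d[i + 1, j + 1] holds the distance between the first i chars of a
  # and the first j chars of b.
  d <- matrix(0L, n + 1L, m + 1L)
  d[, 1L] <- 0:n
  d[1L, ] <- 0:m
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- as.integer(x[i] != y[j])
      d[i + 1L, j + 1L] <- min(
        d[i, j + 1L] + 1L,  # deletion
        d[i + 1L, j] + 1L,  # insertion
        d[i, j] + cost      # substitution (or match)
      )
      # Adjacent transposition, e.g. "HT" versus "TH".
      if (i > 1L && j > 1L && x[i] == y[j - 1L] && x[i - 1L] == y[j]) {
        d[i + 1L, j + 1L] <- min(d[i + 1L, j + 1L], d[i - 1L, j - 1L] + 1L)
      }
    }
  }
  d[n + 1L, m + 1L]
}

swap_dist <- osa_distance("SRPINGFIELD", "SPRINGFIELD")  # one adjacent swap
subs_dist <- osa_distance("WUHLPER", "WOOLPER")          # two substitutions

# A candidate is considered when its distance is within the threshold.
within_one <- swap_dist <= 1L
```

Under this sketch, a swapped pair of letters stays within a threshold of 1, while a name needing two substitutions does not.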
Value
An addr vector the same length as x, containing the best match in y for each addr in x. When later stages do not match, partial matches are returned with only the matched ZIP code and/or street fields filled.
Examples
the_addr <- nad_example_data(match_prepared = TRUE)
my_addr <- as_addr(
  c(
    "2700 Alice St 45222",
    "10623 Srpingfield Pike 45215",
    "173 Wuhlper Ave 45220",
    "12176 8th Ave 45249",
    "12176 7ht Ave 45249",
    "10 W 14th St 45202",
    "10 Oak Rd 45241"
  )
)
addr_match(my_addr, the_addr)
#> <addr>
#> @ number: <addr_number> function ()
#> .. @ prefix: chr [1:7] "" "" "" "" "" "" NA
#> .. @ digits: chr [1:7] "2700" "10623" "173" "12176" "12176" "10" NA
#> .. @ suffix: chr [1:7] "" "" "" "" "" "" NA
#> @ street: <addr_street> function ()
#> .. @ predirectional : chr [1:7] "" "" "" "" "" "W" ""
#> .. @ premodifier : chr [1:7] "" "" "" "" "" "" ""
#> .. @ pretype : chr [1:7] "" "" "" "" "" "" ""
#> .. @ name : chr [1:7] "ALICE" "SPRINGFIELD" "WOOLPER" "7TH" "7TH" "14TH" ...
#> .. @ posttype : chr [1:7] "St" "Pike" "Ave" "Ave" "Ave" "St" "Rd"
#> .. @ postdirectional: chr [1:7] "" "" "" "" "" "" ""
#> @ place : <addr_place> function ()
#> .. @ name : chr [1:7] "CINCINNATI" "CINCINNATI" "CINCINNATI" "CINCINNATI" ...
#> .. @ state : chr [1:7] "OH" "OH" "OH" "OH" "OH" "OH" NA
#> .. @ zipcode: chr [1:7] "45221" "45215" "45220" "45249" "45249" "45202" ...
addr_match(
  my_addr,
  the_addr,
  zip_variants = FALSE,
  name_phonetic_dist = 0L,
  name_fuzzy_dist = 0L,
  number_fuzzy_dist = 0L,
  match_street_predirectional = FALSE,
  match_street_posttype = FALSE,
  match_street_pretype = FALSE,
  match_street_postdirectional = FALSE
)
#> <addr>
#> @ number: <addr_number> function ()
#> .. @ prefix: chr [1:7] NA NA NA NA NA "" NA
#> .. @ digits: chr [1:7] NA NA NA NA NA "10" NA
#> .. @ suffix: chr [1:7] NA NA NA NA NA "" NA
#> @ street: <addr_street> function ()
#> .. @ predirectional : chr [1:7] NA NA NA NA NA "W" ""
#> .. @ premodifier : chr [1:7] NA NA NA NA NA "" ""
#> .. @ pretype : chr [1:7] NA NA NA NA NA "" ""
#> .. @ name : chr [1:7] NA NA NA NA NA "14TH" "OAK"
#> .. @ posttype : chr [1:7] NA NA NA NA NA "St" "St"
#> .. @ postdirectional: chr [1:7] NA NA NA NA NA "" ""
#> @ place : <addr_place> function ()
#> .. @ name : chr [1:7] NA NA NA NA NA "CINCINNATI" NA
#> .. @ state : chr [1:7] NA NA NA NA NA "OH" NA
#> .. @ zipcode: chr [1:7] NA "45215" "45220" "45249" "45249" "45202" "45241"
my_addr <- as_addr(voter_addresses()[1:100])
d <- addr_match(my_addr, the_addr)
d
#> <addr>
#> @ number: <addr_number> function ()
#> .. @ prefix: chr [1:100] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ...
#> .. @ digits: chr [1:100] "3359" "1040" "9960" "413" "8519" "6361" "10466" ...
#> .. @ suffix: chr [1:100] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ...
#> @ street: <addr_street> function ()
#> .. @ predirectional : chr [1:100] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ...
#> .. @ premodifier : chr [1:100] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ...
#> .. @ pretype : chr [1:100] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ...
#> .. @ name : chr [1:100] "QUEEN CITY" "KREIS" "DALY" "VOLKERT" "LINDERWOOD" ...
#> .. @ posttype : chr [1:100] "Ave" "Ln" "Rd" "Pl" "Ln" "Ave" "Ln" "Cir" "Ave" ...
#> .. @ postdirectional: chr [1:100] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ...
#> @ place : <addr_place> function ()
#> .. @ name : chr [1:100] "CINCINNATI" "CINCINNATI" "CINCINNATI" "CINCINNATI" ...
#> .. @ state : chr [1:100] "OH" "OH" "OH" "OH" "OH" "OH" "OH" "OH" "OH" "OH" ...
#> .. @ zipcode: chr [1:100] "45238" "45205" "45231" "45219" "45255" "45230" ...
addr_match_stage(d)
#> [1] number number number number number number number number number number
#> [11] number number number number number number number number number number
#> [21] number number number number number number number number number number
#> [31] number number number number number number number zip number number
#> [41] number number number number number number number number number number
#> [51] number number number number number number number number number number
#> [61] number street number number number zip street number number number
#> [71] number number number number zip street zip number number number
#> [81] street number number number number number zip number zip number
#> [91] street number number number number number number number number number
#> Levels: none < zip < street < number
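Because the stages print as an ordered factor (none < zip < street < number), they can be tabulated and compared with base R. The sketch below uses a small made-up vector with the same levels rather than real addr_match_stage() output.

```r
# Toy stage vector with the same ordered levels as the output above.
stage <- factor(
  c("number", "number", "zip", "street", "number", "none"),
  levels = c("none", "zip", "street", "number"),
  ordered = TRUE
)

# Count how many rows reached each stage.
stage_counts <- table(stage)

# Share of rows matched at least to the street level.
street_or_better <- mean(stage >= "street")
```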