A single addr in y is chosen for each addr in x. Matching is staged to reduce the search space: ZIP codes are matched first, street names are then matched within each matched ZIP code, and street numbers are finally matched within each matched street and ZIP code combination. If more than one candidate addr remains in y after these stages, the first candidate in y is returned.

Address components that cannot be matched at any stage are left missing (NA) in the returned addr vector. Rows with a matched ZIP code but no street match return an addr with only @place@zipcode filled; rows with a matched ZIP code and street but no number match also return the matched @street.
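The staged narrowing described above can be sketched with plain data frames. This is a toy illustration in base R that uses exact comparisons only, not the package's implementation; the column names and the helper match_one() are hypothetical.

```r
# Toy reference (y) and query (x) tables; column names are illustrative only.
y <- data.frame(
  number = c("173", "175", "10"),
  street = c("WOOLPER", "WOOLPER", "14TH"),
  zip    = c("45220", "45220", "45202"),
  stringsAsFactors = FALSE
)
x <- data.frame(
  number = c("173", "999", "10", "1"),
  street = c("WOOLPER", "WOOLPER", "OAK", "MAIN"),
  zip    = c("45220", "45220", "45202", "99999"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: match one query row, narrowing candidates stage by stage.
match_one <- function(q, y) {
  out <- data.frame(
    number = NA_character_, street = NA_character_, zip = NA_character_,
    stage = "none", stringsAsFactors = FALSE
  )
  cand <- y[y$zip == q$zip, , drop = FALSE]              # stage 1: ZIP code
  if (nrow(cand) == 0) return(out)
  out$zip <- cand$zip[1]
  out$stage <- "zip"
  cand <- cand[cand$street == q$street, , drop = FALSE]  # stage 2: street within ZIP
  if (nrow(cand) == 0) return(out)
  out$street <- cand$street[1]
  out$stage <- "street"
  cand <- cand[cand$number == q$number, , drop = FALSE]  # stage 3: number within street
  if (nrow(cand) == 0) return(out)
  out$number <- cand$number[1]                           # first remaining candidate wins
  out$stage <- "number"
  out
}

res <- do.call(rbind, lapply(seq_len(nrow(x)), function(i) match_one(x[i, ], y)))
res$stage
#> [1] "number" "street" "zip"    "none"
```

Each stage only searches the candidates that survived the previous one, and a row that fails at a later stage keeps the fields matched so far, which mirrors the partial results described above.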

addr_match() accepts raw reference data and prepares it internally, which is the right default for one-off matching jobs. addr_match_prepare() becomes useful when the same reference y will be reused across multiple calls to addr_match(), because it caches the deduplicated reference addresses and ZIP/street/number candidate lookups once instead of rebuilding them on every call.

Preparing y once avoids recomputing unique(y), ZIP-code groups, and exact street/number candidate lookups each time you call addr_match() with the same reference addresses. For a single end-to-end match, preparing y explicitly does not remove that work; it only moves it outside addr_match().
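A minimal sketch of the reuse pattern, assuming the package is attached and that as_addr() parses the short reference strings below as it does the inputs in the Examples section. The object names are illustrative.

```r
# Raw reference addresses, written in the same format as the Examples input.
y_raw <- as_addr(c(
  "173 Woolper Ave 45220",
  "10 W 14th St 45202"
))

# Build the reusable index once ...
y_index <- addr_match_prepare(y_raw, progress = FALSE)

# ... then pass it as `y` for each batch of addresses to match.
batch_1 <- as_addr("173 Wuhlper Ave 45220")
batch_2 <- as_addr("10 W 14th St 45202")
m1 <- addr_match(batch_1, y_index, progress = FALSE)
m2 <- addr_match(batch_2, y_index, progress = FALSE)
```

With only one call to addr_match(), passing y_raw directly would do the same preparation work inside the call.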

Usage

addr_match(
  x,
  y,
  zip_variants = TRUE,
  zip_variant = c("minus1", "plus1", "sub5", "sub4", "swap"),
  name_phonetic_dist = 2L,
  name_fuzzy_dist = 1L,
  number_fuzzy_dist = 1L,
  match_street_type = c("exact", "compatible", "ignore"),
  match_street_directional = c("exact", "swap", "ignore"),
  progress = interactive()
)

addr_match_prepare(y, progress = interactive())

Arguments

x

addr vector to match

y

addr vector to match against, or a prepared addr_match_index created by addr_match_prepare()

zip_variants

logical; should ZIP codes in x also be fuzzy matched to their common variants in y?

zip_variant

character vector; zipcode variant types to use when zip_variants is TRUE; see ?zipcode_variant

name_phonetic_dist

integer; maximum optimized string alignment distance between phonetic_street_key() of x and y to consider a possible match

name_fuzzy_dist

integer; maximum optimized string alignment distance between @name of x and y to consider a possible match

number_fuzzy_dist

integer; maximum optimized string alignment distance between addr_number strings in x and y to consider a possible match

match_street_type

character; how to compare street pretype and posttype when selecting street candidates. "exact" requires pretype to match pretype and posttype to match posttype; "compatible" treats blank type fields as unknown but rejects candidates when known type information conflicts; "ignore" does not use street type fields when selecting candidates.

match_street_directional

character; how to compare street predirectional and postdirectional when selecting street candidates. "exact" requires predirectional to match predirectional and postdirectional to match postdirectional; "swap" also permits predirectional to match postdirectional and postdirectional to match predirectional; "ignore" does not use street directional fields when selecting candidates.

progress

logical; show reference-preparation timing and a progress bar while preparing raw y or processing matched ZIP groups?
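The three *_dist arguments are thresholds on optimized string alignment (OSA) distance, which counts a swap of two adjacent characters as one edit. The base R sketch below is a textbook version of that distance for intuition only; osa_dist() is a hypothetical helper, and how the package computes the distance internally is not shown on this page.

```r
# Hypothetical helper: OSA distance between two single strings.
osa_dist <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- length(x)
  m <- length(y)
  # d[i + 1, j + 1] holds the distance between the first i chars of a and first j of b
  d <- matrix(0L, n + 1, m + 1)
  d[, 1] <- 0:n
  d[1, ] <- 0:m
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- as.integer(x[i] != y[j])
      d[i + 1, j + 1] <- min(
        d[i, j + 1] + 1L,  # deletion
        d[i + 1, j] + 1L,  # insertion
        d[i, j] + cost     # substitution (or match)
      )
      # adjacent transposition, counted as a single edit
      if (i > 1 && j > 1 && x[i] == y[j - 1] && x[i - 1] == y[j]) {
        d[i + 1, j + 1] <- min(d[i + 1, j + 1], d[i - 1, j - 1] + 1L)
      }
    }
  }
  d[n + 1, m + 1]
}

osa_dist("7HT", "7TH")                  # 1: one adjacent swap
osa_dist("SRPINGFIELD", "SPRINGFIELD")  # 1: one adjacent swap
osa_dist("WUHLPER", "WOOLPER")          # 2: two substitutions
drop(utils::adist("7HT", "7TH"))        # 2: plain Levenshtein has no swap edit
```

Under this definition, a threshold of 1 tolerates a single typo or a single transposed pair, while a threshold of 0 requires the compared strings to be identical.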

Value

An addr vector, the same length as x, containing the selected match in y for each addr in x. Partial matches are returned with matched ZIP code and/or street fields filled when later stages do not match.

Examples

the_addr <- nad_example_data(match_prepared = TRUE)
my_addr <- as_addr(
  c(
    "2700 Alice St 45222",
    "10623 Srpingfield Pike 45215",
    "173 Wuhlper Ave 45220",
    "12176 8th Ave 45249",
    "12176 7ht Ave 45249",
    "10 W 14th St 45202",
    "10 Oak Rd 45241"
  )
)

addr_match(my_addr, the_addr)
#> <addr>
#>  @ number: <addr_number> function ()  
#>  .. @ prefix: chr [1:7] "" "" "" "" "" "" NA
#>  .. @ digits: chr [1:7] "2700" "10623" "173" "12176" "12176" "10" NA
#>  .. @ suffix: chr [1:7] "" "" "" "" "" "" NA
#>  @ street: <addr_street> function ()  
#>  .. @ predirectional : chr [1:7] "" "" "" "" "" "W" ""
#>  .. @ premodifier    : chr [1:7] "" "" "" "" "" "" ""
#>  .. @ pretype        : chr [1:7] "" "" "" "" "" "" ""
#>  .. @ name           : chr [1:7] "ALICE" "SPRINGFIELD" "WOOLPER" "7TH" "7TH" "14TH" ...
#>  .. @ posttype       : chr [1:7] "St" "Pike" "Ave" "Ave" "Ave" "St" "Rd"
#>  .. @ postdirectional: chr [1:7] "" "" "" "" "" "" ""
#>  @ place : <addr_place> function ()  
#>  .. @ name   : chr [1:7] "CINCINNATI" "CINCINNATI" "CINCINNATI" "CINCINNATI" ...
#>  .. @ state  : chr [1:7] "OH" "OH" "OH" "OH" "OH" "OH" NA
#>  .. @ zipcode: chr [1:7] "45221" "45215" "45220" "45249" "45249" "45202" ...

addr_match(
  my_addr,
  the_addr,
  zip_variants = FALSE,
  name_phonetic_dist = 0L,
  name_fuzzy_dist = 0L,
  number_fuzzy_dist = 0L,
  match_street_type = "ignore",
  match_street_directional = "ignore"
)
#> <addr>
#>  @ number: <addr_number> function ()  
#>  .. @ prefix: chr [1:7] NA NA "" NA NA "" NA
#>  .. @ digits: chr [1:7] NA NA "173" NA NA "10" NA
#>  .. @ suffix: chr [1:7] NA NA "" NA NA "" NA
#>  @ street: <addr_street> function ()  
#>  .. @ predirectional : chr [1:7] NA NA "" NA NA "W" ""
#>  .. @ premodifier    : chr [1:7] NA NA "" NA NA "" ""
#>  .. @ pretype        : chr [1:7] NA NA "" NA NA "" ""
#>  .. @ name           : chr [1:7] NA NA "WOOLPER" NA NA "14TH" "OAK"
#>  .. @ posttype       : chr [1:7] NA NA "Ave" NA NA "St" "St"
#>  .. @ postdirectional: chr [1:7] NA NA "" NA NA "" ""
#>  @ place : <addr_place> function ()  
#>  .. @ name   : chr [1:7] NA NA "CINCINNATI" NA NA "CINCINNATI" NA
#>  .. @ state  : chr [1:7] NA NA "OH" NA NA "OH" NA
#>  .. @ zipcode: chr [1:7] NA "45215" "45220" "45249" "45249" "45202" "45241"

my_addr <- as_addr(voter_addresses()[1:100])

d <- addr_match(my_addr, the_addr)
d
#> <addr>
#>  @ number: <addr_number> function ()  
#>  .. @ prefix: chr [1:100] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ...
#>  .. @ digits: chr [1:100] "3359" "1040" "9960" "413" "8519" "6361" "10466" ...
#>  .. @ suffix: chr [1:100] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ...
#>  @ street: <addr_street> function ()  
#>  .. @ predirectional : chr [1:100] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ...
#>  .. @ premodifier    : chr [1:100] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ...
#>  .. @ pretype        : chr [1:100] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ...
#>  .. @ name           : chr [1:100] "QUEEN CITY" "KREIS" "DALY" "VOLKERT" "LINDERWOOD" ...
#>  .. @ posttype       : chr [1:100] "Ave" "Ln" "Rd" "Pl" "Ln" "Ave" "Ln" "Cir" "Ave" ...
#>  .. @ postdirectional: chr [1:100] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ...
#>  @ place : <addr_place> function ()  
#>  .. @ name   : chr [1:100] "CINCINNATI" "CINCINNATI" "CINCINNATI" "CINCINNATI" ...
#>  .. @ state  : chr [1:100] "OH" "OH" "OH" "OH" "OH" "OH" "OH" "OH" "OH" "OH" ...
#>  .. @ zipcode: chr [1:100] "45238" "45205" "45231" "45219" "45255" "45230" ...

addr_match_stage(d)
#>   [1] number number number number number number number number number number
#>  [11] number number number number number number number number number number
#>  [21] number number number number number number number number number number
#>  [31] number number number number number number number zip    number number
#>  [41] number number number number number number number number number number
#>  [51] number number number number number number number number number number
#>  [61] number street number number number zip    street number number number
#>  [71] number number number number zip    street zip    number number number
#>  [81] street number number number number number zip    number zip    number
#>  [91] street number number number number number number number number number
#> Levels: none < zip < street < number
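
Because the printed levels are ordered (none < zip < street < number), the stages can be tabulated and compared like any ordered factor. The sketch below uses a small hand-built factor as a stand-in for addr_match_stage(d) so that it runs without the package; it assumes the real return value is an ordered factor, as the printout suggests.

```r
# Stand-in for addr_match_stage(d): an ordered factor with the levels printed above.
stage <- factor(
  c("number", "zip", "street", "number", "number"),
  levels = c("none", "zip", "street", "number"),
  ordered = TRUE
)

table(stage)             # number of rows that stopped at each stage
stage >= "street"        # TRUE where at least the street was matched
mean(stage == "number")  # share of rows matched through the street number
```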