Left join two data frames using addr matching

addr_left_join() is a convenience wrapper around addr_match() that returns a left-join style result. addr_match() first selects a single best matching address in y for each address in x; the wrapper then repeats a row of x once for every row in the original y that shares that exact matched addr. It does not return multiple distinct candidate addresses from y.

Usage

addr_left_join(
  x,
  y,
  by = "addr",
  suffix = c(".x", ".y"),
  zip_variants = TRUE,
  zip_variant = c("minus1", "plus1", "sub5", "sub4", "swap"),
  name_phonetic_dist = 2L,
  name_fuzzy_dist = 1L,
  number_fuzzy_dist = 1L,
  match_street_type = c("exact", "compatible", "ignore"),
  match_street_directional = c("exact", "swap", "ignore"),
  progress = interactive(),
  match_prepared = NULL
)

Arguments

x, y: data frames or tibbles with an addr column
by: name of the addr column, when it is the same in x and y; or a length-2 character vector of c(x_col, y_col) when the names differ
suffix: character vector of length 2 used to suffix duplicate columns
zip_variants: logical; also match ZIP codes in x to common variants of those ZIP codes in y?
zip_variant: character vector; zipcode variant types to use when zip_variants is TRUE; see ?zipcode_variant
name_phonetic_dist: integer; maximum optimized string alignment distance between phonetic_street_key() of x and y to consider a possible match
name_fuzzy_dist: integer; maximum optimized string alignment distance between @name of x and y to consider a possible match
number_fuzzy_dist: integer; maximum optimized string alignment distance between addr_number strings in x and y to consider a possible match
match_street_type: character; how to compare street pretype and posttype when selecting street candidates. "exact" requires pretype to match pretype and posttype to match posttype; "compatible" treats blank type fields as unknown but rejects candidates when known type information conflicts; "ignore" does not use street type fields when selecting candidates.
match_street_directional: character; how to compare street predirectional and postdirectional when selecting street candidates. "exact" requires predirectional to match predirectional and postdirectional to match postdirectional; "swap" also permits predirectional to match postdirectional and postdirectional to match predirectional; "ignore" does not use street directional fields when selecting candidates.
progress: logical; show addr_match() progress?
match_prepared: optional prepared addr_match_index for the y addr column, usually from addr_match_prepare(). When supplied, addr_left_join() validates that it is equivalent to the y addr column before reusing it for matching.
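
The three *_dist arguments are thresholds on optimized string alignment (OSA) distance. As a rough illustration of what that distance measures, the base R sketch below counts insertions, deletions, substitutions, and transpositions of adjacent characters. osa_dist() is a hypothetical helper written for this page; it is not part of the package, and the package's own implementation may differ.

```r
# Hypothetical helper (not a package function): OSA distance between two strings.
osa_dist <- function(a, b) {
  a <- strsplit(a, "")[[1]]
  b <- strsplit(b, "")[[1]]
  n <- length(a)
  m <- length(b)
  # d[i + 1, j + 1] holds the distance between the first i chars of a
  # and the first j chars of b
  d <- matrix(0L, n + 1L, m + 1L)
  d[, 1] <- 0:n
  d[1, ] <- 0:m
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- as.integer(a[i] != b[j])
      d[i + 1, j + 1] <- min(
        d[i, j + 1] + 1L, # deletion
        d[i + 1, j] + 1L, # insertion
        d[i, j] + cost    # substitution (free when characters agree)
      )
      # transposition of two adjacent characters counts as one edit
      if (i > 1 && j > 1 && a[i] == b[j - 1] && a[i - 1] == b[j]) {
        d[i + 1, j + 1] <- min(d[i + 1, j + 1], d[i - 1, j - 1] + 1L)
      }
    }
  }
  d[n + 1, m + 1]
}

osa_dist("3359", "3395") # 1: the last two digits are swapped
osa_dist("MAIN", "MAINE") # 1: one inserted character
```

Under this reading, a street number typed as "3395" is within number_fuzzy_dist = 1L of "3359" and would remain a possible match, while a number two or more edits away would not.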

Value

A data frame with left-join semantics. Duplicate rows in y with the exact same matched addr are all returned. Partial ZIP-only or street-only matches do not expand to multiple candidate rows in y.
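
The row expansion described above is the same as in an ordinary left join. The sketch below uses base R merge() on plain character keys as a stand-in; it does exact string matching only and involves no addr matching, and the toy column names and values are made up for illustration.

```r
# Toy data: plain strings stand in for addr vectors (an assumption of this sketch)
x <- data.frame(
  addr = c("1 MAIN ST", "2 OAK AVE", "3 ELM RD"),
  id = 1:3
)
y <- data.frame(
  nad_addr = c("1 MAIN ST", "1 MAIN ST", "2 OAK AVE"),
  parcel_id = c("P-01", "P-02", "P-03")
)

# all.x = TRUE keeps every row of x, like a left join
joined <- merge(x, y, by.x = "addr", by.y = "nad_addr", all.x = TRUE)
joined <- joined[order(joined$id), ]

nrow(joined) # 4: id 1 appears twice, id 3 is kept with a missing parcel_id
```

Unlike this sketch, addr_left_join() keeps the matched y addr column in its result, as the printed tibble in the Examples shows.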

Examples

the_addr <- nad("Hamilton", "OH",
                refresh_binary = "no", refresh_source = "no")
my_addr <- tibble::tibble(
  addr = as_addr(voter_addresses()[1:100]),
  id = 1:100
)

d <- addr_left_join(
  my_addr,
  the_addr,
  by = c("addr", "nad_addr"),
  match_prepared = nad_example_data(match_prepared = TRUE)
)

d
#> # A tibble: 111 × 9
#>    addr            id nad_addr.y subaddress uuid  date_update s2    address_type
#>    <addr>       <int> <addr>     <chr>      <chr> <date>      <s2_> <chr>       
#>  1 3359 QUEEN …     1 3359 QUEE… NA         {E3A… 2025-03-30  -6.7… Unknown     
#>  2 1040 KREIS …     2 1040 KREI… NA         {D05… 2025-03-30  -6.7… Unknown     
#>  3 9960 DALY R…     3 9960 DALY… NA         {109… 2025-03-30  -6.1… Unknown     
#>  4 413 VOLKERT…     4 413 VOLKE… NA         {1BB… 2025-03-30  -6.7… Unknown     
#>  5 8519 LINDER…     5 8519 LIND… NA         {F78… 2025-03-30  -6.6… Unknown     
#>  6 6361 BEECHM…     6 6361 BEEC… NA         {6D4… 2025-03-30  -6.6… Unknown     
#>  7 10466 ADVEN…     7 10466 ADV… NA         {1D1… 2025-03-30  -6.1… Unknown     
#>  8 3156 LOOKOU…     8 3156 LOOK… NA         {AE8… 2025-03-30  -6.6… Unknown     
#>  9 310 WYOMING…     9 310 WYOMI… NA         {331… 2025-03-30  -6.1… Unknown     
#> 10 118 SPRINGF…    10 118 SPRIN… NA         {F3E… 2025-03-30  -6.1… Unknown     
#> # ℹ 101 more rows
#> # ℹ 1 more variable: parcel_id <chr>

# some addresses match more than one row in NAD
# because matching does not consider the subaddress (e.g. "line two");
# keep only the first row for each id in these cases

table(addr_match_stage(d$nad_addr.y[!duplicated(d$id)]))
#> 
#>   none    zip street number 
#>     11      0      0     89
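
The `!duplicated(d$id)` idiom used above keeps the first row for each id. A minimal sketch on a made-up data frame (d_toy and its values are hypothetical) shows the effect in isolation:

```r
# Toy joined result: id 1 was expanded to two rows by duplicate matches
d_toy <- data.frame(
  id = c(1L, 1L, 2L, 3L),
  parcel_id = c("P-01", "P-02", "P-03", NA)
)

# duplicated() flags the second and later occurrences of each id,
# so negating it keeps only the first row per id
first_only <- d_toy[!duplicated(d_toy$id), ]

nrow(first_only) # 3: one row per id
```

Which row comes first depends on the order of the joined result, so sort beforehand if a particular duplicate should be preferred.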