
addr_left_join() is a convenience wrapper around addr_match() that returns a left-join style result. addr_match() first selects a single best-matching address in y for each address in x; the wrapper then repeats a row of x once for every row in the original y that carries that exact matched addr. It does not return multiple distinct candidate addresses from y.

Usage

addr_left_join(
  x,
  y,
  by = "addr",
  suffix = c(".x", ".y"),
  zip_variants = TRUE,
  name_phonetic_dist = 2L,
  name_fuzzy_dist = 1L,
  number_fuzzy_dist = 1L,
  match_street_predirectional = TRUE,
  match_street_posttype = TRUE,
  match_street_pretype = TRUE,
  match_street_postdirectional = FALSE,
  progress = interactive(),
  match_prepared = NULL
)

Arguments

x, y

data frames or tibbles with an addr column

by

addr column name in x (and y if the same); or a length-2 character vector of c(x_col, y_col)

suffix

character vector of length 2 used to suffix duplicate columns

zip_variants

logical; fuzzy match ZIP codes in x to common variants in y (e.g., changing the 4th or 5th digit)?

name_phonetic_dist

integer; maximum optimized string alignment distance between phonetic_street_key() of x and y to consider a possible match

name_fuzzy_dist

integer; maximum optimized string alignment distance between @name of x and y to consider a possible match

number_fuzzy_dist

integer; maximum optimized string alignment distance between @number of x and y to consider a possible match

match_street_predirectional

logical; require street predirectional to match when selecting street candidates?

match_street_posttype

logical; require street posttype to match when selecting street candidates?

match_street_pretype

logical; require street pretype to match when selecting street candidates?

match_street_postdirectional

logical; require street postdirectional to match when selecting street candidates?

progress

logical; show addr_match() progress?

match_prepared

optional prepared addr_match_index for the y addr column, usually from addr_match_prepare(). When supplied, addr_left_join() validates that it is equivalent to the y addr column before reusing it for matching.
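
The three *_dist arguments above are thresholds on optimized string alignment (OSA) distance, which counts insertions, deletions, substitutions, and swaps of adjacent characters. As a rough illustration of what a threshold of 1 admits, here is a minimal base R sketch of that metric. The helper osa_dist() is hypothetical and not part of this package, and addr_match() may compute the distance differently.

```r
# Hypothetical helper (not part of the package): optimized string alignment
# distance, i.e. Levenshtein distance plus swaps of adjacent characters.
osa_dist <- function(a, b) {
  a <- strsplit(a, "", fixed = TRUE)[[1]]
  b <- strsplit(b, "", fixed = TRUE)[[1]]
  n <- length(a)
  m <- length(b)
  # d[i + 1, j + 1] is the distance between the first i characters of a
  # and the first j characters of b
  d <- matrix(0L, nrow = n + 1L, ncol = m + 1L)
  d[, 1L] <- 0:n
  d[1L, ] <- 0:m
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- if (a[i] == b[j]) 0L else 1L
      best <- min(
        d[i, j + 1L] + 1L, # deletion
        d[i + 1L, j] + 1L, # insertion
        d[i, j] + cost     # substitution (or match when cost is 0)
      )
      # swap of two adjacent characters
      if (i > 1L && j > 1L && a[i] == b[j - 1L] && a[i - 1L] == b[j]) {
        best <- min(best, d[i - 1L, j - 1L] + 1L)
      }
      d[i + 1L, j + 1L] <- best
    }
  }
  d[n + 1L, m + 1L]
}

# With a threshold of 1, a single typo or adjacent swap is still a candidate
osa_dist("QUEEN", "QUEEM") <= 1L
osa_dist("MAIN", "MIAN") <= 1L
osa_dist("ELM", "OAK") <= 1L
```

Under this sketch the first two comparisons fall within a distance of 1 and the third does not, which is the kind of near-miss the fuzzy thresholds are meant to tolerate.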

Value

A data frame with left-join semantics. Duplicate rows in y with the exact same matched addr are all returned. Partial ZIP-only or street-only matches do not expand to multiple candidate rows in y.
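
The row-expansion and suffix behavior described above can be pictured with a plain base R merge(). This is only a sketch under simplifying assumptions: the toy data frames use character keys rather than addr vectors, and merge() matches exactly, whereas addr_left_join() first resolves each x address to a single best match through addr_match().

```r
# Toy stand-ins (assumption): character keys instead of addr vectors
x <- data.frame(addr = c("1 MAIN ST", "2 OAK AVE", "3 ELM RD"), id = 1:3)
y <- data.frame(
  addr = c("1 MAIN ST", "1 MAIN ST", "2 OAK AVE"), # first address appears twice
  id = c("a", "b", "c")
)

# all.x = TRUE keeps every row of x; unmatched rows get NA in the y columns.
# The shared non-key column `id` is disambiguated with the suffixes.
joined <- merge(x, y, by = "addr", all.x = TRUE, suffixes = c(".x", ".y"))
joined
```

In this toy result the x row for "1 MAIN ST" appears twice (once per duplicate in y), "3 ELM RD" is kept with NA in id.y, and the clashing id columns become id.x and id.y.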

Examples

the_addr <- nad("Hamilton", "OH", refresh_source = "no", refresh_binary = "no")
my_addr <- tibble::tibble(
  addr = as_addr(voter_addresses()[1:100]),
  id = 1:100
)

d <- addr_left_join(
  my_addr,
  the_addr,
  by = c("addr", "nad_addr"),
  match_prepared = nad_example_data(match_prepared = TRUE)
)

d
#> # A tibble: 111 × 9
#>    addr            id nad_addr.y subaddress uuid  date_update s2    address_type
#>    <addr>       <int> <addr>     <chr>      <chr> <date>      <s2_> <chr>       
#>  1 3359 QUEEN …     1 3359 QUEE… NA         {E3A… 2025-03-30  -6.7… Unknown     
#>  2 1040 KREIS …     2 1040 KREI… NA         {D05… 2025-03-30  -6.7… Unknown     
#>  3 9960 DALY R…     3 9960 DALY… NA         {109… 2025-03-30  -6.1… Unknown     
#>  4 413 VOLKERT…     4 413 VOLKE… NA         {1BB… 2025-03-30  -6.7… Unknown     
#>  5 8519 LINDER…     5 8519 LIND… NA         {F78… 2025-03-30  -6.6… Unknown     
#>  6 6361 BEECHM…     6 6361 BEEC… NA         {6D4… 2025-03-30  -6.6… Unknown     
#>  7 10466 ADVEN…     7 10466 ADV… NA         {1D1… 2025-03-30  -6.1… Unknown     
#>  8 3156 LOOKOU…     8 3156 LOOK… NA         {AE8… 2025-03-30  -6.6… Unknown     
#>  9 310 WYOMING…     9 310 WYOMI… NA         {331… 2025-03-30  -6.1… Unknown     
#> 10 118 SPRINGF…    10 118 SPRIN… NA         {F3E… 2025-03-30  -6.1… Unknown     
#> # ℹ 101 more rows
#> # ℹ 1 more variable: parcel_id <chr>

# some addresses may match more than one address in NAD
# because matching does not consider subaddress (e.g., "line two");
# keep the first row per id in these cases

table(addr_match_stage(d$nad_addr.y[!duplicated(d$id)]))
#> 
#>   none    zip street number 
#>     11      0      0     89
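
The same !duplicated() idiom can be applied to the whole joined table to keep one row per input id. A minimal sketch on a made-up data frame (d_toy and its values are illustrative only, not output from the example above):

```r
# Made-up joined result: id 1 matched two rows that differ only by subaddress
d_toy <- data.frame(
  id = c(1L, 1L, 2L, 3L),
  subaddress = c("APT 1", "APT 2", NA, NA)
)

# duplicated() flags the second and later occurrences of each id,
# so negating it keeps the first row for every id
first_per_id <- d_toy[!duplicated(d_toy$id), ]
first_per_id
```

Which row counts as "first" depends on the row order of the joined result, so sort beforehand if a particular subaddress should win.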