Left join two data frames using fuzzy addr matching

This function wraps the addr fuzzy matching helpers and returns a left-join style result. The addr columns are matched by index, and rows are expanded when matches are one-to-many or many-to-many.

See addr_match() and addr_left_join() for faster alternatives that return one selected match instead of all fuzzy matches.

Usage

addr_fuzzy_left_join(
  x,
  y,
  by = "addr",
  addr_fields = NULL,
  suffix = c(".x", ".y"),
  progress = interactive()
)

Arguments

x, y: data frames or tibbles with an addr column
by: name of the addr column in x and y when both use the same name; or a length-2 character vector c(x_col, y_col) when the names differ
addr_fields: a named vector of maximum OSA (optimal string alignment) distances, one per addr field. Defaults are used for fields that are not supplied; see Details.
suffix: character vector of length 2 used to suffix duplicate columns
progress: logical; show progress bar while processing matched ZIP groups?

Value

a data frame with left-join semantics; note that the row order is not guaranteed to match the row order of x
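
As a rough analogy for the left-join semantics described here, the following base R sketch uses merge() on an exact key. It is only an illustration: the data frames and column names are made up, and merge() does exact rather than fuzzy matching, so it does not reproduce what addr_fuzzy_left_join() does internally. It does show rows being repeated for one-to-many matches, unmatched rows being kept with NA, and the row order differing from the input.

```r
# Hypothetical toy data; "key" stands in for an address column.
x <- data.frame(key = c("b", "a", "c"), id = 1:3)
y <- data.frame(key = c("a", "a", "b"), match = c("a1", "a2", "b1"))

# all.x = TRUE keeps every row of x (left join).
# merge() sorts on the key by default, so the row order of x is not preserved.
joined <- merge(x, y, by = "key", all.x = TRUE)

nrow(joined)   # "a" has two matches in y, so x's 3 rows become 4
joined$key     # keys come back sorted rather than in x's original order
joined$match   # the unmatched key "c" gets NA
```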

Details

addr_fuzzy_left_join() matches addresses within ZIP code groups, so maximum distances for place fields are ignored. Defaults for addr_fields:

number_prefix: 0
number_digits: 0
number_suffix: 0
street_predirectional: 0
street_premodifier: 0
street_pretype: 0
street_name: 1
street_posttype: 0
street_postdirectional: 0
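
To give a feel for what a maximum OSA distance of 0 or 1 means, here is a minimal base R sketch of the optimal string alignment distance. The helper osa_distance() is hypothetical and not part of the package, which presumably relies on an optimized implementation; the sketch also assumes that a pair counts as a match when its distance is less than or equal to the maximum.

```r
# Hypothetical helper: optimal string alignment (OSA) distance between two strings.
# Counts insertions, deletions, substitutions, and transpositions of adjacent characters.
osa_distance <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- length(x)
  m <- length(y)
  # d[i + 1, j + 1] holds the distance between the first i chars of a and first j chars of b
  d <- matrix(0L, n + 1, m + 1)
  d[, 1] <- 0:n
  d[1, ] <- 0:m
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- as.integer(x[i] != y[j])
      d[i + 1, j + 1] <- min(
        d[i, j + 1] + 1L,  # deletion
        d[i + 1, j] + 1L,  # insertion
        d[i, j] + cost     # substitution (or match when cost is 0)
      )
      # transposition of two adjacent characters
      if (i > 1 && j > 1 && x[i] == y[j - 1] && x[i - 1] == y[j]) {
        d[i + 1, j + 1] <- min(d[i + 1, j + 1], d[i - 1, j - 1] + 1L)
      }
    }
  }
  d[n + 1, m + 1]
}

osa_distance("BEECHMONT", "BEECHMNOT")  # 1: one adjacent transposition
osa_distance("DALY", "DALEY")           # 1: one insertion
osa_distance("DALY", "DAILEY")          # 2: two insertions

# Assumed comparison: a street name pair matches when distance <= maximum (default 1)
osa_distance("BEECHMONT", "BEECHMNOT") <= 1
```

Under that assumption, the default of 1 for street_name tolerates a single typo or swapped pair of letters, while the defaults of 0 for the remaining fields require them to agree exactly.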

Examples

my_addr <-
  tibble::tibble(address = voter_addresses()[1:10],
                 addr = as_addr(address),
                 id = sprintf("id_%04d", seq_len(10)))
the_addr <- nad_example_data()
addr_fuzzy_left_join(my_addr, the_addr, c("addr", "nad_addr"))
#> matching addr vectors complete
#> [==============================================================================]
#> # A tibble: 10 × 10
#>    address             addr  id    nad_addr.y subaddress uuid  date_update s2   
#>    <chr>               <add> <chr> <addr>     <chr>      <chr> <date>      <s2_>
#>  1 3359 QUEEN CITY AV… 3359… id_0… 3359 QUEE… NA         {E3A… 2025-03-30  -6.7…
#>  2 1040 KREIS LN CINC… 1040… id_0… 1040 KREI… NA         {D05… 2025-03-30  -6.7…
#>  3 9960 DALY RD CINCI… 9960… id_0… 9960 DALY… NA         {109… 2025-03-30  -6.1…
#>  4 413 VOLKERT PL CIN… 413 … id_0… 413 VOLKE… NA         {1BB… 2025-03-30  -6.7…
#>  5 8519 LINDERWOOD LN… 8519… id_0… 8519 LIND… NA         {F78… 2025-03-30  -6.6…
#>  6 6361 BEECHMONT AVE… 6361… id_0… 6361 BEEC… NA         {6D4… 2025-03-30  -6.6…
#>  7 10466 ADVENTURE LN… 1046… id_0… 10466 ADV… NA         {1D1… 2025-03-30  -6.1…
#>  8 3156 LOOKOUT CIR C… 3156… id_0… 3156 LOOK… NA         {AE8… 2025-03-30  -6.6…
#>  9 310 WYOMING AVE CI… 310 … id_0… 310 WYOMI… NA         {331… 2025-03-30  -6.1…
#> 10 118 SPRINGFIELD PI… 118 … id_0… 118 SPRIN… NA         {F3E… 2025-03-30  -6.1…
#> # ℹ 2 more variables: address_type <chr>, parcel_id <chr>