Left join two data frames using fuzzy addr matching
Source:R/addr_fuzzy_left_join.R
addr_fuzzy_left_join.RdThis wraps around the addr fuzzy matching helpers and returns a left-join style result. The addr columns are matched by index and rows are expanded for one-to-many or many-to-many matches.
See addr_match() and addr_left_join() for an alternative strategy
to matching addr and joining data frames with addr columns
that is faster, but is limited to one best match instead of
multi-matches.
Usage
addr_fuzzy_left_join(
x,
y,
by = "addr",
addr_fields = NULL,
suffix = c(".x", ".y"),
progress = interactive()
)Arguments
- x, y
data frames or tibbles with an addr column
- by
addr column name in
x(andyif the same); or a length-2 character vector ofc(x_col, y_col)- addr_fields
a named vector of osa_max_distances; if max distances for each addr tag field is not present a default will be used (see details).
- suffix
character vector of length 2 used to suffix duplicate columns
- progress
logical; show progress bar while processing matched ZIP groups?
Details
addr_fuzzy_left_join works by matching addresses grouped by ZIP codes, so specified
osa_max_distances for any place fields are ignored
Defaults for addr_fields:
number_prefix: 0
number_digits: 0
number_suffix: 0
street_predirectional: 0
street_premodifier: 0
street_pretype: 0
street_name: 1
street_posttype: 0
street_postdirectional: 0
Examples
my_addr <-
tibble::tibble(address = voter_addresses()[1:10],
addr = as_addr(address),
id = sprintf("id_%04d", seq_len(10)))
the_addr <- nad_example_data()
addr_fuzzy_left_join(my_addr, the_addr, c("addr", "nad_addr"))
#>
matching addr vectors
#> [..............................................................................]
matching addr vectors in 45238 (1 to 16,422)
#> [..............................................................................]
matching addr vectors in 45238 (1 to 16,422)
#> [=======.......................................................................]
matching addr vectors in 45205 (1 to 7,468)
#> [=======.......................................................................]
matching addr vectors in 45205 (1 to 7,468)
#> [===============...............................................................]
matching addr vectors in 45231 (1 to 17,147)
#> [===============...............................................................]
matching addr vectors in 45231 (1 to 17,147)
#> [=======================.......................................................]
matching addr vectors in 45219 (1 to 4,932)
#> [=======================.......................................................]
matching addr vectors in 45219 (1 to 4,932)
#> [===============================...............................................]
matching addr vectors in 45255 (1 to 6,232)
#> [===============================...............................................]
matching addr vectors in 45255 (1 to 6,232)
#> [=======================================.......................................]
matching addr vectors in 45230 (1 to 11,520)
#> [=======================================.......................................]
matching addr vectors in 45230 (1 to 11,520)
#> [==============================================................................]
matching addr vectors in 45242 (1 to 10,590)
#> [==============================================................................]
matching addr vectors in 45242 (1 to 10,590)
#> [======================================================........................]
matching addr vectors in 45208 (1 to 8,025)
#> [======================================================........................]
matching addr vectors in 45208 (1 to 8,025)
#> [==============================================================................]
matching addr vectors in 45215 (2 to 12,717)
#> [==============================================================................]
matching addr vectors in 45215 (2 to 12,717)
#> [==============================================================================]
matching addr vectors complete
#> [==============================================================================]
#> # A tibble: 10 × 10
#> address addr id nad_addr.y subaddress uuid date_update s2
#> <chr> <add> <chr> <addr> <chr> <chr> <date> <s2_>
#> 1 3359 QUEEN CITY AV… 3359… id_0… 3359 QUEE… NA {E3A… 2025-03-30 -6.7…
#> 2 1040 KREIS LN CINC… 1040… id_0… 1040 KREI… NA {D05… 2025-03-30 -6.7…
#> 3 9960 DALY RD CINCI… 9960… id_0… 9960 DALY… NA {109… 2025-03-30 -6.1…
#> 4 413 VOLKERT PL CIN… 413 … id_0… 413 VOLKE… NA {1BB… 2025-03-30 -6.7…
#> 5 8519 LINDERWOOD LN… 8519… id_0… 8519 LIND… NA {F78… 2025-03-30 -6.6…
#> 6 6361 BEECHMONT AVE… 6361… id_0… 6361 BEEC… NA {6D4… 2025-03-30 -6.6…
#> 7 10466 ADVENTURE LN… 1046… id_0… 10466 ADV… NA {1D1… 2025-03-30 -6.1…
#> 8 3156 LOOKOUT CIR C… 3156… id_0… 3156 LOOK… NA {AE8… 2025-03-30 -6.6…
#> 9 310 WYOMING AVE CI… 310 … id_0… 310 WYOMI… NA {331… 2025-03-30 -6.1…
#> 10 118 SPRINGFIELD PI… 118 … id_0… 118 SPRIN… NA {F3E… 2025-03-30 -6.1…
#> # ℹ 2 more variables: address_type <chr>, parcel_id <chr>