This is a convenience wrapper around the addr fuzzy matching helpers that returns a left-join style result. The addr columns are matched by index and rows are expanded for one-to-many or many-to-many matches.
Usage
addr_fuzzy_left_join(
x,
y,
by = "addr",
addr_fields = NULL,
suffix = c(".x", ".y")
)Arguments
- x, y
data frames or tibbles with an addr column
- by
addr column name in
x(andyif the same); or a length-2 character vector ofc(x_col, y_col)- addr_fields
a named vector of osa_max_distances; if max distances for each addr tag field is not present a default will be used (see details).
- suffix
character vector of length 2 used to suffix duplicate columns
Details
addr_fuzzy_left_join works by matching addresses grouped by ZIP codes, so specified
osa_max_distances for any place fields are ignored
Defaults for addr_fields:
number_prefix: 0
number_digits: 0
number_suffix: 0
street_predirectional: 0
street_premodifier: 0
street_pretype: 0
street_name: 1
street_posttype: 0
street_postdirectional: 0
Examples
my_addr <-
tibble::tibble(address = voter_addresses()[1:1000],
addr = as_addr(address),
id = sprintf("id_%04d", seq_len(1000)))
#> Warning: street name post type not mapped: la
the_addr <- nad_example_data()
addr_fuzzy_left_join(my_addr, the_addr, c("addr", "nad_addr"))
#> 49 of 49 unique ZIP codes in x matched to one of 60 unique ZIP codes in y
#> matching by zipcode ■■ 2% | ETA: 3m
#> matching by zipcode ■■■ 6% | ETA: 2m
#> matching by zipcode ■■■■ 10% | ETA: 1m
#> matching by zipcode ■■■■■ 14% | ETA: 1m
#> matching by zipcode ■■■■■■■ 20% | ETA: 1m
#> matching by zipcode ■■■■■■■■■ 27% | ETA: 48s
#> matching by zipcode ■■■■■■■■■■■ 33% | ETA: 45s
#> matching by zipcode ■■■■■■■■■■■■ 37% | ETA: 41s
#> matching by zipcode ■■■■■■■■■■■■■■ 43% | ETA: 35s
#> matching by zipcode ■■■■■■■■■■■■■■■■ 49% | ETA: 31s
#> matching by zipcode ■■■■■■■■■■■■■■■■■■ 57% | ETA: 24s
#> matching by zipcode ■■■■■■■■■■■■■■■■■■■■■ 65% | ETA: 19s
#> matching by zipcode ■■■■■■■■■■■■■■■■■■■■■■ 71% | ETA: 15s
#> matching by zipcode ■■■■■■■■■■■■■■■■■■■■■■■■■ 82% | ETA: 9s
#> matching by zipcode ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 92% | ETA: 4s
#> Warning: Multi-matches detected for 23 addr in x;
#> More than one row of y will be returned once for each matching row in x
#> # A tibble: 1,796 × 10
#> address addr id nad_addr.y subaddress uuid date_update s2
#> <chr> <add> <chr> <addr> <chr> <chr> <date> <s2_>
#> 1 3359 QUEEN CITY AV… 3359… id_0… 3359 QUEE… NA {E3A… 2025-03-30 -6.7…
#> 2 5179 RIVERWATCH DR… 5179… id_0… 5179 RIVE… NA {CDE… 2025-03-30 -6.7…
#> 3 4724 GLENWAY AVE C… 4724… id_0… 4724 GLEN… NA {EB2… 2025-03-30 -6.7…
#> 4 2375 FAIRGREEN DR … 2375… id_0… 2375 FAIR… NA {204… 2025-03-30 -6.7…
#> 5 259 CALVERTON LN C… 259 … id_0… NA NA NA …
#> 6 1034 FASHION AVE C… 1034… id_0… 1034 FASH… NA {DDD… 2025-03-30 -6.7…
#> 7 2720 QUEEN CITY AV… 2720… id_0… 2720 QUEE… NA {D94… 2025-03-30 -6.7…
#> 8 4450 CLOVERHILL TE… 4450… id_0… 4450 CLOV… NA {55F… 2025-03-30 -6.7…
#> 9 175 ANDERSON FERRY… 175 … id_0… 175 ANDER… NA {715… 2025-03-30 -6.7…
#> 10 5270 WILLNET DR CI… 5270… id_0… 5270 WILL… NA {A79… 2025-03-30 -6.7…
#> # ℹ 1,786 more rows
#> # ℹ 2 more variables: address_type <chr>, parcel_id <chr>