addr_left_join() is a convenience wrapper around addr_match() that
returns a left-join style result. It expands rows of x for duplicate rows
in the original y that share the exact matched addr, but it does not
return multiple distinct candidate addresses from y. addr_match() still
selects a single best address before this wrapper expands exact duplicates.
Usage
addr_left_join(
  x,
  y,
  by = "addr",
  suffix = c(".x", ".y"),
  zip_variants = TRUE,
  zip_variant = c("minus1", "plus1", "sub5", "sub4", "swap"),
  name_phonetic_dist = 2L,
  name_fuzzy_dist = 1L,
  number_fuzzy_dist = 1L,
  match_street_type = c("exact", "compatible", "ignore"),
  match_street_directional = c("exact", "swap", "ignore"),
  progress = interactive(),
  match_prepared = NULL
)
Arguments
- x, y
data frames or tibbles with an addr column
- by
addr column name in x (and y if the same); or a length-2 character vector of c(x_col, y_col)
- suffix
character vector of length 2 used to suffix duplicate columns
- zip_variants
logical; fuzzy match to common variants of x in y?
- zip_variant
character vector; zipcode variant types to use when zip_variants is TRUE; see ?zipcode_variant
- name_phonetic_dist
integer; maximum optimized string alignment distance between phonetic_street_key() of x and y to consider a possible match
- name_fuzzy_dist
integer; maximum optimized string alignment distance between @name of x and y to consider a possible match
- number_fuzzy_dist
integer; maximum optimized string alignment distance between addr_number strings in x and y to consider a possible match
- match_street_type
character; how to compare street pretype and posttype when selecting street candidates. "exact" requires pretype to match pretype and posttype to match posttype; "compatible" treats blank type fields as unknown but rejects candidates when known type information conflicts; "ignore" does not use street type fields when selecting candidates.
- match_street_directional
character; how to compare street predirectional and postdirectional when selecting street candidates. "exact" requires predirectional to match predirectional and postdirectional to match postdirectional; "swap" also permits predirectional to match postdirectional and postdirectional to match predirectional; "ignore" does not use street directional fields when selecting candidates.
- progress
logical; show addr_match() progress?
- match_prepared
optional prepared addr_match_index for the y addr column, usually from addr_match_prepare(). When supplied, addr_left_join() validates that it is equivalent to the y addr column before reusing it for matching.
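The three *_dist arguments are thresholds on optimized string alignment (OSA) distance. As a rough illustration of what a threshold of 1 or 2 permits, here is a minimal base R sketch of the textbook OSA recurrence. The helper osa_dist() is hypothetical and not part of this package, and it makes no claim about how the package computes or normalizes strings internally.

```r
# Hypothetical helper: textbook optimized string alignment (OSA) distance.
# Counts insertions, deletions, substitutions, and swaps of adjacent characters.
osa_dist <- function(a, b) {
  s <- strsplit(a, "")[[1]]
  t <- strsplit(b, "")[[1]]
  n <- length(s)
  m <- length(t)
  # d[i + 1, j + 1] is the distance between the first i characters of a
  # and the first j characters of b
  d <- matrix(0L, n + 1L, m + 1L)
  d[, 1] <- 0:n
  d[1, ] <- 0:m
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- if (s[i] == t[j]) 0L else 1L
      best <- min(
        d[i, j + 1L] + 1L, # deletion
        d[i + 1L, j] + 1L, # insertion
        d[i, j] + cost     # substitution (or match)
      )
      # adjacent transposition, e.g. "IA" vs "AI"
      if (i > 1L && j > 1L && s[i] == t[j - 1L] && s[i - 1L] == t[j]) {
        best <- min(best, d[i - 1L, j - 1L] + 1L)
      }
      d[i + 1L, j + 1L] <- best
    }
  }
  d[n + 1L, m + 1L]
}

osa_dist("MAIN", "MIAN")  # one adjacent swap
osa_dist("MAIN", "MAINE") # one insertion
```

Under this definition both pairs are at distance 1, so they would fall within a threshold such as name_fuzzy_dist = 1L, while completely different names such as "OAK" and "ELM" (distance 3) would not.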
Value
A data frame with left-join semantics. Duplicate rows in y with
the exact same matched addr are all returned. Partial ZIP-only or
street-only matches do not expand to multiple candidate rows in y.
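The row expansion described here can be sketched with base R's merge() on toy data. This only illustrates left-join semantics when y contains exact duplicate keys: plain character keys stand in for matched addr values, and the column names (id, key, parcel) are hypothetical. It does not reproduce how addr_match() selects its single best address.

```r
# Toy stand-ins: character keys play the role of matched addr values
x <- data.frame(id = 1:3, key = c("A", "B", "C"))

# y has two rows with the exact same key "A" and no row for "C"
y <- data.frame(key = c("A", "A", "B"), parcel = c("p1", "p2", "p3"))

# all.x = TRUE keeps every row of x; unmatched rows get NA in y's columns
joined <- merge(x, y, by = "key", all.x = TRUE)
joined <- joined[order(joined$id), ]

joined
nrow(joined) # 4: id 1 appears twice, id 3 is kept with parcel = NA
```

The result has more rows than x because the duplicated key in y is returned once per duplicate, while the unmatched row of x is retained rather than dropped.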
Examples
the_addr <- nad("Hamilton", "OH",
  refresh_binary = "no", refresh_source = "no")
my_addr <- tibble::tibble(
  addr = as_addr(voter_addresses()[1:100]),
  id = 1:100
)
d <- addr_left_join(
  my_addr,
  the_addr,
  by = c("addr", "nad_addr"),
  match_prepared = nad_example_data(match_prepared = TRUE)
)
d
#> # A tibble: 111 × 9
#>    addr            id nad_addr.y subaddress uuid  date_update s2    address_type
#>    <addr>       <int> <addr>     <chr>      <chr> <date>      <s2_> <chr>
#>  1 3359 QUEEN …     1 3359 QUEE… NA         {E3A… 2025-03-30  -6.7… Unknown
#>  2 1040 KREIS …     2 1040 KREI… NA         {D05… 2025-03-30  -6.7… Unknown
#>  3 9960 DALY R…     3 9960 DALY… NA         {109… 2025-03-30  -6.1… Unknown
#>  4 413 VOLKERT…     4 413 VOLKE… NA         {1BB… 2025-03-30  -6.7… Unknown
#>  5 8519 LINDER…     5 8519 LIND… NA         {F78… 2025-03-30  -6.6… Unknown
#>  6 6361 BEECHM…     6 6361 BEEC… NA         {6D4… 2025-03-30  -6.6… Unknown
#>  7 10466 ADVEN…     7 10466 ADV… NA         {1D1… 2025-03-30  -6.1… Unknown
#>  8 3156 LOOKOU…     8 3156 LOOK… NA         {AE8… 2025-03-30  -6.6… Unknown
#>  9 310 WYOMING…     9 310 WYOMI… NA         {331… 2025-03-30  -6.1… Unknown
#> 10 118 SPRINGF…    10 118 SPRIN… NA         {F3E… 2025-03-30  -6.1… Unknown
#> # ℹ 101 more rows
#> # ℹ 1 more variable: parcel_id <chr>
# Some addresses match more than one row in NAD because matching
# does not consider subaddress (e.g. "line two"), so keep only the
# first row per id before tabulating match stages
table(addr_match_stage(d$nad_addr.y[!duplicated(d$id)]))
#>
#>   none    zip street number
#>     11      0      0     89
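The example above deduplicates only the matched addr vector before tabulating. If the goal is one row per original record in the joined data frame itself, the same !duplicated() idiom can be applied to whole rows. A minimal sketch on a toy data frame follows; d_toy and its columns are hypothetical stand-ins for a joined result like d.

```r
# Toy stand-in for a joined result; `id` identifies the original rows of x
d_toy <- data.frame(
  id = c(1L, 1L, 2L, 3L),
  parcel_id = c("p1", "p2", "p3", NA)
)

# Keep the first row for each id (row order decides which duplicate survives)
first_per_id <- d_toy[!duplicated(d_toy$id), ]

# Ids that were expanded to more than one row, useful for manual review
dup_ids <- unique(d_toy$id[duplicated(d_toy$id)])

first_per_id
dup_ids
```

Keeping the first row is an arbitrary tie-break. When the duplicates differ in a meaningful column such as subaddress, reviewing the ids in dup_ids may be preferable to discarding rows silently.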