addr_left_join() is a convenience wrapper around addr_match() that
returns a left-join style result. It expands rows of x for duplicate rows
in the original y that share the exact matched addr, but it does not
return multiple distinct candidate addresses from y. addr_match() still
selects a single best address before this wrapper expands exact duplicates.
Usage
addr_left_join(
  x,
  y,
  by = "addr",
  suffix = c(".x", ".y"),
  zip_variants = TRUE,
  name_phonetic_dist = 2L,
  name_fuzzy_dist = 1L,
  number_fuzzy_dist = 1L,
  match_street_predirectional = TRUE,
  match_street_posttype = TRUE,
  match_street_pretype = TRUE,
  match_street_postdirectional = FALSE,
  progress = interactive(),
  match_prepared = NULL
)
Arguments
- x, y
data frames or tibbles with an addr column
- by
addr column name in x (and y if the same); or a length-2 character vector of c(x_col, y_col)
- suffix
character vector of length 2 used to suffix duplicate columns
- zip_variants
logical; fuzzy match to common ZIP code variants of x in y? (e.g., changing the 4th or 5th digit)
- name_phonetic_dist
integer; maximum optimized string alignment distance between phonetic_street_key() of x and y to consider a possible match
- name_fuzzy_dist
integer; maximum optimized string alignment distance between @name of x and y to consider a possible match
- number_fuzzy_dist
integer; maximum optimized string alignment distance between @number of x and y to consider a possible match
- match_street_predirectional
logical; require street predirectional to match when selecting street candidates?
- match_street_posttype
logical; require street posttype to match when selecting street candidates?
- match_street_pretype
logical; require street pretype to match when selecting street candidates?
- match_street_postdirectional
logical; require street postdirectional to match when selecting street candidates?
- progress
logical; show addr_match() progress?
- match_prepared
optional prepared addr_match_index for the y addr column, usually from addr_match_prepare(). When supplied, addr_left_join() validates that it is equivalent to the y addr column before reusing it for matching.
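The three *_dist arguments are thresholds on optimized string alignment (OSA) distance, which counts insertions, deletions, substitutions, and swaps of two adjacent characters as one edit each. The base R function below is a minimal sketch of that metric for illustration only; osa_distance() is a hypothetical helper written for this page, not the package's internal implementation, and the street names are made up.

```r
# Optimized string alignment (OSA) distance between two single strings.
# Hypothetical helper for illustration; not part of the package.
osa_distance <- function(a, b) {
  a <- strsplit(a, "", fixed = TRUE)[[1]]
  b <- strsplit(b, "", fixed = TRUE)[[1]]
  n <- length(a)
  m <- length(b)
  # d[i + 1, j + 1] is the distance between the first i characters of a
  # and the first j characters of b
  d <- matrix(0L, nrow = n + 1L, ncol = m + 1L)
  d[, 1L] <- 0:n
  d[1L, ] <- 0:m
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- if (a[i] == b[j]) 0L else 1L
      best <- min(
        d[i, j + 1L] + 1L, # deletion
        d[i + 1L, j] + 1L, # insertion
        d[i, j] + cost     # substitution or match
      )
      # swap of two adjacent characters counts as a single edit
      if (i > 1L && j > 1L && a[i] == b[j - 1L] && a[i - 1L] == b[j]) {
        best <- min(best, d[i - 1L, j - 1L] + 1L)
      }
      d[i + 1L, j + 1L] <- best
    }
  }
  d[n + 1L, m + 1L]
}

# a swapped pair of letters is within a threshold of 1
osa_distance("MAIN", "MIAN") <= 1L
# a single changed digit in a street number is also within 1
osa_distance("3359", "3358") <= 1L
```

Under this reading, name_fuzzy_dist = 1L would tolerate one typo or one adjacent swap in a street name, and number_fuzzy_dist = 1L one such edit in a street number.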
Value
A data frame with left-join semantics. Duplicate rows in y with
the exact same matched addr are all returned. Partial ZIP-only or
street-only matches do not expand to multiple candidate rows in y.
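The row expansion described here can be pictured with an ordinary exact-key left join in base R. This is only a sketch of the output shape: it uses plain character keys and merge(), whereas addr_left_join() first selects a single best address with addr_match(). The toy data frames and their values are made up for illustration.

```r
# Toy stand-ins: character keys instead of addr vectors
x <- data.frame(
  id = 1:3,
  addr = c("1 MAIN ST", "2 OAK AVE", "9 ELM RD")
)
y <- data.frame(
  addr = c("1 MAIN ST", "1 MAIN ST", "2 OAK AVE"),
  subaddress = c("APT 1", "APT 2", "UNIT B")
)

# all.x = TRUE keeps every row of x, including the unmatched one
joined <- merge(x, y, by = "addr", all.x = TRUE)
joined <- joined[order(joined$id), ]

# id 1 appears twice because y holds two rows with the same address;
# id 3 appears once with NA in the columns that come from y
joined
```

The result has more rows than x whenever y contains exact duplicates of a matched address, which is why the example output below can exceed the number of input rows.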
Examples
the_addr <- nad("Hamilton", "OH", refresh_source = "no", refresh_binary = "no")
my_addr <- tibble::tibble(
  addr = as_addr(voter_addresses()[1:100]),
  id = 1:100
)
d <- addr_left_join(
  my_addr,
  the_addr,
  by = c("addr", "nad_addr"),
  match_prepared = nad_example_data(match_prepared = TRUE)
)
d
#> # A tibble: 111 × 9
#> addr id nad_addr.y subaddress uuid date_update s2 address_type
#> <addr> <int> <addr> <chr> <chr> <date> <s2_> <chr>
#> 1 3359 QUEEN … 1 3359 QUEE… NA {E3A… 2025-03-30 -6.7… Unknown
#> 2 1040 KREIS … 2 1040 KREI… NA {D05… 2025-03-30 -6.7… Unknown
#> 3 9960 DALY R… 3 9960 DALY… NA {109… 2025-03-30 -6.1… Unknown
#> 4 413 VOLKERT… 4 413 VOLKE… NA {1BB… 2025-03-30 -6.7… Unknown
#> 5 8519 LINDER… 5 8519 LIND… NA {F78… 2025-03-30 -6.6… Unknown
#> 6 6361 BEECHM… 6 6361 BEEC… NA {6D4… 2025-03-30 -6.6… Unknown
#> 7 10466 ADVEN… 7 10466 ADV… NA {1D1… 2025-03-30 -6.1… Unknown
#> 8 3156 LOOKOU… 8 3156 LOOK… NA {AE8… 2025-03-30 -6.6… Unknown
#> 9 310 WYOMING… 9 310 WYOMI… NA {331… 2025-03-30 -6.1… Unknown
#> 10 118 SPRINGF… 10 118 SPRIN… NA {F3E… 2025-03-30 -6.1… Unknown
#> # ℹ 101 more rows
#> # ℹ 1 more variable: parcel_id <chr>
# some addresses may match with more than one address in NAD
# since matching does not consider subaddress (e.g. "line two")
# take the first row in these cases
table(addr_match_stage(d$nad_addr.y[!duplicated(d$id)]))
#>
#> none zip street number
#> 11 0 0 89
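The "take the first row" step above relies on base R's duplicated(), which flags every repeat of an id after its first occurrence. A small sketch on made-up data (d_toy and its columns are hypothetical, not output from the package) shows the same pattern applied to whole rows:

```r
# Toy joined result: id 2 was expanded into two rows, id 3 had no match
d_toy <- data.frame(
  id = c(1L, 2L, 2L, 3L),
  matched = c("1 MAIN ST", "2 OAK AVE", "2 OAK AVE", NA)
)

# keep only the first row for each id
first_rows <- d_toy[!duplicated(d_toy$id), ]

# one row per input id again
nrow(first_rows)

# ids left without a match
first_rows$id[is.na(first_rows$matched)]
```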