Optimal String Alignment (OSA) distances are used to choose a set of matching
reference addr vectors with flexible, field-specific thresholds.
See fuzzy_match()/fuzzy_match_addr_field() for more details.
addr vectors are matched in groups by five-digit ZIP code.
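To make the distance concrete, here is a minimal base R sketch of the OSA (restricted Damerau-Levenshtein) distance. The function name `osa_distance()` is hypothetical and is not part of this package. The package's own matching is described in fuzzy_match() and may be implemented differently.

```r
# Hypothetical helper (not part of the package): OSA distance between two
# single strings, counting insertions, deletions, substitutions, and
# transpositions of adjacent characters as one edit each.
osa_distance <- function(a, b) {
  x <- strsplit(a, "", fixed = TRUE)[[1]]
  y <- strsplit(b, "", fixed = TRUE)[[1]]
  n <- length(x)
  m <- length(y)
  # d[i + 1, j + 1] is the distance between the first i characters of a
  # and the first j characters of b
  d <- matrix(0L, nrow = n + 1L, ncol = m + 1L)
  d[, 1] <- 0:n
  d[1, ] <- 0:m
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- if (x[i] == y[j]) 0L else 1L
      best <- min(
        d[i, j + 1L] + 1L, # deletion
        d[i + 1L, j] + 1L, # insertion
        d[i, j] + cost     # substitution or exact match
      )
      if (i > 1L && j > 1L && x[i] == y[j - 1L] && x[i - 1L] == y[j]) {
        best <- min(best, d[i - 1L, j - 1L] + 1L) # adjacent transposition
      }
      d[i + 1L, j + 1L] <- best
    }
  }
  d[n + 1L, m + 1L]
}

osa_distance("Burnet", "Burnte")  # one transposition: 1
osa_distance("Burnet", "Burnett") # one insertion: 1
```

Under the default max_dist_street_name = 1, both of these street name pairs would fall within the threshold, whereas a name such as "Foofy" would not.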
Usage
addr_match(
  x,
  ref_addr,
  max_dist_street_number = 0,
  max_dist_street_name = 1,
  max_dist_street_type = 0,
  simplify = FALSE
)

addr_match_line_one(
  x,
  ref_addr,
  max_dist_street_number = 0,
  max_dist_street_name = 1,
  max_dist_street_type = 0,
  simplify = FALSE
)
Arguments
- x
an addr vector to match
- ref_addr
an addr vector to search for matches in
- max_dist_street_number
maximum OSA distance to consider a match for the addr street_number; set to NULL to disregard street number
- max_dist_street_name
maximum OSA distance to consider a match for the addr street_name
- max_dist_street_type
maximum OSA distance to consider a match for the addr street_type
- simplify
logical; randomly select one addr from multi-matches and return an addr() vector instead of a list? (empty addr vectors and NULL values are converted to NA)
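The way the three thresholds could combine can be sketched as follows. This is an illustration under stated assumptions and not the package's implementation: `within_threshold()` and `is_match()` are hypothetical helpers, the distances are made up, and it is assumed that a reference address must satisfy every field threshold to count as a match.

```r
# Hypothetical helper: TRUE where a field's distance is within its threshold;
# a NULL threshold disregards that field entirely.
within_threshold <- function(dist, max_dist) {
  if (is.null(max_dist)) {
    return(rep(TRUE, length(dist)))
  }
  dist <= max_dist
}

# Made-up OSA distances between one input address and three reference addresses
dists <- data.frame(
  street_number = c(0, 0, 2),
  street_name = c(1, 3, 0),
  street_type = c(0, 0, 0)
)

# Hypothetical helper: a reference address matches only if all fields pass
is_match <- function(dists, max_number = 0, max_name = 1, max_type = 0) {
  within_threshold(dists$street_number, max_number) &
    within_threshold(dists$street_name, max_name) &
    within_threshold(dists$street_type, max_type)
}

default_match <- is_match(dists)                       # TRUE FALSE FALSE
no_number_match <- is_match(dists, max_number = NULL)  # TRUE FALSE TRUE
```

With the street number disregarded, the third reference address becomes a match because only its street number differed.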
Value
a named list of possible addr matches for each addr in x;
a list value of NULL means the ZIP code was not matched, and
a list value of a zero-length addr vector means the ZIP code was matched
but the street number, name, and type were not
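Because NULL and a zero-length vector both have length zero, telling the two outcomes apart takes an explicit is.null() check. A minimal sketch follows. The list `matches` is a hypothetical stand-in that uses character vectors in place of addr vectors so that it runs without the package, and it assumes length() behaves the same way on addr vectors.

```r
# Hypothetical stand-in for a list returned by addr_match()
matches <- list(
  "3333 Burnet Avenue Cincinnati OH 45229" = "3333 Burnet Avenue Cincinnati OH 45229",
  "3333 Foofy Avenue Cincinnati OH 45229" = character(0),
  "1 Main Street Nowhere OH 00000" = NULL
)

# NULL element: the ZIP code was not found among the reference addresses
zip_unmatched <- vapply(matches, is.null, logical(1))

# zero-length, non-NULL element: ZIP code found, but no street-level match
n_candidates <- vapply(matches, length, integer(1))
street_unmatched <- !zip_unmatched & n_candidates == 0L

match_status <- ifelse(
  zip_unmatched, "zip not matched",
  ifelse(street_unmatched, "street not matched", "matched")
)
```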
Examples
addr(c("3333 Burnet Ave Cincinnati OH 45229", "5130 RAPID RUN RD CINCINNATI OHIO 45238")) |>
  addr_match(cagis_addr()$cagis_addr)
#> $`3333 Burnet Avenue Cincinnati OH 45229`
#> 3333 Burnet Avenue Cincinnati OH 45229
#>
#> $`5130 Rapid Run Road Cincinnati OHIO 45238`
#> 5130 Rapid Run Road Delhi Township OH 45238
#>
addr(c("3333 Burnet Ave Cincinnati OH 45229", "5130 RAPID RUN RD CINCINNATI OHIO 45238")) |>
  addr_match(cagis_addr()$cagis_addr, simplify = FALSE) |>
  tibble::enframe(name = "input_addr", value = "ca") |>
  dplyr::mutate(ca = purrr::list_c(ca)) |>
  dplyr::left_join(cagis_addr(), by = c("ca" = "cagis_addr")) |>
  tidyr::unnest(cols = c(cagis_addr_data)) |>
  dplyr::select(-ca, -cagis_address)
#> # A tibble: 2 × 6
#> input_addr cagis_address_place cagis_address_type cagis_s2 cagis_parcel_id
#> <chr> <chr> <chr> <s2_cel> <chr>
#> 1 3333 Burnet A… NA BLD -6.7013… 010400020052
#> 2 5130 Rapid Ru… NA BLD -6.7345… 054000510478
#> # ℹ 1 more variable: cagis_is_condo <lgl>
addr_match_line_one(
  addr(c("3333 Burnet Ave", "3333 Foofy Ave")),
  addr(c("Main Street", "Burnet Avenue")),
  max_dist_street_number = NULL
)
#> $`3333 Burnet Avenue`
#> NA Burnet Avenue NA NA NA
#>
#> $`3333 Foofy Avenue`
#>
#>