matching addr vectors — addr

Optimized String Alignment (OSA) distances are used to choose the set of matching reference addr elements, using flexible, field-specific thresholds. See fuzzy_match() and fuzzy_match_addr_field() for more details. addr vectors are matched within groups defined by five-digit ZIP code.
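
For readers unfamiliar with OSA distance, the following is a minimal base R sketch of the metric. The helper `osa_distance()` is hypothetical and is not part of this package, which may compute distances differently. OSA counts insertions, deletions, substitutions, and transpositions of adjacent characters, with no substring edited more than once.

```r
# Optimized String Alignment (restricted Damerau-Levenshtein) distance.
# osa_distance() is a hypothetical helper for illustration only.
osa_distance <- function(a, b) {
  a <- strsplit(a, "", fixed = TRUE)[[1]]
  b <- strsplit(b, "", fixed = TRUE)[[1]]
  n <- length(a)
  m <- length(b)
  # d[i + 1, j + 1] is the distance between the first i characters of a
  # and the first j characters of b
  d <- matrix(0L, nrow = n + 1L, ncol = m + 1L)
  d[, 1L] <- 0:n
  d[1L, ] <- 0:m
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- if (a[i] == b[j]) 0L else 1L
      best <- min(
        d[i, j + 1L] + 1L, # deletion
        d[i + 1L, j] + 1L, # insertion
        d[i, j] + cost     # substitution (or match when cost is 0)
      )
      # transposition of two adjacent characters
      if (i > 1L && j > 1L && a[i] == b[j - 1L] && a[i - 1L] == b[j]) {
        best <- min(best, d[i - 1L, j - 1L] + 1L)
      }
      d[i + 1L, j + 1L] <- best
    }
  }
  d[n + 1L, m + 1L]
}

osa_distance("Burnet", "Burnett") # 1: one insertion
osa_distance("Burnet", "Brunet")  # 1: one adjacent transposition
osa_distance("Burnet", "Burnet")  # 0: identical
```

Under the default max_dist_street_name = 1, a street name at distance 1 such as "Brunet" would still be considered a match for "Burnet", while the default thresholds of 0 for street number and street type require exact agreement.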

Usage

addr_match(
  x,
  ref_addr,
  max_dist_street_number = 0,
  max_dist_street_name = 1,
  max_dist_street_type = 0,
  simplify = FALSE
)

addr_match_line_one(
  x,
  ref_addr,
  max_dist_street_number = 0,
  max_dist_street_name = 1,
  max_dist_street_type = 0,
  simplify = FALSE
)

Arguments

x: an addr vector to match
ref_addr: an addr vector to search for matches in
max_dist_street_number: maximum OSA distance to consider a match for the addr street_number; set to NULL to disregard street number
max_dist_street_name: maximum OSA distance to consider a match for the addr street_name
max_dist_street_type: maximum OSA distance to consider a match for the addr street_type
simplify: logical; randomly select one addr from multi-matches and return an addr() vector instead of a list? (empty addr vectors and NULL values are converted to NA)
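
The way the three max_dist_* thresholds might combine can be sketched as below. This is an illustration under stated assumptions, not the package's implementation: `field_matches()` is a hypothetical helper, the per-field distances are supplied directly instead of being computed, and a NULL threshold is treated as "skip this field" for any field, although the documentation above describes NULL only for max_dist_street_number.

```r
# field_matches() is a hypothetical helper illustrating the threshold logic.
# dists:     named list of OSA distances, one per address field
# max_dists: named list of maximum allowed distances; NULL disregards a field
field_matches <- function(dists, max_dists) {
  ok <- TRUE
  for (field in names(dists)) {
    limit <- max_dists[[field]]
    if (is.null(limit)) next # NULL threshold: this field is not compared
    ok <- ok && dists[[field]] <= limit
  }
  ok
}

# the default thresholds from the Usage section
defaults <- list(street_number = 0, street_name = 1, street_type = 0)

# one edit in the street name is tolerated
field_matches(list(street_number = 0, street_name = 1, street_type = 0), defaults)

# any difference in the street number is not
field_matches(list(street_number = 1, street_name = 0, street_type = 0), defaults)

# unless the street number is disregarded with NULL
no_number <- list(street_number = NULL, street_name = 1, street_type = 0)
field_matches(list(street_number = 1, street_name = 0, street_type = 0), no_number)
```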

Value

a named list of possible addr matches for each addr in x; a list value of NULL means the ZIP code was not matched, and a list value of a zero-length addr vector means the ZIP code was matched but the street number, name, and type were not
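
The three possible kinds of list value can be told apart with is.null() and length(). The sketch below uses a plain list of character vectors as a stand-in for the value returned by addr_match(), so it runs without the package; `match_status()` and the example addresses are hypothetical, and it assumes that length() of a zero-length addr vector is 0.

```r
# match_status() is a hypothetical helper; `matches` stands in for the
# named list returned by addr_match(x, ref_addr, simplify = FALSE)
match_status <- function(matches) {
  vapply(
    matches,
    function(m) {
      if (is.null(m)) {
        "zip_not_matched"     # no reference addr shares the ZIP code
      } else if (length(m) == 0) {
        "address_not_matched" # ZIP code found, but number/name/type were not
      } else {
        "matched"             # one or more candidate matches
      }
    },
    character(1)
  )
}

# stand-in data with made-up addresses, one element per case
matches <- list(
  "1 Example Street Anytown OH 00001" = NULL,
  "2 Example Street Anytown OH 00002" = character(0),
  "3 Example Street Anytown OH 00003" = "3 Example Street Anytown OH 00003"
)
match_status(matches)
```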

Examples

addr(c("3333 Burnet Ave Cincinnati OH 45229", "5130 RAPID RUN RD CINCINNATI OHIO 45238")) |>
  addr_match(cagis_addr()$cagis_addr)
#> $`3333 Burnet Avenue Cincinnati OH 45229`
#> 3333 Burnet Avenue Cincinnati OH 45229 
#> 
#> $`5130 Rapid Run Road Cincinnati OHIO 45238`
#> 5130 Rapid Run Road Delhi Township OH 45238 
#> 

addr(c("3333 Burnet Ave Cincinnati OH 45229", "5130 RAPID RUN RD CINCINNATI OHIO 45238")) |>
  addr_match(cagis_addr()$cagis_addr, simplify = FALSE) |>
  tibble::enframe(name = "input_addr", value = "ca") |>
  dplyr::mutate(ca = purrr::list_c(ca)) |>
  dplyr::left_join(cagis_addr(), by = c("ca" = "cagis_addr")) |>
  tidyr::unnest(cols = c(cagis_addr_data)) |>
  dplyr::select(-ca, -cagis_address)
#> # A tibble: 2 × 6
#>   input_addr     cagis_address_place cagis_address_type cagis_s2 cagis_parcel_id
#>   <chr>          <chr>               <chr>              <s2_cel> <chr>          
#> 1 3333 Burnet A… NA                  BLD                -6.7013… 010400020052   
#> 2 5130 Rapid Ru… NA                  BLD                -6.7345… 054000510478   
#> # ℹ 1 more variable: cagis_is_condo <lgl>

addr_match_line_one(addr(c("3333 Burnet Ave", "3333 Foofy Ave")),
                    addr(c("Main Street", "Burnet Avenue")),
                    max_dist_street_number = NULL)
#> $`3333 Burnet Avenue`
#> NA Burnet Avenue NA NA NA 
#> 
#> $`3333 Foofy Avenue`
#>  
#>