Fuzzy match addr vectors using field-specific string distances
Source:R/fuzzy_match.R
addr_fuzzy_match.Rdaddr_fuzzy_match() matches two addr vectors using more than one address tag
fuzzy_match_addr_field() matches two addr vectors using a single address
tag
Distances between address tags are defined using optimized string alignment;
see fuzzy_match() and stringdist::stringdist() for more details.
Usage
addr_fuzzy_match(x, y, addr_fields = NULL)
fuzzy_match_addr_field(
x,
y,
addr_field = c("number_prefix", "number_digits", "number_suffix",
"street_predirectional", "street_premodifier", "street_pretype", "street_name",
"street_posttype", "street_postdirectional", "place_name", "place_state",
"place_zipcode"),
osa_max_dist = 0
)Arguments
- x
addr vector to match
- y
addr vector to match to
- addr_fields
a named vector of osa_max_distances; if max distances for each addr tag field is not present a default will be used (see details).
- addr_field
character name of single addr field to match on
- osa_max_dist
maximum optimized string alignment distance used as threshold for matching on single addr field
Value
a list of integer vectors representing the position of the best
matching address(es) in y for each address in x
Details
Defaults for addr_fields:
number_prefix: 0
number_digits: 0
number_suffix: 0
street_predirectional: 0
street_premodifier: 0
street_pretype: 0
street_name: 1
street_posttype: 0
street_postdirectional: 0
place_name: 0
place_state: 0
place_zipcode: 0
When fuzzy matching street_name, the "phonetic_street_key"
prefilter is automatically used (see ?fuzzy_match).
Examples
x_addr <- as_addr(c("123 Main St.", "333 Burnet Ave", "3333 Foofy Ave"))
y_addr <- as_addr(c("0000 Main Street", "3333 Burnet Avenue", "222 Burnet Ave"))
# no matches with defaults
addr_fuzzy_match(x_addr, y_addr)
#> [[1]]
#> integer(0)
#>
#> [[2]]
#> integer(0)
#>
#> [[3]]
#> integer(0)
#>
# match on osa_max_dist of 2 for the address number
addr_fuzzy_match(x_addr, y_addr, addr_fields = c("number_digits" = 2))
#> [[1]]
#> integer(0)
#>
#> [[2]]
#> [1] 2
#>
#> [[3]]
#> integer(0)
#>
# ignore address number when matching
addr_fuzzy_match(x_addr, y_addr, addr_fields = c("number_digits" = Inf))
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] 2 3
#>
#> [[3]]
#> integer(0)
#>
fuzzy_match_addr_field(
as_addr(c("123 Main St.", "3333 Burnet Ave", "3333 Foofy Ave")),
as_addr(c("0000 Main Street", "0000 Burnet Avenue", "222 Burnet Ave")),
addr_field = "street_name", osa_max_dist = 1
)
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] 2 3
#>
#> [[3]]
#> integer(0)
#>
# empty address fields have an OSA distance of zero and always match
fuzzy_match_addr_field(
as_addr(c("123 Main St.", "3333 Burnet Ave", "3333 Foofy Ave")),
as_addr(c("0000 Main Street", "0000 Burnet Avenue", "222 Burnet Ave")),
addr_field = "number_prefix"
)
#> [[1]]
#> [1] 1 2 3
#>
#> [[2]]
#> [1] 1 2 3
#>
#> [[3]]
#> [1] 1 2 3
#>