taf() uses the arrow package to open the hive-partitioned parquet dataset
of TIGER address features in the addr user data directory.
Arrow FileSystemDataset objects are database-like backends for
larger-than-memory datasets and support dplyr syntax for data manipulation;
see https://arrow.apache.org/docs/r/articles/data_wrangling.html.
taf_install() downloads and links TIGER address features and
feature names for a specific year and county, installing the resulting
file in the addr user data directory.
If an address feature does not have a corresponding LINEARID with a
feature name, the street tags are parsed from the full name instead, and
the street_tag_parsed column is TRUE for that feature.
Usage
taf(year = as.character(2025:2011), version = "v1")
taf_install(
  county,
  year = as.character(2025:2011),
  version = "v1",
  overwrite = FALSE,
  redownload = FALSE
)
Arguments
- year
character, length one; vintage of TIGER addrfeat (address feature) files ("2011" through "2025")
- version
character, length one; major version of the package and taf dataset schema
- county
character, length 1; county FIPS code
- overwrite
logical, length 1; overwrite an existing county install?
- redownload
logical, length 1; re-download cached TIGER ZIP files?
Value
a Dataset R6 object (see ?arrow::open_dataset); use dplyr
verbs to query the data and collect() to retrieve the results (see Examples)
Examples
Sys.setenv("R_USER_DATA_DIR" = tempfile())
taf_install("39061", "2025")
#> Warning: structured street name info not found for 1216 ranges
taf()
#> FileSystemDataset with 63 Parquet files
#> 19 columns
#> LINEARID: string
#> FULLNAME: string
#> side: string
#> ZIP: string
#> FROMHN: int32
#> TOHN: int32
#> PARITY: string
#> OFFSET: string
#> street_predirectional: string
#> street_premodifier: string
#> street_pretype: string
#> street_name: string
#> street_posttype: string
#> street_postdirectional: string
#> county_fips: string
#> street_tag_parsed: bool
#> geometry_wkt: string
#> zip3: string
#> zip2: string
# use dplyr verbs to query
library(dplyr, warn.conflicts = FALSE)
# find top ten most frequent street name-posttype combinations
taf() |>
  group_by(street_name, street_posttype) |>
  summarize(
    n_zips = n_distinct(ZIP),
    n_ranges = n(),
    .groups = "drop"
  ) |>
  arrange(desc(n_zips), desc(n_ranges)) |>
  collect() |>
  slice(1:10)
#> # A tibble: 10 × 4
#>    street_name street_posttype n_zips n_ranges
#>    <chr>       <chr>            <int>    <int>
#>  1 Washington  Ave                 11       77
#>  2 Oak         St                  11       64
#>  3 Harrison    Ave                 10      313
#>  4 Galbraith   Rd                   9      242
#>  5 Main        St                   9      152
#>  6 Highland    Ave                  9      111
#>  7 Jefferson   Ave                  9       91
#>  8 Park        Ave                  9       83
#>  9 Ohio        Ave                  9       72
#> 10 Kemper      Rd                   8      231
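The lazy query pattern above can be tried without downloading any TIGER files. The following is a minimal sketch, not the package's own code: it writes a small, made-up table (the `toy` data frame, its values, and the choice of `county_fips` as the partition column are all assumptions for illustration) as a hive-partitioned parquet dataset, then opens and queries it the same way the object returned by taf() is queried.

```r
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

# toy address ranges; made-up values, not real TIGER data
toy <- data.frame(
  county_fips = c("39061", "39061", "39017"),
  ZIP = c("45202", "45229", "45011"),
  street_name = c("Main", "Burnet", "Main"),
  street_posttype = c("St", "Ave", "St"),
  FROMHN = c(100L, 3200L, 1L),
  TOHN = c(198L, 3398L, 99L),
  stringsAsFactors = FALSE
)

# write one hive-style directory per county,
# e.g. <toy_dir>/county_fips=39061/part-0.parquet
toy_dir <- tempfile("toy_taf_")
write_dataset(toy, toy_dir, format = "parquet", partitioning = "county_fips")

# open the dataset lazily; no rows are read into memory yet
toy_ds <- open_dataset(toy_dir)

# the filter and select are evaluated by arrow;
# collect() brings the matching rows into R as a tibble
main_st <- toy_ds |>
  filter(street_name == "Main", street_posttype == "St") |>
  select(ZIP, FROMHN, TOHN) |>
  collect()

main_st
```

Only the rows matching the filter are materialized, which is why the same pattern scales to datasets that do not fit in memory. Row order after collect() is not guaranteed across partition files, so add arrange() when order matters.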