taf() uses the arrow package to open the hive-partitioned parquet dataset
of TIGER address features in the addr user data directory.
Arrow FileSystemDataset objects are database-like backends for
larger-than-memory datasets and support dplyr syntax for data manipulation;
see https://arrow.apache.org/docs/r/articles/data_wrangling.html.
Other TAF helpers such as taf_catalog(), taf_install(), and taf_zip()
use nanoparquet directly for flat parquet file reads and writes. Arrow is
only required for the advanced dataset interface returned by taf().
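As a rough illustration of the flat-file side of this split, the sketch below round-trips a small table through a single parquet file using nanoparquet only. The table, its values, and the temporary file name are invented for illustration; it does not show which files taf_catalog(), taf_install(), or taf_zip() actually read or write.

```r
library(nanoparquet)

# Hypothetical flat table; column names borrowed from the taf() schema,
# values invented for illustration
ranges <- data.frame(
  ZIP = c("45202", "45220"),
  FROMHN = c(100L, 2400L),
  TOHN = c(198L, 2498L)
)

# Write and read a single parquet file without loading arrow
flat_file <- tempfile(fileext = ".parquet")
write_parquet(ranges, flat_file)
ranges_back <- read_parquet(flat_file)
```

A single file like this fits comfortably in memory, which is presumably why the heavier arrow dependency is reserved for the partitioned dataset returned by taf().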
Usage
taf(year = as.character(2025:2011), version = "v1")
taf_install(
  county,
  year = as.character(2025:2011),
  version = "v1",
  overwrite = FALSE,
  redownload = FALSE
)
Arguments
- year
character; vintage(s) of TIGER addrfeat (address feature) files, given as strings such as "2025" (the default covers 2011 through 2025)
- version
character, length 1; major version of the package and taf dataset schema
- county
character, length 1; county FIPS code
- overwrite
logical, length 1; overwrite an existing county install?
- redownload
logical, length 1; re-download cached TIGER ZIP files?
Value
An arrow Dataset R6 object (see ?arrow::open_dataset). Use dplyr
verbs to build a query and collect() to retrieve the results; see Examples.
Details
taf_install() downloads and links TIGER address features and
feature names for a specific year and county, installing the resulting
file in the addr user data directory.
About 6% of ADDRFEAT rows do not have a county-local primary FEATNAMES
match by LINEARID. In these cases, street tags are parsed from the
ADDRFEAT full name, and the street_tag_parsed column is set to TRUE.
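One way to check the share of fallback-parsed rows is to count the values of street_tag_parsed. The sketch below runs that query against a toy hive-partitioned dataset written to a temporary directory, so it works without an installed county; the rows are invented, and partitioning by zip3 alone is an assumption made for illustration. Against real data, the same count() and collect() steps could be applied to the object returned by taf().

```r
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

# Toy stand-in for installed TAF data; all values are made up
toy <- data.frame(
  street_name = c("Main", "Oak", "Park", "Ohio"),
  street_tag_parsed = c(FALSE, FALSE, TRUE, FALSE),
  county_fips = "39061",
  zip3 = c("452", "452", "450", "450")
)

# Write a hive-partitioned parquet dataset (assumed layout: one folder per zip3)
toy_dir <- tempfile()
write_dataset(toy, toy_dir, format = "parquet", partitioning = "zip3")

# Count rows by parsing status; nothing is read into memory until collect()
counts <- open_dataset(toy_dir) |>
  count(county_fips, street_tag_parsed) |>
  collect()

# Share of rows whose street tags came from the ADDRFEAT full name
share_parsed <- sum(counts$n[counts$street_tag_parsed]) / sum(counts$n)
```

In this toy table one of four rows is flagged, so share_parsed is 0.25; the roughly 6% figure above refers to the real data, not to this example.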
Examples
Sys.setenv("R_USER_DATA_DIR" = tempfile())
taf_install("39061", "2025")
taf()
#> FileSystemDataset with 63 Parquet files
#> 19 columns
#> LINEARID: string not null
#> FULLNAME: string not null
#> side: string not null
#> ZIP: string not null
#> FROMHN: int32 not null
#> TOHN: int32 not null
#> PARITY: string not null
#> OFFSET: string not null
#> street_predirectional: string not null
#> street_premodifier: string not null
#> street_pretype: string not null
#> street_name: string not null
#> street_posttype: string not null
#> street_postdirectional: string not null
#> county_fips: string not null
#> street_tag_parsed: bool not null
#> geometry_wkt: string not null
#> zip3: string
#> zip2: string
# use dplyr verbs to query
library(dplyr, warn.conflicts = FALSE)
# find the ten street name-posttype combinations that span the most ZIP codes
taf() |>
  group_by(street_name, street_posttype) |>
  summarize(
    n_zips = n_distinct(ZIP),
    n_ranges = n(),
    .groups = "drop"
  ) |>
  arrange(desc(n_zips), desc(n_ranges)) |>
  collect() |>
  slice(1:10)
#> # A tibble: 10 × 4
#>    street_name street_posttype n_zips n_ranges
#>    <chr>       <chr>            <int>    <int>
#>  1 Washington  Ave                 11       77
#>  2 Oak         St                  11       64
#>  3 Harrison    Ave                 10      313
#>  4 Galbraith   Rd                   9      242
#>  5 Main        St                   9      152
#>  6 Highland    Ave                  9      111
#>  7 Jefferson   Ave                  9       91
#>  8 Park        Ave                  9       83
#>  9 Ohio        Ave                  9       72
#> 10 Kemper      Rd                   8      231