TIGER Address Features dataset

taf() uses the arrow package to open the hive-partitioned parquet dataset of TIGER address features in the addr user data directory. Arrow FileSystemDataset objects are database-like backends for larger-than-memory datasets and support dplyr syntax for data manipulation; see https://arrow.apache.org/docs/r/articles/data_wrangling.html. Other TAF helpers, such as taf_catalog(), taf_install(), and taf_zip(), read and write flat parquet files directly with nanoparquet, so arrow is required only for the advanced dataset interface returned by taf().

Usage

taf(year = as.character(2025:2011), version = "v1")

taf_install(
  county,
  year = as.character(2025:2011),
  version = "v1",
  overwrite = FALSE,
  redownload = FALSE
)

Arguments

year: character, length 1; vintage of TIGER addrfeat (address feature) files, one of "2025" through "2011"
version: character, length 1; major version of the package and taf dataset schema
county: character, length 1; county FIPS code
overwrite: logical, length 1; overwrite an existing county install?
redownload: logical, length 1; re-download cached TIGER ZIP files?

Value

taf() returns an arrow Dataset R6 object (see ?arrow::open_dataset); use dplyr verbs to query the data and dplyr::collect() to retrieve the results (see Examples)

Details

taf_install() downloads and links TIGER address features and feature names for a specific year and county, installing the resulting file in the addr user data directory. About 6% of ADDRFEAT rows do not have a county-local primary FEATNAMES match by LINEARID. In these cases, street tags are parsed from the ADDRFEAT full name, and the street_tag_parsed column is set to TRUE.

Examples

if (FALSE) { # \dontrun{
  Sys.setenv("R_USER_DATA_DIR" = tempfile())
  taf_install("39061", "2025")

  if (requireNamespace("arrow", quietly = TRUE) &&
    requireNamespace("dplyr", quietly = TRUE)) {
    taf()

    # find the ten street name-posttype combinations that span the most ZIP
    # codes, breaking ties by the number of address ranges
    taf() |>
      dplyr::group_by(street_name, street_posttype) |>
      dplyr::summarize(
        n_zips = dplyr::n_distinct(ZIP),
        n_ranges = dplyr::n(),
        .groups = "drop"
      ) |>
      dplyr::arrange(dplyr::desc(n_zips), dplyr::desc(n_ranges)) |>
      dplyr::collect() |>
      dplyr::slice(1:10)
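
    # a sketch, not part of the original examples: tabulate how many address
    # ranges had their street tags parsed from the ADDRFEAT full name;
    # assumes street_tag_parsed is a logical column, as described in Details
    taf() |>
      dplyr::count(street_tag_parsed) |>
      dplyr::collect() |>
      dplyr::mutate(share = n / sum(n))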
  }
} # }