taf() uses the arrow package to open the hive-partitioned parquet dataset of TIGER address features in the addr user data directory. Arrow FileSystemDataset objects are database-like backends for larger-than-memory datasets and support dplyr syntax for data manipulation; see https://arrow.apache.org/docs/r/articles/data_wrangling.html.
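
As a rough illustration of what a hive-partitioned parquet dataset looks like, the sketch below writes a tiny toy table to a temporary directory and reopens it with arrow. The toy rows, the use of county_fips as the partition key, and the directory name are assumptions made for this example only; the partitioning layout that taf_install() actually writes is not described here.

```r
library(arrow)
library(dplyr, warn.conflicts = FALSE)

# toy rows that borrow a few column names from the taf schema (values are made up)
toy <- data.frame(
  street_name = c("Main", "Main", "Oak", "Vine"),
  street_posttype = c("St", "St", "St", "St"),
  ZIP = c("45202", "45203", "45202", "41011"),
  county_fips = c("39061", "39061", "39061", "21117")
)

# write one directory per partition value, e.g. county_fips=39061/
ds_dir <- tempfile("toy_taf_")
write_dataset(toy, ds_dir, format = "parquet", partitioning = "county_fips")
partition_dirs <- list.files(ds_dir)

# declare the partition column as a string so FIPS codes are not read as integers
ds <- open_dataset(ds_dir, partitioning = hive_partition(county_fips = string()))

# dplyr verbs build a lazy query; nothing is read into memory until collect()
res <- ds |>
  filter(county_fips == "39061") |>
  count(street_name) |>
  arrange(street_name) |>
  collect()
```

Because the filter is on the partition column, arrow only needs to read the files under the matching directory, which is part of what makes this layout practical for larger-than-memory data.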

taf_install() downloads TIGER address features and feature names for a specific year and county, links them, and installs the resulting file in the addr user data directory. If an address feature has no LINEARID with a corresponding feature name, its street tags are parsed from the full name instead, and the street_tag_parsed column is TRUE for that feature.

Usage

taf(year = as.character(2025:2011), version = "v1")

taf_install(
  county,
  year = as.character(2025:2011),
  version = "v1",
  overwrite = FALSE,
  redownload = FALSE
)

Arguments

year

character, length 1; vintage of TIGER addrfeat (address feature) files, one of "2011" through "2025"

version

character, length 1; major version of the package and taf dataset schema

county

character, length 1; county FIPS code

overwrite

logical, length 1; overwrite an existing county install?

redownload

logical, length 1; re-download cached TIGER ZIP files?

Value

a Dataset R6 object (see ?arrow::open_dataset); use dplyr verbs to query the data and collect() to retrieve the results (see Examples)

Examples

Sys.setenv("R_USER_DATA_DIR" = tempfile())
taf_install("39061", "2025")
#> Warning: structured street name info not found for 1216 ranges

taf()
#> FileSystemDataset with 63 Parquet files
#> 19 columns
#> LINEARID: string
#> FULLNAME: string
#> side: string
#> ZIP: string
#> FROMHN: int32
#> TOHN: int32
#> PARITY: string
#> OFFSET: string
#> street_predirectional: string
#> street_premodifier: string
#> street_pretype: string
#> street_name: string
#> street_posttype: string
#> street_postdirectional: string
#> county_fips: string
#> street_tag_parsed: bool
#> geometry_wkt: string
#> zip3: string
#> zip2: string

# use dplyr verbs to query
library(dplyr, warn.conflicts = FALSE)

# find the ten street name-posttype combinations that span the most ZIP codes
# (ties broken by the number of address ranges)
taf() |>
  group_by(street_name, street_posttype) |>
  summarize(
    n_zips = n_distinct(ZIP),
    n_ranges = n(),
    .groups = "drop"
  ) |>
  arrange(desc(n_zips), desc(n_ranges)) |>
  collect() |>
  slice(1:10)
#> # A tibble: 10 × 4
#>    street_name street_posttype n_zips n_ranges
#>    <chr>       <chr>            <int>    <int>
#>  1 Washington  Ave                 11       77
#>  2 Oak         St                  11       64
#>  3 Harrison    Ave                 10      313
#>  4 Galbraith   Rd                   9      242
#>  5 Main        St                   9      152
#>  6 Highland    Ave                  9      111
#>  7 Jefferson   Ave                  9       91
#>  8 Park        Ave                  9       83
#>  9 Ohio        Ave                  9       72
#> 10 Kemper      Rd                   8      231
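
Another common query is looking up the address ranges that could contain a given house number. The sketch below is one possible way to do that, not a function provided by the package: find_candidate_ranges() is a hypothetical helper, and the toy table (including its side and PARITY values) is made up so that the example runs without an installed county. The helper should work the same way when passed the result of taf().

```r
library(arrow)
library(dplyr, warn.conflicts = FALSE)

# hypothetical helper (not part of the package): ranges on a street within a
# ZIP code whose house-number interval contains `house_number`
find_candidate_ranges <- function(ds, zip_code, name, house_number) {
  ds |>
    filter(
      ZIP == zip_code,
      street_name == name,
      # ranges may be stored in either direction, so test both orderings
      (FROMHN <= house_number & house_number <= TOHN) |
        (TOHN <= house_number & house_number <= FROMHN)
    ) |>
    select(FULLNAME, side, ZIP, FROMHN, TOHN, PARITY) |>
    collect()
}

# toy Arrow table using taf column names; all values are invented
toy <- arrow_table(
  FULLNAME = c("Main St", "Main St", "Oak St"),
  side = c("L", "R", "L"),
  ZIP = c("45202", "45202", "45202"),
  FROMHN = c(100L, 299L, 100L),
  TOHN = c(198L, 201L, 198L),
  PARITY = c("E", "O", "E"),
  street_name = c("Main", "Main", "Oak")
)

hits_150 <- find_candidate_ranges(toy, "45202", "Main", 150L)
hits_250 <- find_candidate_ranges(toy, "45202", "Main", 250L)
```

The helper deliberately ignores PARITY and simply returns it, so a caller can decide how strictly to match odd and even house numbers.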