aiR

aiR is used to assess exposure to ambient particulate matter smaller than 2.5 micrometers (PM2.5) in the Cincinnati, Ohio area. The package creates predictions based on a spatiotemporal hybrid satellite/land use random forest model. Exposure predictions are available as daily averages at a 1 x 1 km grid resolution covering the “seven county” area (OH: Hamilton, Clermont, Warren, Butler; KY: Boone, Kenton, Campbell) from 2000 to 2015.

[Figure: PM2.5 predictions for June 18th, 2010]

Cross validation indicates that the PM2.5 predictions are accurate across the study domain, with a cross-validated median absolute error of 0.95 micrograms per cubic meter and a cross-validated R² of 0.91.
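As a rough illustration of how these two accuracy metrics can be computed, the sketch below uses invented observed and held-out predicted values (not data from the package or the publication). It assumes R² is the squared Pearson correlation between observed and predicted values; the publication may define it differently.

```r
# Hypothetical observed PM2.5 values and cross-validated predictions
# (micrograms per cubic meter); these numbers are invented for illustration
observed  <- c(8.2, 12.5, 15.1, 6.7, 20.3, 10.4)
predicted <- c(9.0, 11.8, 14.2, 7.5, 18.9, 10.9)

# Median absolute error: the median of the absolute prediction errors
cv_mae <- median(abs(observed - predicted))

# R squared, assumed here to be the squared Pearson correlation
cv_r2 <- cor(observed, predicted)^2
```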

Reference

Please see our publication below for details on the data sources, land use random forest model, and cross-validated accuracy. Please cite this manuscript if you use our package in a scientific publication:

Cole Brokamp, Roman Jandarov, Monir Hossain, Patrick Ryan. Predicting Daily Urban Fine Particulate Matter Concentrations Using Random Forest. Environmental Science & Technology. 52 (7). 4173-4179. 2018.

Installing

aiR is hosted on GitHub; install with:

remotes::install_github('cole-brokamp/aiR')

Example Usage

This example covers how to extract PM2.5 exposure estimates given latitude/longitude coordinates and dates.

Note that pm_grid is an R object that is available upon loading the package. pm_data, however, had to be split into two smaller objects (pm_data_early and pm_data_late) to stay under GitHub’s 100 MB file size limit. As a workaround, bind the two datasets into one after loading the package:

library(aiR)
library(tidyverse)
pm_data <- bind_rows(pm_data_early, pm_data_late)
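After binding, it may be worth confirming that the combined data covers the expected date range and that no grid cell and date pair appears twice. The sketch below uses small invented data frames as stand-ins for pm_data_early and pm_data_late, and base R's rbind() in place of bind_rows() so that it runs without the package. The column names follow the output shown later in this example; the values are assumptions.

```r
# Toy stand-ins for pm_data_early and pm_data_late (invented values)
early <- data.frame(pm_grid_id = c("1", "2"),
                    date = as.Date(c("2000-01-01", "2000-01-01")),
                    pm_pred = c(10.1, 11.3))
late <- data.frame(pm_grid_id = c("1", "2"),
                   date = as.Date(c("2015-12-31", "2015-12-31")),
                   pm_pred = c(7.4, 8.0))

# Base R equivalent of bind_rows() for data frames with identical columns
combined <- rbind(early, late)

# Each grid cell and date pair should appear only once after binding
has_duplicates <- anyDuplicated(combined[, c("pm_grid_id", "date")]) > 0

# First and last dates covered by the combined data
date_range <- range(combined$date)
```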

Create a demonstration dataset of random coordinates and dates:

d <- tibble::tribble(
  ~id,         ~lon,        ~lat,
    809089L, -84.69127387, 39.24710734,
    813233L, -84.47798287, 39.12005904,
    814881L, -84.47123583,  39.2631309,
    799697L, -84.41741798, 39.18541228,
    799698L, -84.41395064, 39.18322447
  )

set.seed(12)

d <- d %>%
  mutate(date = seq.Date(as.Date('2015-01-01'), as.Date('2015-12-31'), by = 1) %>%
           base::sample(size = nrow(d)))

Convert this to a simple features object and transform to the Ohio South projection. Reprojection is necessary because pm_grid is in the Ohio South projection.

library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.1.3, PROJ 4.9.3

d <- d %>%
  st_as_sf(coords = c('lon', 'lat'), crs = 4326) %>%
  st_transform(3735)

To estimate the exposures, we will first overlay the locations with the PM2.5 exposure grid to generate the pm_grid_id for each location.

( d <- st_join(d, pm_grid) )
#> Simple feature collection with 5 features and 3 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: 1347996 ymin: 414089 xmax: 1426020 ymax: 466143.5
#> epsg (SRID):    3735
#> proj4string:    +proj=lcc +lat_1=40.03333333333333 +lat_2=38.73333333333333 +lat_0=38 +lon_0=-82.5 +x_0=600000 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=us-ft +no_defs
#> # A tibble: 5 x 4
#>       id date       pm_grid_id                 geometry
#>    <int> <date>     <chr>      <POINT [US_survey_foot]>
#> 1 809089 2015-01-26 3016               (1347996 461745)
#> 2 813233 2015-10-25 4249               (1407370 414089)
#> 3 814881 2015-12-09 2954             (1410421 466143.5)
#> 4 799697 2015-04-08 3687             (1425054 437515.5)
#> 5 799698 2015-03-03 3688               (1426020 436698)
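Because st_join() performs a left join by default, a location that falls outside the exposure grid keeps its row but receives a missing pm_grid_id. A minimal sketch of a check for such locations follows. It uses an invented plain data frame in place of the joined sf object, and the second id is hypothetical.

```r
# Toy stand-in for the result of st_join(d, pm_grid);
# the second row is an invented location outside the grid
joined <- data.frame(id = c(809089L, 1L),
                     pm_grid_id = c("3016", NA),
                     stringsAsFactors = FALSE)

# Rows with a missing pm_grid_id did not intersect any grid cell
outside <- joined[is.na(joined$pm_grid_id), ]
n_outside <- nrow(outside)
```

Locations flagged this way will have no PM2.5 prediction after the merge in the next step.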

Then merge the lookup table of predictions (pm_data) into the dataset by pm_grid_id and date. Note that for the merge to work, the date column must be named date and be of class Date, and the pm_grid_id column must exist.

( d <- left_join(d, pm_data, by = c('pm_grid_id', 'date')) )
#> Simple feature collection with 5 features and 4 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: 1347996 ymin: 414089 xmax: 1426020 ymax: 466143.5
#> epsg (SRID):    3735
#> proj4string:    +proj=lcc +lat_1=40.03333333333333 +lat_2=38.73333333333333 +lat_0=38 +lon_0=-82.5 +x_0=600000 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=us-ft +no_defs
#> # A tibble: 5 x 5
#>       id date       pm_grid_id pm_pred                 geometry
#>    <int> <date>     <chr>        <dbl> <POINT [US_survey_foot]>
#> 1 809089 2015-01-26 3016         18.4          (1347996 461745)
#> 2 813233 2015-10-25 4249          6.13         (1407370 414089)
#> 3 814881 2015-12-09 2954         14.9        (1410421 466143.5)
#> 4 799697 2015-04-08 3687          6.68       (1425054 437515.5)
#> 5 799698 2015-03-03 3688         13.9          (1426020 436698)
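The requirements for the merge can be checked before joining. The helper below is a hypothetical sketch and not part of aiR; it stops with a message if the pm_grid_id or date column is missing or if date is not of class Date.

```r
# Hypothetical helper (not part of aiR): stop early if the columns needed
# for the join with pm_data are missing or of the wrong class
check_join_columns <- function(d) {
  if (!"pm_grid_id" %in% names(d)) {
    stop("column 'pm_grid_id' not found; overlay the locations with pm_grid first")
  }
  if (!"date" %in% names(d)) {
    stop("column 'date' not found")
  }
  if (!inherits(d$date, "Date")) {
    stop("column 'date' must be of class Date; try as.Date()")
  }
  invisible(TRUE)
}

# A data frame with both columns in the expected form passes
ok <- check_join_columns(
  data.frame(pm_grid_id = "3016", date = as.Date("2015-01-26"))
)

# A character date fails the check; the error message is captured here
bad <- tryCatch(
  check_join_columns(data.frame(pm_grid_id = "3016", date = "2015-01-26")),
  error = function(e) conditionMessage(e)
)
```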

Adding Weather Data

Several NARR weather variables are available as daily means for the entire study area in the narr_data R object. View the help (?narr_data) to see more details about the individual variables.

To merge in humidity and temperature, we will subset narr_data to those variables and join to our dataset:

( d <- left_join(d, narr_data %>% select(date, air.2m, rhum.2m), by = 'date') )
#> Simple feature collection with 5 features and 6 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: 1347996 ymin: 414089 xmax: 1426020 ymax: 466143.5
#> epsg (SRID):    3735
#> proj4string:    +proj=lcc +lat_1=40.03333333333333 +lat_2=38.73333333333333 +lat_0=38 +lon_0=-82.5 +x_0=600000 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=us-ft +no_defs
#> # A tibble: 5 x 7
#>       id date       pm_grid_id pm_pred air.2m rhum.2m
#>    <int> <date>     <chr>        <dbl>  <dbl>   <dbl>
#> 1 809089 2015-01-26 3016         18.4    273.    84.2
#> 2 813233 2015-10-25 4249          6.13   290.    72.2
#> 3 814881 2015-12-09 2954         14.9    280.    86.9
#> 4 799697 2015-04-08 3687          6.68   292.    84.6
#> 5 799698 2015-03-03 3688         13.9    272.    87.9
#> # … with 1 more variable: geometry <POINT [US_survey_foot]>
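The air.2m values in the output above (roughly 272 to 292) look like temperatures in Kelvin. This is an assumption, so confirm the units in ?narr_data. If it holds, a conversion to degrees Celsius could be sketched as follows, using the rounded values from the printed output rather than the actual data:

```r
# Rounded air.2m values copied from the printed output; assumed to be Kelvin
air_2m_kelvin <- c(273, 290, 280, 292, 272)

# Kelvin to degrees Celsius
air_2m_celsius <- air_2m_kelvin - 273.15
```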
