PurpleAir API Dashboard Vignette

A RISE Communities Program Presentation

Author: Stephen Colegate

Published: August 9, 2024

1 Introduction

Air quality sensors can provide data at high spatial and temporal resolution, and they are accessible in terms of both cost and ease of use (Collier-Oxandale et al. 2022). A stable and consistent Application Programming Interface (API) makes open access to environmental data sets and related tools possible (Callahan et al. 2023). An API lets software and application developers build applications that display and report those data in transparent and meaningful ways (Feenstra et al. 2020).

In this document, we present methods for accessing synoptic and time series data from the PurpleAir API using the PurpleAir R package (Brokamp 2024). We then explore how to visualize these data both spatially (as maps) and temporally (as time series plots) to identify air pollution trends.

Note

A rendered HTML version of this vignette, including all the code and relevant output, is hosted online on our GitHub page. All the R code in this vignette is contained in the purpleair.R file there. Section 1.1.2 explains how to download this file directly from GitHub. As you follow along with this tutorial, try running the R code in the purpleair.R file to learn how to read in your PurpleAir data and do some basic data analysis with it.

1.1 Setup

Throughout this tutorial, we assume that you already have R and RStudio installed on your computer and know the basics of the R programming language. You can install R by going to https://cran.r-project.org/ and selecting the R version that is appropriate for your operating system (R Core Team 2013). After installing R, you will also need to install RStudio Desktop. RStudio Desktop is an integrated development environment (IDE) that helps data scientists be more productive with R (RStudio Team 2020).

This section describes how to set up your R session with the packages, functions, and files that are created, loaded, and saved on your computer throughout this vignette.

Important

We highly recommend that you be somewhat familiar with R and RStudio before continuing with this vignette. If you want to learn more about the basics of R and RStudio, the R tutorial vignette walks through everything you need to know before you begin.

1.1.1 New R Project

First open RStudio on your computer. At the top left, click File > New Project. A dialogue box should appear like the one shown below:

Create a new R project within RStudio.
Tip

If you already have the folder created where all project files will be kept, you can reference this folder by selecting Existing Directory. This loads a navigation box where you can select your folder.

Select a folder that already exists to house the R project.

Select New Directory from the dialog box. This brings up a list of project types. Select New Project.

Click New Project to create a new R project.

On the next screen (shown below), enter the name of the project (e.g., PurpleAir Demo). Beneath the directory name, browse to the location on your computer where the R project should be created. RStudio creates a subfolder with the project's name at that location. All files created, saved, and loaded throughout this vignette will come from this folder.

Type in the project name and select where to create the folder for the project.

Leave all the other fields untouched. Click Create Project to start the new R project. RStudio then restarts the current R session inside of the project folder.

Tip

You can verify the R project you are currently in by looking for the project name in the top-right corner of RStudio. Clicking on this icon brings up a menu to create new R projects, close the existing project, or switch to a new project.

1.1.2 Download Files

After creating a new R project, download the necessary R files from our GitHub page, starting with the purpleair.R file. Copy the following line of code and paste it into the console. To copy this code block, look for the clipboard icon in the top-right corner of the code block and click Copy to Clipboard.

# Download 'purpleair.R' file from GitHub
download.file("https://raw.githubusercontent.com/geomarker-io/purple_air_data_in_R/main/purpleair.R", destfile = "purpleair.R")

Inside RStudio, in the Console Pane, paste this code at the command prompt > and press ENTER. R should then go to our GitHub page, look for the purpleair.R file, and download the file into your R project folder. Verify the file has been downloaded by clicking on the Files tab in the Output Pane. Click on the purpleair.R file link inside the Files tab to open the R file inside RStudio. The purpleair.R file should now be included in the Source Pane.

Tip

All the R functions given in this vignette are included in the purpleair.R file. Once you have downloaded the file and opened it in an RStudio session, you do not need to type any more code, because all the code shown here is reproduced in purpleair.R. Simply highlight the code and run it, either by clicking Run or by using the CTRL+ENTER keyboard shortcut (CMD+RETURN on macOS).

1.1.3 R Packages

You will need to install the PurpleAir package to use the functions provided later in this vignette. R packages are hosted on the Comprehensive R Archive Network (CRAN) and require an internet connection to download. The PurpleAir package's documentation on CRAN describes the package in more detail.

Packages are created by R users who bundle their functions, data, documentation, and other information so that other R users can download and use them. A package must be installed before it can be used.

Tip

A banner, like the one shown below, appears in RStudio if an open script uses packages that are not installed in your user library.

The usethis package is required but is currently not installed, with the option to install the missing package.

Click Install to download and install the missing packages.

To install the PurpleAir package, use the install.packages() function below.

# Install the PurpleAir package - only need to run once
install.packages('PurpleAir')
Important

Packages only need to be installed once. After a package has been installed, there is no reason to install it again unless you want to update it to a newer version.

Caution

You may need to restart the R session in order for these packages to successfully install (especially if the packages are already loaded). If this is the case, select Yes when prompted to restart the R session. You may have to rerun the R code chunks again after restarting the session.

Downloading the PurpleAir R package to the User’s Library.

When this line of code is executed, R downloads the PurpleAir package from CRAN and installs it in your user library. Click on the Packages tab in the Output Pane and locate the PurpleAir package, along with any other packages that were installed with it.

Tip

R comes with its own set of base packages (e.g., datasets, graphics, stats). You do not need to install these packages, because they are installed along with R. You can view all the R packages installed on your computer by clicking on the Packages tab in the Output Pane.

Warning

It can take several minutes for all the packages to download and install on your computer. Do not run any code or stop the process by clicking on the stop sign icon in the top right of the Console Pane. Doing so can cause problems with the installation.

Once the package has been installed, you must load it into your R session with the library() function:

# Load the PurpleAir package
library(PurpleAir)
Tip

A package only needs to be loaded once per R session. Loading it again in the same session has no effect. You can check whether a package is loaded by clicking on the Packages tab in the Output Pane. From this list, you can also do the following:

  • Load a package by checking the box next to its name.

  • Browse package information on CRAN.

  • Remove a package altogether.

List of Packages installed in the Output Pane. Loaded packages are indicated with a check mark.

Loading and unloading packages can be accomplished here by clicking on the check mark box.

Caution

All loaded packages are unloaded when RStudio is closed or a new R session begins. Installation only happens once, but loading must be repeated. Every time you start a new R session, you must load the packages again with library().

Other R packages that will be useful are tidyverse, sf, and usethis. Install these packages with the install.packages() function. Combine their names into a character vector with c():

# Install the tidyverse, sf and usethis packages - only need to run once
install.packages(c('tidyverse', 'sf', 'usethis'))
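Packages only need to be installed once, so a script that other people will run can first check which packages are missing and then install only those. The sketch below uses a hypothetical helper, missing_packages(), which is not part of the PurpleAir package or any other package. It relies only on base R's requireNamespace(), which returns FALSE when a package cannot be loaded.

```r
# Hypothetical helper (not from any package): return the names in `pkgs`
# that requireNamespace() cannot load, usually because they are not installed
missing_packages <- function(pkgs) {
  loadable <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!loadable]
}

# 'stats' ships with R; the second name is deliberately fake
missing_packages(c("stats", "surely.not.a.real.package"))
#> [1] "surely.not.a.real.package"

# Install only what is missing (uncomment to run; needs an internet connection)
# to_install <- missing_packages(c("PurpleAir", "tidyverse", "sf", "usethis"))
# if (length(to_install) > 0) install.packages(to_install)
```

requireNamespace() loads each installed package's namespace without attaching it. You still need library() to call the package's functions by name.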

Once the packages are installed, load them into your R session using library():

# Load required R packages
library(tidyverse)
library(sf)
library(usethis)
Note

Some R packages need functions from other R packages to work properly. These dependencies are installed automatically if they are not already in your user library. For example, installing the tidyverse package also installs the packages it depends on. Loading it with library(tidyverse) also loads these core packages: dplyr, forcats, ggplot2, lubridate, purrr, readr, stringr, tibble, and tidyr.

Caution

Sometimes, R will print a warning message if a package that is loaded could conflict with another package that is already loaded, like the one shown below:

Function conflicts when loading the tidyverse package.

This happens when two or more loaded packages contain functions that share the same name but perform different actions. By default, functions from the package loaded most recently mask functions with the same name from packages loaded earlier.

For example, the filter() function appears in both the dplyr package and the stats package. The stats package is loaded automatically when a new R session begins. The dplyr package is a dependency of the tidyverse package. When dplyr is loaded into the session, the conflict message tells you that filter() now refers to the function in dplyr, not the one in stats.

Tip

You can use a function from an R package without loading the package itself. Use the :: operator, in the form package::function(), to specify both the package and the function you wish to use. From the previous caution box, dplyr::filter() refers to the filter() function in the dplyr package. This is handy if you need only one function from an R package, but make sure you have the package installed first!
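The masking behavior described above can be reproduced with base R alone, without installing dplyr. R searches your global environment before any attached package. In this minimal sketch, a function defined in the global environment plays the role of the later-loaded package. The :: operator still reaches the original base::mean().

```r
# A user-defined mean() in the global environment masks base::mean()
mean <- function(x, ...) "this is not the base mean()"

masked   <- mean(c(1, 2, 3))        # "this is not the base mean()"
original <- base::mean(c(1, 2, 3))  # 2: :: always uses the named package

# Removing the masking function makes mean() refer to base::mean() again
rm(mean)
restored <- mean(c(1, 2, 3))        # 2
```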

2 PurpleAir API Interface

You can download the latest available PurpleAir sensor data from the PurpleAir API directly within your R session. To make specific queries, you must create an account with PurpleAir and obtain an Application Programming Interface (API) key that is unique to you.

The PurpleAir API dashboard allows users to create and manage their API keys and their usage. The dashboard requires a Gmail or Google-associated account to sign in. The instructions below follow the tutorial provided by PurpleAir.

Note

You will be notified through your Gmail or Google-associated account whenever you make changes on the PurpleAir API dashboard. Examples include creating a project, archiving a project, creating an API Key, or making a query.

2.1 Create Project

The PurpleAir API dashboard for creating projects.

First create a project by following these steps:

  1. Sign in to develop.purpleair.com using a Gmail or Google-associated account.

  2. Click Projects along the top of the page.

  3. Click on +Project in the top right-hand corner of the page to add a new project.

    Creating a project in the PurpleAir API dashboard.
  4. Enter a project name.

  5. Click Create.

Once a project has been created, it is added to the Projects list. You can also archive projects once you are finished with them. Archived projects are stored under the Archived projects tab.

2.2 Obtain API Key

List of active API Keys on the PurpleAir dashboard.

To obtain your unique API key follow these steps:

  1. Sign in to develop.purpleair.com using a Gmail or Google-associated account.

  2. Click API Keys along the top of the page.

  3. Click on + API Key in the top right-hand corner of the page.

  4. A window appears on the right of the webpage. You should see the auto-populated project name you created earlier (Section 2.1). You can have several projects and several API keys at a time. Select the appropriate project if you have multiple projects.

  5. Select Read under “Choose key type”, select Enabled under “Choose status”, and leave all the other fields blank.

  6. Click Create.

You should then see your unique API key for the project. You can associate multiple API keys to one project.

Warning

API keys are issued per user, not per sensor. Make sure that you have enough points allocated for the query before running the code. You do not want to waste points by running and downloading data that you may already have downloaded.
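One way to avoid paying twice for the same data is to save each query result to disk and reuse the saved copy. The sketch below uses a hypothetical helper, cached_query(), written for this example with base R's saveRDS() and readRDS(). It is not part of the PurpleAir package. A stand-in function replaces the real API call, so the example runs without spending points.

```r
# Hypothetical helper (not part of the PurpleAir package): run a query once,
# save the result to an .rds file, and read that file on later calls
cached_query <- function(path, query_fn) {
  if (file.exists(path)) {
    return(readRDS(path))        # reuse the saved result: no new query
  }
  result <- query_fn()           # only runs when no saved copy exists
  saveRDS(result, path)
  result
}

# Demonstration with a stand-in query that counts how often it runs
calls <- 0
fake_query <- function() {
  calls <<- calls + 1
  data.frame(value = 1:3)
}
cache_file <- tempfile(fileext = ".rds")
first  <- cached_query(cache_file, fake_query)
second <- cached_query(cache_file, fake_query)
calls  # 1: the second call read the saved file instead of querying again

# With the PurpleAir package loaded, a real query could be wrapped like this:
# sensor_history <- cached_query("sensor_history.rds", function() {
#   get_sensor_history(sensor_index = 176557, fields = c("pm2.5_atm"),
#                      start_timestamp = as.POSIXct("2024-07-03"),
#                      end_timestamp = as.POSIXct("2024-07-06"))
# })
```

If you later need fresh data, delete the .rds file (or choose a new file name) so that the query runs again.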

2.3 Set API Key

The API Key is an access token that allows you to pull PurpleAir sensor data from the cloud into your R session (see Section 2.2 to learn how to create API Keys). Each user begins with 1,000,000 points for free. Each time R fetches PurpleAir data from the PurpleAir API, points are deducted from your account. The number of points deducted depends on how much data you request: the more data you request, the more points you use.

Warning

You must have enough points in your account to complete a query from the PurpleAir API. If you do not have enough points, R displays an error message stating that there are not enough points to complete the action. You can purchase additional points through the PurpleAir API dashboard online. Click Purchase Points along the top-right of the webpage and follow the instructions to add points to your account. Different rates are available depending on the purchase price.

Purchasing and adding points to a PurpleAir API account.

It is not a good idea to leave your API Key in an R script. If you share a document that includes your personal API Key with someone else, they will be using your points when they request PurpleAir sensor data! A better solution is to save your API Key in a separate location that keeps it private, and then reference the key whenever you need it.

One such location is the .Renviron file. A .Renviron file is not an R script. It is a plain text file of lines in the form NAME=value, and each line defines an environment variable.

By default, the usethis::edit_r_environ() function used below opens the user-level .Renviron file in your home directory, so the file does not appear in your project's working directory. A new R session begins when you open RStudio or restart the R session manually. At that point, R reads the .Renviron file and sets each variable it declares as an environment variable for that session.

These variables do not appear as objects in the global environment or in the Environment Pane. Instead, you retrieve them with the Sys.getenv() function. This setup is useful for API credentials: it keeps your API key out of your scripts while still letting you use it within an R session.
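To see how a .Renviron file turns into environment variables, the sketch below writes a throwaway file and applies it with base R's readRenviron(). readRenviron() processes a file the same way R processes .Renviron at startup. The variable name DEMO_API_KEY is made up for this example, so any PURPLE_AIR_API_KEY you have already set is left untouched.

```r
# Write a temporary environment file containing one NAME="value" line
demo_file <- tempfile(fileext = ".Renviron")
writeLines('DEMO_API_KEY="not-a-real-key"', demo_file)

# Apply the file to the current R session
readRenviron(demo_file)

# The value is available through Sys.getenv(), with the outer quotes removed...
key <- Sys.getenv("DEMO_API_KEY")   # "not-a-real-key"

# ...but no R object with that name exists in the global environment
in_global_env <- exists("DEMO_API_KEY", envir = globalenv(), inherits = FALSE)

# Remove the demonstration variable from this session
Sys.unsetenv("DEMO_API_KEY")
```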

Follow these steps to set up a secure method of loading your API Key into your R environment:

  1. Open the .Renviron file in RStudio by running the following code:

    # Run this to open .Renviron file
    usethis::edit_r_environ()

    The edit_r_environ() function from the usethis package will open a tab named .Renviron in the Source Pane (see next image below).

  2. Copy the following line and paste it into the .Renviron file that just opened. To copy the line, click on the clipboard icon on the top right of the code block inside the vignette.

    PURPLE_AIR_API_KEY = "PASTE_API_KEY_HERE"

    The .Renviron file when opened in RStudio. Copy the line of code provided above into this file (Step 2). Replace PASTE_API_KEY_HERE with your API Key in quotes from the PurpleAir API dashboard (Step 3).
  3. Once the above line has been pasted into the .Renviron file, replace PASTE_API_KEY_HERE with the unique PurpleAir API Key for your project from the PurpleAir API dashboard website. Keep the quotes.

     To find the key, click on API Keys along the top of the webpage to list all your active API Keys. Under the API Key you wish to copy, click on the : menu icon and then click Copy Key to copy the API Key to the clipboard. Paste the API Key into the .Renviron file in place of the text PASTE_API_KEY_HERE, making sure the key stays in quotes.

     Within your R session, the API Key will be available as the environment variable PURPLE_AIR_API_KEY. Save the .Renviron file and then close the tab.

    Caution

    Keep your PurpleAir API Key in single or double quotes, as shown above. R strips the outer quotes when it reads the .Renviron file, and environment variables are always read into R as character strings.

  4. Restart the R session by clicking Session > Restart R. You may be prompted to save any unsaved changes you have made to the purpleair.R file or the .Renviron file. Then reload the PurpleAir, sf, tidyverse, and usethis packages.

    Caution

    Make sure that you load the PurpleAir, sf, tidyverse, and usethis packages by using the library() function, as discussed in Section 1.1.3. The following R commands use functions from these packages, and unless the packages are loaded again, R will not be able to find these functions.

  5. When a new R session begins (either by restarting R or by closing and opening RStudio), R will check the contents of the .Renviron file and set the variable name PURPLE_AIR_API_KEY as the API Key you pasted in from the PurpleAir API dashboard. Verify that your PurpleAir API Key has been set correctly by running the following code:

    # Check that your API Key has been set correctly
    check_api_key(Sys.getenv("PURPLE_AIR_API_KEY"))
    ✔ Using valid 'READ' key with version V1.0.14-0.0.58 of the PurpleAir API on 1723218477

    The Sys.getenv() function retrieves the environment variable PURPLE_AIR_API_KEY, which holds the API Key you pasted in Step 3. The check_api_key() function then sends your API Key to the PurpleAir API to verify that it is a valid key. If your API Key is set correctly, you should see a success message like the one above.

    Caution

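    By default, usethis::edit_r_environ() edits the user-level .Renviron file. The API Key you pasted there is therefore available in every R session on your computer, even outside this R project. If you want the key to apply only to this project, usethis::edit_r_environ(scope = "project") creates a .Renviron file in the project folder instead. Do not share that file or commit it to a public repository.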
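A script can also stop early, with a clear message, when the key is missing, rather than sending an empty key to the API. The helper below, get_env_key(), is hypothetical: it is not part of the PurpleAir package, and it uses only base R. It relies on Sys.getenv() returning an empty string for variables that are not set.

```r
# Hypothetical helper (not part of the PurpleAir package): read an API key from
# an environment variable and stop with a clear message if it is missing
get_env_key <- function(var = "PURPLE_AIR_API_KEY") {
  key <- Sys.getenv(var)   # returns "" when the variable is not set
  if (!nzchar(key)) {
    stop(var, " is not set. Add it to your .Renviron file and restart R.",
         call. = FALSE)
  }
  key
}

# Demonstration with a made-up variable name
Sys.setenv(DEMO_API_KEY = "not-a-real-key")
demo_key <- get_env_key("DEMO_API_KEY")   # "not-a-real-key"

Sys.unsetenv("DEMO_API_KEY")
demo_error <- tryCatch(get_env_key("DEMO_API_KEY"),
                       error = function(e) conditionMessage(e))
# demo_error holds: "DEMO_API_KEY is not set. Add it to your .Renviron file and restart R."

# With the PurpleAir package loaded, the key check could then read:
# check_api_key(get_env_key())
```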

Note

This process only needs to be completed once for each API Key that you use. The PurpleAir API Key that you pasted in the .Renviron file will be set every time a new R session begins. If you wish to use a different API Key, repeat Steps 1-5, replacing the old API Key in Step 3 with the new API Key.

PurpleAir's online documentation explains in more detail how to use the API dashboard and how to set up and use API Keys.

3 PurpleAir Data Exploration

PurpleAir sensor readings are uploaded to the cloud every 120 seconds. (Every 80 seconds prior to a May 31, 2019 firmware upgrade.) Data are processed by PurpleAir and a version of the data is displayed on the PurpleAir website. In this section, we will query the PurpleAir API to download sensor data into the R environment.

Warning

You must have a valid PurpleAir API Key to query PurpleAir sensor data from the PurpleAir API, and the API Key must be validated before you request PurpleAir data. Please read Section 2.2 to obtain a valid API Key and Section 2.3 to set up the key. You must also have enough points in your account to query the PurpleAir API. Remember that points are tied to each user account, not to API Keys. One user can create multiple API Keys, but those API Keys all draw from the same points allocated to that account.

3.1 Single Sensor Data

This exercise pulls the latest observations from a single PurpleAir sensor. To download data from a PurpleAir sensor, you must first identify the sensor. The sensor_index is a unique identifier of a PurpleAir sensor that can be obtained from the PurpleAir API dashboard. The easiest way to find the sensor_index of the PurpleAir sensor you wish to analyze is by using the PurpleAir Real-Time Air Quality Map.

The PurpleAir Real-Time Air Quality Map showcasing all the latest data from PurpleAir sensors.

First, locate the PurpleAir sensor you wish to analyze on the map. In this example, we use the PurpleAir sensor at Cincinnati Fire Department Station #12, located at 3001 Spring Grove Ave, Cincinnati, OH 45225, in Camp Washington.

The PurpleAir sensor from Cincinnati Fire Department Station #12 in Camp Washington.

Click the sensor's reading icon on the map to examine both current and historical trends. Inside the pop-up box for the sensor, hover your mouse over Get This Widget. The sensor_index appears in the id field, which is highlighted in the picture below.

The sensor_index is listed in the id field under the Get This Widget icon.

From the image above, the sensor_index for the PurpleAir sensor at Station #12 is 176557. Alternatively, hover your mouse over the Get This Widget icon and click Download Data. A new webpage should open, and the sensor_index is listed at the end of its URL.

The sensor_index is provided at the end of the URL that opens when you click Download Data under the Get This Widget icon.

There are many fields you can choose from when requesting data from PurpleAir, and the PurpleAir API documentation lists all the information you can query from each sensor. The sensor_index identifies a particular sensor when the PurpleAir API is queried.

The get_sensor_data() function from the PurpleAir package retrieves information from the PurpleAir API for a single sensor with a given sensor_index. The fields option lists the variables requested from this sensor. It is specified as a character vector created with c(): each variable name is a character string (i.e., in quotes), and the names are separated by commas.

# Get latest data from a single sensor
sensor_data <- get_sensor_data(sensor_index = 176557,
                               fields = c("name", "last_seen",
                                          "pm2.5_cf_1", "pm2.5_atm"))
sensor_data
$last_seen
[1] "2024-08-02 23:07:11 EDT"

$name
[1] "CFD Station 12"

$pm2.5_atm
[1] 7.5

$pm2.5_cf_1
[1] 7.5

Note

If you receive an error like the one below, the PurpleAir package, which contains the get_sensor_data() function, has probably not been loaded in the current R session. Make sure to run library(PurpleAir) in each new R session so that the package is loaded before you run get_sensor_data(). See Section 1.1.3 for more details on installing and loading R packages.

This error message appears when the PurpleAir R package is not loaded.

When this code is executed, points are deducted from your API account to retrieve the requested data, which are then stored in an R object called sensor_data. The fields option determines what information is returned. In this specific example, the following items are returned:

  • "name": Identifies the name of the sensor with sensor_index of 176557.

  • "last_seen": Indicates the timestamp of the latest available information that the PurpleAir API has of this particular sensor.

  • "pm2.5_cf_1": The recorded PM2.5 observation using the CF=1 formula.

  • "pm2.5_atm": The recorded PM2.5 observation using the CF=ATM formula. PurpleAir's documentation explains the difference between the CF=1 and CF=ATM formulas.

Tip

Consult the PurpleAir API documentation for other fields that you can specify and retrieve. These include PM1.0, PM10, PM2.5, temperature, humidity, atmospheric pressure, O3 (ozone), and VOC concentration. Just make sure to include the name of each field you want, in quotes, in the fields vector.

Note

By default, the PurpleAir R package retries failed API requests caused by an underlying HTTP error (for example, the network or website is down) or a transient API error (e.g., HTTP status 429 or 503). Before retrying a failed request, the PurpleAir package waits approximately two seconds. Successive failed requests result in exponentially longer waiting times, set using the function httr2::req_retry(). You can set the maximum number of seconds to wait (45 by default) with the environment variable PURPLE_AIR_API_RETRY_MAX_TIME.
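As a sketch, the maximum retry time could be raised for the current R session with base R's Sys.setenv(). The value of 90 seconds is arbitrary. This assumes the PurpleAir package reads PURPLE_AIR_API_RETRY_MAX_TIME when each request is made, as the note above suggests.

```r
# Allow up to 90 seconds of retrying for the rest of this R session.
# Sys.setenv() stores the value as the character string "90".
Sys.setenv(PURPLE_AIR_API_RETRY_MAX_TIME = 90)
Sys.getenv("PURPLE_AIR_API_RETRY_MAX_TIME")
#> [1] "90"

# To make the setting apply to every session, you could instead add this
# line to your .Renviron file:
# PURPLE_AIR_API_RETRY_MAX_TIME=90
```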

3.2 Multiple Sensor Data

In R, c() combines values into a vector, with elements separated by commas. In the previous R code, the fields option of get_sensor_data() used c() to hold the quoted variable names, separated by commas. In the same way, a vector of sensor_index values can be passed to the get_sensors_data() function to retrieve data from multiple PurpleAir sensors.
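The difference between c() and list() is easy to check in base R. In this sketch, c() produces an atomic vector whose elements all share one type; the examples in this vignette pass vectors like these to the x and fields options. list() instead builds a list that can mix types.

```r
# c() combines values of one type into an atomic vector
sensor_ids <- c(176557, 184705, 177011)
fields <- c("name", "last_seen", "pm2.5_cf_1", "pm2.5_atm")

class(sensor_ids)   # "numeric"
class(fields)       # "character"
is.list(fields)     # FALSE: a vector, not a list
length(fields)      # 4

# list() builds a list, whose elements may have different types
mixed <- list(176557, "name")
is.list(mixed)      # TRUE

# Mixing types inside c() silently converts everything to character
coerced <- c(176557, "name")   # "176557" "name"
```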

Caution

Note that get_sensor_data() and get_sensors_data() are two different functions (note the extra s in sensors in the latter function's name). get_sensor_data() retrieves information from only a single sensor, while get_sensors_data() retrieves information from multiple sensors. Be careful about which function you call, because each function offers different options.

Specify the sensor_index values in the x option as a vector, as in the example below:

# Get latest data from multiple sensors
multiple_data <- get_sensors_data(x = c(176557, 184705, 177011),
                                  fields = c("name", "last_seen",
                                             "pm2.5_cf_1", "pm2.5_atm"))
multiple_data
# A tibble: 3 × 5
  sensor_index last_seen           name                     pm2.5_atm pm2.5_cf_1
         <int> <dttm>              <chr>                        <dbl>      <dbl>
1       176557 2024-08-02 23:05:11 CFD Station 12                 8.5        8.5
2       177011 2024-08-02 23:08:07 Trinity Episcopal Church       5.5        5.5
3       184705 2024-08-02 23:07:48 Cincy Air Watch - Zoo          7.9        7.9

In the above example, the latest "name", "last_seen", "pm2.5_cf_1", and "pm2.5_atm" data are retrieved for the following PurpleAir sensors:

  • 176557: CFD Station 12

  • 184705: Cincy Air Watch - Zoo

  • 177011: Trinity Episcopal Church

3.3 Sensors in a Specified Area

In the previous section, the sensor_index of every sensor had to be specified in a vector. It is also possible to specify an area on a map and retrieve information for all available sensors in that area.

A boundary box (also called a bounding box) is a rectangle drawn on a map. It is defined by the minimum and maximum longitude and latitude of its edges. The idea is to go to the area of interest on the map, draw a boundary box that covers the desired area, and identify the minimum and maximum latitude and longitude coordinates that define it. OpenStreetMap is a free tool that can help identify the coordinates of a boundary box on a map.

  1. Go to the OpenStreetMap website and zoom in to the area of interest on the map. In this example, we will examine the area north of Downtown Cincinnati, including Over-the-Rhine, the West End, Mount Auburn, Mount Adams and Pendleton.

  2. Click on Export at the top of the webpage. A pane opens on the left side of the screen, indicating the coordinates of the boundary box that covers the entire view that is currently visible on the webpage.

  3. Click on Manually select a different area to bring up a boundary box on-screen.

  4. Adjust the boundary box on-screen by clicking and dragging the four corners of the box so that it completely covers the area of interest.

    A boundary box drawn north of Cincinnati, Ohio in OpenStreetMap.
  5. The four numbers on the left panel under Export identify the four coordinates that are needed to define the boundary box:

    • xmin: The left number.

    • xmax: The right number.

    • ymin: The bottom number.

    • ymax: The top number.
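It is easy to paste the left and right (or top and bottom) numbers into the wrong slots, so a small check can catch swapped coordinates before they reach st_bbox(). The helper below, make_bbox_coords(), is hypothetical and uses only base R. It returns a named vector with xmin, ymin, xmax, and ymax, the form used in the st_bbox() example below. It also assumes the box does not cross the 180° meridian.

```r
# Hypothetical helper (not part of sf or PurpleAir): collect the four numbers
# from OpenStreetMap's Export panel into a named vector, checking that they
# were not swapped by mistake
make_bbox_coords <- function(left, bottom, right, top) {
  stopifnot(
    left < right,                     # xmin must be west of xmax
    bottom < top,                     # ymin must be south of ymax
    all(abs(c(left, right)) <= 180),  # longitudes in degrees
    all(abs(c(bottom, top)) <= 90)    # latitudes in degrees
  )
  c(xmin = left, ymin = bottom, xmax = right, ymax = top)
}

coords <- make_bbox_coords(left = -84.5320, bottom = 39.0978,
                           right = -84.5003, top = 39.1181)

# Swapping left and right triggers an error instead of a silent mistake
swapped <- tryCatch(make_bbox_coords(-84.5003, 39.0978, -84.5320, 39.1181),
                    error = function(e) e)

# With sf installed, the vector can be turned into a boundary box:
# sf::st_bbox(coords, crs = 4326)
```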

The st_bbox() function in the sf package creates a boundary box object from these coordinates. Copy the corresponding coordinates from the OpenStreetMap website and paste them into a named vector, labeling each coordinate, as in the example below:

# Get sensor information from a boundary box
boundary_data <- sf::st_bbox(c("xmin" = -84.5320, "ymin" = 39.0978,
                               "xmax" = -84.5003, "ymax" = 39.1181),
                             crs = 4326) |>
  get_sensors_data(fields = c("name", "last_seen",
                              "pm2.5_cf_1", "pm2.5_atm"))
boundary_data
# A tibble: 54 × 5
   sensor_index last_seen           name               pm2.5_atm pm2.5_cf_1
          <int> <dttm>              <chr>                  <dbl>      <dbl>
 1        30561 2024-08-02 23:07:15 lineblock_outside2      22.9       22.9
 2        30571 2024-08-02 23:06:51 amlok_inside             3.4        3.4
 3        35107 2024-08-02 23:08:27 807D3A615CCF            12         12  
 4        35109 2024-08-02 23:07:39 807D3A615D8E            13.8       13.8
 5        35121 2024-08-02 23:07:36 807D3A615C42            14.7       14.7
 6        35137 2024-08-02 23:08:20 807D3A615C77            14         14  
 7        35189 2024-08-02 23:06:43 807D3A6152E5            12.7       12.7
 8        36325 2024-08-02 23:07:26 807d3a616167            31.3       32.8
 9        36681 2024-08-02 23:07:26 68c63a8e59a              5.5        5.5
10        42623 2024-08-02 23:07:35 KMI_041                  0          0  
# ℹ 44 more rows

The crs = 4326 option inside st_bbox() sets the coordinate reference system to WGS 84 (EPSG:4326), in which coordinates are longitude and latitude in degrees. The native pipe operator |> then passes the boundary box to the get_sensors_data() function. get_sensors_data() retrieves the specified fields for all the sensors within that boundary box.
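The native pipe operator (available since R 4.1.0) inserts the value on its left as the first argument of the function call on its right. A base R sketch:

```r
readings <- c(3, 1, 2)

piped  <- readings |> sort(decreasing = TRUE)   # pipe form
nested <- sort(readings, decreasing = TRUE)     # equivalent nested call

identical(piped, nested)   # TRUE
piped                      # 3 2 1

# By the same rule, `bbox |> get_sensors_data(fields = f)` is the same call as
# `get_sensors_data(bbox, fields = f)`, where bbox and f are placeholders.
```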

3.4 Historical Sensor Data

PurpleAir collects and stores all the information reported by PurpleAir sensors in the cloud. So far, we have queried only the latest available readings from selected sensors. It is also possible to pull historical data from these sensors to examine patterns or trends over time.

Let’s return to the PurpleAir sensor at the Cincinnati Fire Department Station #12, whose sensor_index is 176557. It would be interesting to see air pollution readings over the Fourth of July holiday, when people celebrate Independence Day by setting off fireworks. The get_sensor_history() function in the PurpleAir package returns the observations recorded by a specified sensor between two time stamps. PurpleAir sensors make frequent measurements, so the result can be very large. It is wise to save it as a data object, here called sensor_history, so that you do not have to repeat the same large API call.

# Get historical data from a sensor from July 3-6, 2024
sensor_history <- get_sensor_history(sensor_index = 176557,
                                     fields = c("pm1.0_cf_1", "pm1.0_atm",
                                                "pm2.5_cf_1", "pm2.5_atm"),
                                     start_timestamp = as.POSIXct("2024-07-03"),
                                     end_timestamp = as.POSIXct("2024-07-06") )
sensor_history
# A tibble: 432 × 5
   time_stamp          pm1.0_cf_1 pm1.0_atm pm2.5_atm pm2.5_cf_1
   <dttm>                   <dbl>     <dbl>     <dbl>      <dbl>
 1 2024-07-04 11:50:00      12.4      12.4       22.8       22.8
 2 2024-07-04 01:10:00      10.6      10.6       17.3       17.3
 3 2024-07-04 04:00:00       6.74      6.74      10.8       10.8
 4 2024-07-04 15:30:00      11.4      11.4       19.2       19.2
 5 2024-07-04 14:50:00      11.8      11.8       19.3       19.3
 6 2024-07-04 00:30:00      10.7      10.7       17.5       17.5
 7 2024-07-04 02:20:00       8.76      8.76      14.5       14.5
 8 2024-07-04 01:30:00       8.29      8.29      13.1       13.1
 9 2024-07-03 21:50:00      12.6      12.6       20.2       20.2
10 2024-07-03 20:50:00       9.56      9.56      15.2       15.2
# ℹ 422 more rows

The sensor_index argument of get_sensor_history() identifies the sensor, and the fields argument lists the measurements to return. The start_timestamp and end_timestamp arguments set the start and end of the historical period to retrieve. The as.POSIXct() function converts a character string into a date-time (POSIXct) object. Dates are usually written as a four-digit year, two-digit month, and two-digit day, separated by hyphens (for example, "2024-07-03"). When only a date is given, the time defaults to midnight; the hour, minute, and second can also be included (for example, "2024-07-03 18:00:00") if only a certain time period is desired. Note that unless the tz argument is supplied, as.POSIXct() interprets the string in your computer's local time zone, so add tz = "UTC" if you want the period to begin at midnight UTC.
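The base R sketch below shows how the time zone changes what as.POSIXct() returns, how to add a time of day, and how the 432 rows printed above follow from a three-day window. The 10-minute spacing used in the last step is an assumption read off the printed time stamps (they all fall on 10-minute marks), not a value taken from the PurpleAir package documentation.

```r
# Dates written as "YYYY-MM-DD" become midnight in the chosen time zone
start_utc <- as.POSIXct("2024-07-03", tz = "UTC")
end_utc   <- as.POSIXct("2024-07-06", tz = "UTC")

# Without tz, as.POSIXct() uses the computer's local time zone instead,
# so this is midnight local time, not necessarily midnight UTC
start_local <- as.POSIXct("2024-07-03")

# A time of day can be added after the date
evening_utc <- as.POSIXct("2024-07-04 18:00:00", tz = "UTC")

# The same instant displayed in US Eastern (daylight) time, as used in Cincinnati
format(evening_utc, "%Y-%m-%d %H:%M:%S", tz = "America/New_York")

# Length of the requested window in minutes, and the expected number of
# rows assuming one reading every 10 minutes (as the printed output suggests)
window_minutes <- as.numeric(difftime(end_utc, start_utc, units = "mins"))
expected_rows <- window_minutes / 10
expected_rows
```

Because July falls within daylight saving time, 18:00 UTC is displayed as 14:00 in the America/New_York time zone.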

All the historical data from this sensor is stored as a data object called sensor_history. Typing the name of this object and running it in R prints the first several rows. This data can be used for analysis and for creating plots. For example, a time-series plot of the PM1.0 and PM2.5 readings in sensor_history shows how particulate matter levels fluctuated at this location over the Fourth of July period:

# Get a time-series plot of sensor readings
sensor_history |>
  # Reshape the four pm columns into pollutant/concentration pairs
  tidyr::pivot_longer(cols = tidyr::starts_with("pm"),
                      names_to = "pollutant", values_to = "concentration") |>
  # Plot concentration over time, with one colored line per pollutant
  ggplot2::ggplot(ggplot2::aes(time_stamp, concentration, color = pollutant)) +
  ggplot2::geom_line()

Here’s a summary of what the above code does, in this order:

  1. The pivot_longer() function from the tidyr package reshapes the sensor_history data into the "long" format that the plotting functions expect. The cols option selects every column whose name starts_with() the characters pm (pm1.0_cf_1, pm1.0_atm, pm2.5_atm, and pm2.5_cf_1). The names of these columns are stored in a new column called pollutant, and their values are stored in a new column called concentration. After reshaping, each row holds the concentration of one of the four PM measurements at one time stamp.
  2. The ggplot() function from the ggplot2 package takes the reshaped data from Step 1 and sets up the time series plot. The aes() function specifies what goes on the x-axis (time_stamp, the time the observation was made), what goes on the y-axis (concentration, the observed PM concentration), and how the four lines are colored (color = pollutant colors each observation by the pollutant it belongs to).
  3. The geom_line() function from the ggplot2 package draws the plot specified in Step 2 as lines. Since the pollutant column contains four distinct pollutants (mapped to the color option), four lines are drawn.
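To see what pivot_longer() does to the data, the sketch below reshapes two rows copied from the sensor_history output above (the 00:30 and 01:30 readings on July 4), keeping only the pm1.0_atm and pm2.5_atm columns. Treating the printed times as UTC is an assumption made only for this illustration.

```r
library(tidyr)

# Two rows copied from the printed sensor_history output (wide format)
wide <- data.frame(
  time_stamp = as.POSIXct(c("2024-07-04 00:30:00", "2024-07-04 01:30:00"), tz = "UTC"),
  pm1.0_atm  = c(10.7, 8.29),
  pm2.5_atm  = c(17.5, 13.1)
)

# Stack every column starting with "pm" into pollutant/concentration pairs
long <- pivot_longer(wide,
                     cols = starts_with("pm"),
                     names_to = "pollutant",
                     values_to = "concentration")
long
```

Each original row becomes one row per selected column, so the 432-row sensor_history object becomes 1,728 rows (432 × 4) once all four pm columns are stacked.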

3.5 Saving and Loading Data

Each request to the PurpleAir API dashboard spends points from your PurpleAir account to download the data. For the duration of the R session, the data objects sensor_data, multiple_data, boundary_data, and sensor_history hold the information downloaded from the PurpleAir API dashboard. When you quit RStudio, loaded packages are unloaded, unsaved results are lost, and all variables and data objects in the R environment are deleted. This becomes a problem, since restarting R would require calling the PurpleAir API dashboard again to recreate these data objects. Re-downloading data that you have already queried is impractical, because it uses up points on your PurpleAir account.

One solution to this problem is to save each data object as a .RDS file. The .RDS (R Data Serialization) file format is one of many popular formats for saving and reading data files. The saveRDS() function saves a data object to a .RDS file, and the corresponding readRDS() function reads a .RDS file back into the R environment. Both functions take a file path, which must at least include the name of the .RDS file. If only a file name is given, the file is saved to (and read from) the current working directory, which is the project folder when you work inside an RStudio project.
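A quick way to confirm that nothing is lost is to save an object, read it back, and compare the two. The sketch below writes to a temporary file from tempfile() so it does not clutter the project folder, and the small data frame reuses two readings from the sensor_history output. Because readRDS() returns the object rather than restoring it under its old name, the result can be assigned to any name.

```r
# Small data frame with a date-time column (values from the sensor_history output)
readings <- data.frame(
  time_stamp = as.POSIXct(c("2024-07-04 00:30:00", "2024-07-04 01:30:00"), tz = "UTC"),
  pm2.5_atm  = c(17.5, 13.1)
)

# Write to a temporary .RDS file, then read it back under a new name
path <- tempfile(fileext = ".RDS")
saveRDS(readings, file = path)
restored <- readRDS(file = path)

# The restored copy matches the original, including the time zone attribute
identical(readings, restored)
attr(restored$time_stamp, "tzone")
```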

The following code saves each of the sensor_data, multiple_data, boundary_data and sensor_history data objects into .RDS files with corresponding names:

# Save each data object for analysis later
saveRDS(sensor_data, file="sensor_data.RDS")
saveRDS(multiple_data, file="multiple_data.RDS")
saveRDS(boundary_data, file="boundary_data.RDS")
saveRDS(sensor_history, file="sensor_history.RDS")

Tip

Include a file path other than the current working directory if you wish to save these files in a different location on your computer. Just remember to use the same file path in readRDS() that you used in saveRDS().

Check the working directory on your computer to confirm that four new .RDS files have been created. The data requested from the PurpleAir API dashboard are now safely saved on your computer for later. At this point, it is safe to quit RStudio and end the R session without losing the data. When RStudio is reopened and a new R session begins, the data objects can be reloaded into the R environment by reading in the .RDS files. Use the readRDS() function to read the PurpleAir sensor data back in a new R session:

# Load each data object after a new R session
sensor_data <- readRDS(file="sensor_data.RDS")
multiple_data <- readRDS(file="multiple_data.RDS")
boundary_data <- readRDS(file="boundary_data.RDS")
sensor_history <- readRDS(file="sensor_history.RDS")

Tip

This method is very convenient whenever you need to quit out of RStudio and analyze the data later, since only the above code needs to be run to load the previous sensor data. This prevents having to call the PurpleAir API dashboard each time a new R session begins, saving precious points and time.
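The same idea can be wrapped in a small helper that calls the API only when no saved copy exists. The function below, load_or_query(), is a hypothetical helper written for this illustration (it is not part of the PurpleAir package): it reads the .RDS file if it is already on disk, and otherwise runs the supplied query once and saves the result. In the demonstration, a stand-in query function that counts its own calls takes the place of a real PurpleAir request.

```r
# Hypothetical helper (not part of the PurpleAir package):
# reuse a saved .RDS file if it exists, otherwise run the query and save it
load_or_query <- function(path, query) {
  if (file.exists(path)) {
    return(readRDS(file = path))
  }
  result <- query()
  saveRDS(result, file = path)
  result
}

# Stand-in for an API request that counts how often it is called
n_calls <- 0
fake_query <- function() {
  n_calls <<- n_calls + 1
  data.frame(sensor_index = 176557, pm2.5_atm = 17.5)
}

path <- tempfile(fileext = ".RDS")
first  <- load_or_query(path, fake_query)  # runs the query and saves the result
second <- load_or_query(path, fake_query)  # reads the saved file instead
n_calls

# With a real query, the call might look like this (not run):
# sensor_history <- load_or_query("sensor_history.RDS", function() {
#   get_sensor_history(sensor_index = 176557,
#                      fields = c("pm1.0_cf_1", "pm1.0_atm", "pm2.5_cf_1", "pm2.5_atm"),
#                      start_timestamp = as.POSIXct("2024-07-03"),
#                      end_timestamp = as.POSIXct("2024-07-06"))
# })
```

Deleting the .RDS file forces a fresh download the next time the helper runs, which is a simple way to refresh the saved data on purpose.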

Caution

Make sure that the file path given to readRDS() is correct. If the .RDS file(s) are moved, the file path must be updated to reflect the new location; otherwise, R will be unable to find them.
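One way to get a clearer message when a file cannot be found is to check for it first with file.exists() and report the current working directory from getwd(). The read_saved() function below is a hypothetical wrapper written for this sketch, not part of base R or the PurpleAir package.

```r
# Hypothetical wrapper: stop with a helpful message if the .RDS file is missing
read_saved <- function(file) {
  if (!file.exists(file)) {
    stop("Cannot find '", file, "'. The current working directory is '",
         getwd(), "'. Check the file path used in saveRDS().", call. = FALSE)
  }
  readRDS(file = file)
}

# A file that exists is read normally...
path <- tempfile(fileext = ".RDS")
saveRDS(data.frame(pm2.5_atm = 17.5), file = path)
ok <- read_saved(path)

# ...while a missing file produces the custom error message
msg <- tryCatch(read_saved("no_such_file.RDS"),
                error = function(e) conditionMessage(e))
msg
```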

4 Conclusions

The purpose of this vignette is to show how to connect R to the PurpleAir API dashboard, import data from the PurpleAir cloud servers into R, and carry out basic data analysis. We have covered the basics of querying the data, from getting the latest readings from PurpleAir sensors, to analyzing several sensors within a specific region, to obtaining historical data. It is also possible to retrieve data from sources other than PurpleAir, such as OpenAQ. Once the data have been downloaded into the R environment, a wide range of analyses becomes possible, from exploring the data to identify features of interest to studying relationships and long-term trends in air pollution exposure. By bringing low-cost PurpleAir sensors to the communities most affected by ambient air pollution exposure, it is now possible to study the air pollution trends in those areas. As more of these low-cost sensors become available over time, the amount of data collected for analysis will keep growing. We hope this vignette helps you understand how data from PurpleAir sensors are collected and used for analysis that can promote change and awareness of the effects of ambient air pollution exposure in community neighborhoods.

5 Acknowledgements

Thanks to Dr. Cole Brokamp for creating the PurpleAir R package used in this vignette. Thank you to Erika Manning, Andrew Vancil, Qing Duan, and Carson Hartlage for proofreading the vignette and offering suggestions to improve it. Special thanks to Dr. Daniel Hargraves and Dr. Patrick Ryan for allowing me to speak at the RISE Communities Program at Cincinnati Children’s Hospital Medical Center. Finally, huge thanks to PurpleAir Founder and CEO Adrian Dybwad for his guidance and input during the 2023 RISE Communities Program to improve the PurpleAir API dashboard, and for helping make it easier than ever to retrieve PurpleAir sensor data in R.

If you have any comments or suggestions on ways that this tutorial can be improved, I’d love to hear from you! Please email me your feedback at stephen.colegate@cchmc.org.

6 References

Brokamp, Cole. 2024. PurpleAir: Query the PurpleAir Application Programming Interface. https://CRAN.R-project.org/package=PurpleAir.
Callahan, Jonathan, Hans Martin, Kayleigh Wilson, Tate Brasel, and Helen Miller. 2023. “AirSensor: Process and Display Data from Air Quality Sensors.” https://CRAN.R-project.org/package=AirSensor.
Collier-Oxandale, Ashley, Brandon Feenstra, Vasileios Papapostolou, and Andrea Polidori. 2022. “AirSensor V1.0: Enhancements to the Open-Source R Package to Enable Deep Understanding of the Long-Term Performance and Reliability of PurpleAir Sensors.” Environmental Modelling & Software 148 (February): 105256. https://doi.org/10.1016/j.envsoft.2021.105256.
Feenstra, Brandon, Ashley Collier-Oxandale, Vasileios Papapostolou, David Cocker, and Andrea Polidori. 2020. “The AirSensor Open-Source R-Package and DataViewer Web Application for Interpreting Community Data Collected by Low-Cost Sensor Networks.” Environmental Modelling & Software 134 (December): 104832. https://doi.org/10.1016/j.envsoft.2020.104832.
R Core Team. 2013. “R: A Language and Environment for Statistical Computing.”
RStudio Team. 2020. RStudio: Integrated Development Environment for R. Boston, MA: RStudio, PBC. http://www.rstudio.com/.