
Inside R, metadata lives in the attributes of the data.frame and its columns. We can add and change these with the helper functions used in the example below: add_attrs(), add_col_attrs(), and add_type_attrs(). Setting attributes with these functions makes the process reproducible, and changes to the metadata are tracked alongside the R script that creates the data. This prevents a disconnect between data and metadata, and it also makes it possible to compute on the metadata, for example to create richer documentation.

Creating data

We will create a simple dataset here for this example:

d <-
  tibble::tibble(
    id = c("A01", "A02", "A03"),
    date = as.Date(c("2022-07-25", "2018-07-10", "2013-08-15")),
    measure = c(12.8, 13.9, 15.6),
    rating = factor(c("good", "best", "best"), levels = c("good", "better", "best")),
    ranking = as.integer(c(14, 17, 19)),
    impt = c(FALSE, TRUE, TRUE)
  )

Adding metadata properties

When creating a tabular dataset in R, data-specific metadata (i.e., “properties”) can be stored in the attributes of the R object (e.g., a data.frame or tibble).

d <- d |>
  add_attrs(
    name = "mydata",
    title = "My Data",
    version = "0.1.0",
    homepage = ""
  )

Note that this doesn’t change any of the data values. In R, an object’s attributes are stored with it as a list (see ?attributes). Some attributes are treated specially by R (e.g., class, names, row.names, comment) and usually shouldn’t be modified. Although all attributes (including the ones we added above) are available in that list, we can use a function to extract only the attributes that represent metadata descriptors as a tibble.

glimpse_attr(d) |>
  knitr::kable()

| name    | value   |
|:--------|:--------|
| name    | mydata  |
| version | 0.1.0   |
| title   | My Data |
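To see what sits underneath, it can help to look at the raw attribute list with base R. The following is a minimal sketch that uses only base R on a stand-in data frame `x`; it sets descriptors with `attr<-` rather than the package helpers, and the `special` vector is an illustrative assumption based on the attributes named above, not the package's own filter.

```r
# Stand-in data frame; base R only, no package helpers are used here.
x <- data.frame(id = c("A01", "A02"), measure = c(12.8, 13.9))

# Set two descriptors directly with base R's attr<-.
attr(x, "name") <- "mydata"
attr(x, "version") <- "0.1.0"

# attributes() returns everything, including names, class, and row.names.
all_attrs <- attributes(x)
print(names(all_attrs))

# Drop the attributes that R treats specially (illustrative list) so that
# only the descriptors we added remain.
special <- c("names", "class", "row.names", "comment")
descriptors <- all_attrs[setdiff(names(all_attrs), special)]
str(descriptors)
```

The data values in `x` are untouched by any of this; only the attribute list attached to the object changes.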

Adding column-specific metadata properties

Similarly, we can add column-specific attributes (i.e., “schema”). These metadata functions follow the tidy design principles, making it simple to expressively and concisely add metadata using pipes:

d <-
  d |>
  add_col_attrs(id, title = "Identifier", description = "unique identifier") |>
  add_col_attrs(date, title = "Date", description = "date of observation") |>
  add_col_attrs(measure, title = "Measure", description = "measured quantity") |>
  add_col_attrs(rating, title = "Rating", description = "ordered ranking of observation") |>
  add_col_attrs(ranking, title = "Ranking", description = "rank of the observation") |>
  add_col_attrs(impt, title = "Important", description = "true if this observation is important")
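Column-level metadata is attached to the column vectors themselves rather than to the data frame. As a rough illustration of the idea, here is a base R sketch on a stand-in data frame `x`; it uses `attr<-` directly and does not show how add_col_attrs() is implemented.

```r
# Stand-in data frame; base R only.
x <- data.frame(id = c("A01", "A02"), measure = c(12.8, 13.9))

# Attach a title and description to a single column.
attr(x$measure, "title") <- "Measure"
attr(x$measure, "description") <- "measured quantity"

# The attributes live on the column, not on the data frame.
str(attributes(x$measure))

# Collect the title of every column, using NA where none has been set.
titles <- vapply(
  x,
  function(col) {
    ttl <- attr(col, "title", exact = TRUE)
    if (is.null(ttl)) NA_character_ else ttl
  },
  character(1)
)
print(titles)
```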

Adding column-specific metadata properties based on R classes

Use add_type_attrs() to automatically add name, type, and enum schema to each column in the data based on its class:

d <- add_type_attrs(d)

As with descriptors, a helper function retrieves the schema as a tibble:

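options(knitr.kable.NA = "")
glimpse_schema(d) |>
  knitr::kable()

| name    | title      | description                           | type    | constraints        |
|:--------|:-----------|:--------------------------------------|:--------|:-------------------|
| id      | Identifier | unique identifier                     | string  |                    |
| date    | Date       | date of observation                   | date    |                    |
| measure | Measure    | measured quantity                     | number  |                    |
| rating  | Rating     | ordered ranking of observation        | string  | good, better, best |
| ranking | Ranking    | rank of the observation               | integer |                    |
| impt    | Important  | true if this observation is important | boolean |                    |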
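The type column above suggests a mapping from R classes to schema types, with factor levels recorded as the allowed values. The sketch below reproduces that mapping in base R for illustration only: `guess_type()` is a hypothetical helper written for this example, not a function from the package, and the package's actual rules may differ.

```r
# Hypothetical helper (not part of the package): map a column to a type name.
guess_type <- function(col) {
  if (is.factor(col)) return("string")       # factors are strings with an enum
  if (inherits(col, "Date")) return("date")
  if (is.logical(col)) return("boolean")
  if (is.integer(col)) return("integer")
  if (is.numeric(col)) return("number")
  if (is.character(col)) return("string")
  NA_character_                              # unknown classes are left untyped
}

# Same columns as the example dataset, built with base R.
x <- data.frame(
  id = c("A01", "A02", "A03"),
  date = as.Date(c("2022-07-25", "2018-07-10", "2013-08-15")),
  measure = c(12.8, 13.9, 15.6),
  rating = factor(c("good", "best", "best"), levels = c("good", "better", "best")),
  ranking = as.integer(c(14, 17, 19)),
  impt = c(FALSE, TRUE, TRUE)
)

# One type per column, plus the levels of each factor column as its enum.
types <- vapply(x, guess_type, character(1))
enums <- lapply(Filter(is.factor, x), levels)

print(types)
print(enums)
```

Checking for factors and dates first matters because both are stored as numbers internally; testing the class before the storage type keeps them from being labelled as integer or number.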

See Reading and Writing Tabular Data Resources for details on how to save the tabular-data-resource to disk.