Using R to Build a Community Data Explorer for Cincinnati (CoDEC)

CCHMC R Users Group

Cole Brokamp, Erika Manning, Andrew Vancil

5/10/23

👋 Welcome

Join the RUG Outlook group for updates and events.

📣 BUG & RUG present:

Bilingual Data Science Meeting

July 13, 2023

In-person at S1.203

About BUG: The Biomedical Informatics Users Group (BUG) is a community of bioinformatics researchers and data scientists looking to share knowledge and insights and to build community. We organize a series of user-led, informal talks and discussions by and for data researchers at CCHMC and UC. Please contact Krishna Roskin at krishna.roskin@cchmc.org for more details or to sign up to present at a future BUG meeting.

Using R to Build a Community Data Explorer for Cincinnati (CoDEC)

  1. Introduction to CoDEC
  2. Sharing CoDEC Data
  3. Exploring CoDEC Data

Background

The White House’s Equitable Data Working Group¹:

  • Equitable data are “those that allow for rigorous assessment of the extent to which government programs and policies yield consistently fair, just, and impartial treatment of all individuals.”
  • Equitable data should “illuminate opportunities for targeted actions that will result in demonstrably improved outcomes for underserved communities.”
  • Make disaggregated data the norm while being “… intentional about when data are collected and shared, as well as how data are protected so as not to exacerbate the vulnerability of members of underserved communities, many of whom face the heightened risk of harm if their privacy is not protected.”

Disaggregation

  • Open data can fall short of driving action if it is not equitable.

  • Disaggregating¹ data by sensitive attributes, like race and ethnicity, can elucidate inequities that would otherwise remain hidden.

“Open data is necessary and not sufficient to drive the type of action that we need to create a more equitable society.”

The U.S. Chief Data Scientist, Denice Ross²

Privacy

  • Data are people¹
  • Privacy is a spectrum of the tradeoffs between risks and benefits to individuals and populations
  • Data collected at the individual level by one organization often cannot be shared² with another organization due to legal restrictions or organization-specific data governance policies
  • Community-level (e.g. neighborhood, census tract, ZIP code) data disaggregated by gender, race, or other sensitive attributes
  • Achieving data harmonization upstream of storage allows for contribution of disaggregated, community-level data without disclosing individual-level data when sharing across organizations
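
As a minimal sketch of the last point, the base R code below aggregates individual-level records to community-level counts disaggregated by race, so that only the aggregated table would leave the organization. All data, column names (`person_id`, `census_tract`, `race`, `outcome`), and tract labels are made up for illustration and are not taken from CoDEC.

```r
# Hypothetical individual-level records held inside one organization
# (all values are made up for illustration)
records <- data.frame(
  person_id    = 1:6,
  census_tract = c("tract_A", "tract_A", "tract_A", "tract_B", "tract_B", "tract_B"),
  race         = c("Black", "White", "Black", "Black", "White", "White"),
  outcome      = c(1, 0, 0, 1, 1, 0),
  stringsAsFactors = FALSE
)

# Count people and events within each tract-by-race group
n_people <- aggregate(outcome ~ census_tract + race, data = records, FUN = length)
n_events <- aggregate(outcome ~ census_tract + race, data = records, FUN = sum)
names(n_people)[3] <- "n_people"
names(n_events)[3] <- "n_events"

# Community-level, disaggregated table: no person_id and no individual rows
community <- merge(n_people, n_events, by = c("census_tract", "race"))
print(community)
```

Only `community` would be contributed to a shared resource; `records` stays with the organization that collected it. A real pipeline would also need to follow the organization's own disclosure rules, which this sketch does not cover.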

The TRUST principles for digital repositories¹

Creating and maintaining an open community-level data resource equips the entire community for data-powered decision making and boosts organizational trustworthiness. Demonstrating the reliability and capability to manage shared data appropriately helps earn the trust of the organizations and communities the resource is intended to serve:

  • 🤲 transparent: make specific repository services and data holdings verifiable by publicly accessible evidence
  • 📃 responsible: ensure authenticity and integrity of data holdings
  • 👥 user-focused: meet data management norms and expectations of target user communities
  • ⏳️ sustainable: preserve services and data holdings for the long term
  • ⚙️ technological: provide infrastructure and capabilities supporting secure, persistent, and reliable services

FAIR¹

  • 🔎 findable: use a unique and persistent identifier, add rich metadata (using existing standards²)
  • 🔓 accessible: store in a data repository (⚠️ personal or classified information may need restricted access, but its metadata should still be accessible)
  • ⚙️ interoperable: use an open file format with controlled vocabularies, reference relevant datasets
  • ♻️ reusable: well documented, including a description (README with data sources, background, and how to reproduce the data), a data dictionary (field descriptions, units, titles, missingness), and usage licenses (for code³ or data/presentations/papers⁴)
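
To make the data dictionary idea concrete, here is a small base R sketch that derives one from a data frame, recording each field's type, description, and missingness. The example table, its field names, and the descriptions are hypothetical and are not CoDEC's actual fields or metadata format.

```r
# Made-up community-level table, for illustration only
example_data <- data.frame(
  census_tract_id = c("tract_A", "tract_B", "tract_C"),
  median_income   = c(41000, NA, 58000),
  stringsAsFactors = FALSE
)

# Human-written descriptions, including units, keyed by field name
descriptions <- c(
  census_tract_id = "Community identifier (made-up labels)",
  median_income   = "Median household income (US dollars)"
)

# One row per field: name, storage type, description, and count of missing values
data_dictionary <- data.frame(
  field       = names(example_data),
  type        = unname(vapply(example_data, function(x) class(x)[1], character(1))),
  description = unname(descriptions[names(example_data)]),
  n_missing   = unname(vapply(example_data, function(x) sum(is.na(x)), integer(1))),
  stringsAsFactors = FALSE
)
print(data_dictionary)
```

Keeping the descriptions in a named vector next to the data means a field added without a description shows up as `NA` in the dictionary, which is easy to spot before sharing.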

Community Data Explorer for Cincinnati (CoDEC)

A data repository composed of equitable, community-level data for Cincinnati.