Privacy and Security
Data are people. When data are shared, privacy is a spectrum of tradeoffs between the risks and benefits to individuals and populations. Data collected at the individual level by one organization often cannot be shared with another organization because of legal restrictions or organization-specific data governance policies.
We are often interested in community-level data (e.g., by neighborhood, census tract, or ZIP code) disaggregated by gender, race, or other sensitive attributes. Harmonizing data upstream of storage allows organizations to contribute disaggregated, community-level data to one another without disclosing the underlying individual-level data.
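To illustrate contributing disaggregated, community-level data instead of individual rows, the Python sketch below collapses individual-level records into counts keyed by census tract, month, and one sensitive attribute. It is a minimal sketch and not part of CoDEC. The function name aggregate_to_tract_month, the field names census_tract_id, month, and race, and the example tract IDs are all hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping


def aggregate_to_tract_month(
    records: Iterable[Mapping[str, str]], attribute: str
) -> dict[tuple[str, str, str], int]:
    """Collapse individual-level records into counts keyed by
    (census tract, month, value of a sensitive attribute).

    Hypothetical field names: "census_tract_id", "month", plus `attribute`.
    """
    counts: Counter = Counter()
    for rec in records:
        counts[(rec["census_tract_id"], rec["month"], rec[attribute])] += 1
    # Only these aggregate counts would be shared; individual records stay local.
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical example records; tract IDs are illustrative only.
    records = [
        {"census_tract_id": "39061000100", "month": "2023-01", "race": "Black"},
        {"census_tract_id": "39061000100", "month": "2023-01", "race": "Black"},
        {"census_tract_id": "39061000100", "month": "2023-01", "race": "White"},
        {"census_tract_id": "39061000200", "month": "2023-02", "race": "Black"},
    ]
    print(aggregate_to_tract_month(records, "race"))
```

Only the aggregated counts, not the individual records, would leave the contributing organization.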
Data Specifications
Like its namesake, CoDEC encodes data streams about the communities in which we live into a common format, indexed by census tract and month, so that they can be decoded into different community-level geographies and time frames. CoDEC relies on the cincy R package to define Cincinnati-area geographies and to interpolate area-level data between census tracts, neighborhoods, and ZIP codes in different years.
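The encode/decode idea can be sketched in a few lines. Tract-by-month values are reallocated to a target geography using overlap weights, and months are summed into calendar years. This Python sketch is not the cincy package's API or method. It assumes simple area-weighted interpolation of counts, and the function name decode, the weights, the tract IDs, and the neighborhood names are hypothetical.

```python
from collections import defaultdict


def decode(
    tract_month_values: dict[tuple[str, str], float],
    overlap: dict[str, dict[str, float]],
) -> dict[tuple[str, str], float]:
    """Reallocate (tract, "YYYY-MM") counts to (target area, "YYYY") totals.

    Assumption: overlap[tract][target] is the share of the tract's area that
    falls in `target` (simple area weighting); each tract's shares sum to 1.
    Tracts absent from `overlap` contribute nothing.
    """
    out: defaultdict = defaultdict(float)
    for (tract, month), value in tract_month_values.items():
        year = month[:4]  # roll months up into calendar years
        for target, weight in overlap.get(tract, {}).items():
            out[(target, year)] += value * weight
    return dict(out)


if __name__ == "__main__":
    # Hypothetical tracts, neighborhoods, and weights, for illustration only.
    overlap = {
        "T1": {"Neighborhood A": 1.0},
        "T2": {"Neighborhood A": 0.25, "Neighborhood B": 0.75},
    }
    counts = {("T1", "2023-01"): 10, ("T1", "2023-02"): 6, ("T2", "2023-01"): 8}
    print(decode(counts, overlap))
    # {('Neighborhood A', '2023'): 18.0, ('Neighborhood B', '2023'): 6.0}
```

Area weighting like this is only appropriate for additive quantities such as counts. Rates, medians, and other non-additive measures need a different approach, and population-based weights are a common alternative to area.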
Specifications are defined and checked using the as_codec_dpkg() function in R.
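The checks that as_codec_dpkg() actually performs are defined in the R package and are not reproduced here. As a rough illustration of checking data against a tract-and-month specification, the Python sketch below assumes two hypothetical rules. Each row must carry an 11-digit census tract GEOID (2-digit state, 3-digit county, and 6-digit tract code), and each row's month must be formatted as YYYY-MM. The function name check_rows and the field names are also hypothetical.

```python
import re

# Assumed rules for illustration; not the real CoDEC specification.
TRACT_GEOID = re.compile(r"[0-9]{11}")  # 2-digit state + 3-digit county + 6-digit tract
YEAR_MONTH = re.compile(r"([0-9]{4})-([0-9]{2})")


def check_rows(rows: list[dict]) -> list[str]:
    """Return a list of problems found; an empty list means every row passed."""
    problems = []
    for i, row in enumerate(rows):
        tract = str(row.get("census_tract_id", ""))
        if TRACT_GEOID.fullmatch(tract) is None:
            problems.append(f"row {i}: census_tract_id {tract!r} is not an 11-digit GEOID")
        month = str(row.get("month", ""))
        m = YEAR_MONTH.fullmatch(month)
        if m is None or not 1 <= int(m.group(2)) <= 12:
            problems.append(f"row {i}: month {month!r} is not a valid YYYY-MM value")
    return problems


if __name__ == "__main__":
    rows = [
        {"census_tract_id": "39061000100", "month": "2023-01"},  # passes
        {"census_tract_id": "3906100", "month": "2023-13"},  # fails both rules
    ]
    for problem in check_rows(rows):
        print(problem)
```

Returning every problem instead of stopping at the first failure makes it easier to fix a whole file in one pass.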
Equitably Disaggregated Community-Level Data
The White House’s Equitable Data Working Group has defined equitable data as “those that allow for rigorous assessment of the extent to which government programs and policies yield consistently fair, just, and impartial treatment of all individuals.” They advise that equitable data should “illuminate opportunities for targeted actions that will result in demonstrably improved outcomes for underserved communities.” The group recommended making disaggregated data the norm while being “… intentional about when data are collected and shared, as well as how data are protected so as not to exacerbate the vulnerability of members of underserved communities, many of whom face the heightened risk of harm if their privacy is not protected.”
The U.S. Chief Data Scientist, Denice Ross, has declared that “open data is necessary and not sufficient to drive the type of action that we need to create a more equitable society.” Open data can fall short of driving action if it is not equitable. Disaggregating data by sensitive attributes, like race and ethnicity, can elucidate inequities that would otherwise remain hidden.