Column Names as Contracts

Start: 2021-02-25T11:00:00Z
Location: University of Toronto - Department of Statistical Sciences

A visualization of data fields named with a controlled vocabulary

Date

Feb 25, 2021 11:00 AM

Event

Toronto Data Workshop on Reproducibility

Location

University of Toronto - Department of Statistical Sciences

Virtual - Hosted by Toronto, Ontario

Complex software systems make performance guarantees through documentation and unit tests, and they communicate these to users with conscientious interface design. However, published data tables exist in a gray area; they are static enough not to be considered a “service” or “software”, yet too raw to earn attentive user interface design. This ambiguity creates a disconnect between data producers and consumers and poses a risk to analytical correctness and reproducibility.

In this talk, I will explain how controlled vocabularies can be used to form contracts between data producers and data consumers. Explicitly embedding meaning in each component of variable names is a low-tech and low-friction approach which builds a shared understanding of how each field in the dataset is intended to work.
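To make the idea of embedding meaning in each component of a name concrete, here is a minimal Python sketch. It does not use the convo R package or reproduce its API; the stubs (`ID`, `IND`, `N`, `AMT`, `DT`, and the subject stubs), the underscore separator, and the function name `invalid_components` are all assumptions chosen for illustration.

```python
# Hypothetical two-level controlled vocabulary; the stubs are illustrative only.
VOCABULARY = [
    {"ID", "IND", "N", "AMT", "DT"},   # level 1: what kind of measure the field holds
    {"USER", "LOGIN", "PURCHASE"},     # level 2: what the measure is about
]

def invalid_components(name, vocabulary=VOCABULARY, sep="_"):
    """Return (level, stub) pairs from `name` that are not in the vocabulary."""
    problems = []
    for level, stub in enumerate(name.split(sep), start=1):
        # Levels beyond the defined vocabulary are left unconstrained.
        if level <= len(vocabulary) and stub not in vocabulary[level - 1]:
            problems.append((level, stub))
    return problems

columns = ["ID_USER", "N_LOGIN", "AMT_PURCHASE", "NUM_LOGIN"]
violations = {c: invalid_components(c) for c in columns if invalid_components(c)}
print(violations)  # {'NUM_LOGIN': [(1, 'NUM')]}
```

Because each position in a name draws from a fixed set, a name that drifts from the agreed vocabulary (here `NUM` instead of `N`) can be flagged mechanically rather than discovered by a confused consumer.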

Doing so can ease the burden on data producers by facilitating automated data validation and metadata management. At the same time, data consumers benefit from a lighter cognitive load when recalling names, a deeper understanding of how variables are encoded, and opportunities to analyze the resulting dataset more efficiently.
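One way automated validation might follow from the names themselves is sketched below in plain Python. This is an assumption-laden illustration, not the convo package's method: the `CHECKS` mapping, the promises attached to each stub, and the `validate` helper are hypothetical.

```python
# Hypothetical mapping from a level-1 stub to the property it promises.
CHECKS = {
    "IND": lambda v: v in (0, 1),                    # binary indicator
    "N":   lambda v: isinstance(v, int) and v >= 0,  # non-negative count
    "ID":  lambda v: v is not None,                  # identifier is never missing
}

def validate(rows):
    """Return (row_index, column, value) for every value that breaks its contract."""
    failures = []
    for i, row in enumerate(rows):
        for column, value in row.items():
            # The first stub of the column name selects the check to apply.
            check = CHECKS.get(column.split("_")[0])
            if check is not None and not check(value):
                failures.append((i, column, value))
    return failures

rows = [
    {"ID_USER": 101, "IND_ACTIVE": 1, "N_LOGIN": 4},
    {"ID_USER": 102, "IND_ACTIVE": 2, "N_LOGIN": -1},
]
failures = validate(rows)
print(failures)  # [(1, 'IND_ACTIVE', 2), (1, 'N_LOGIN', -1)]
```

The producer writes each check once per stub, and every column that carries the stub inherits it, so adding a new `IND_` or `N_` field requires no new validation code.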

After discussing the theory of controlled vocabulary column-naming and related workflows, I will illustrate these ideas with a demonstration of the convo R package, which aids in the creation, upkeep, and application of controlled vocabularies.

This talk is based on my related blog post and R package.

Emily Riederer

Senior Analytics Manager