Data: a rose by any other name

Specializations in research data management are quickly multiplying. Alma Swan identified data creators or authors, data scientists, data managers, and data librarians in her 2008 report, The Skills, Roles and Career Structure of Data Scientists and Curators: an assessment of current practice and future needs. New experts in data tools, data infrastructures, data sciences, and data management were identified one year later in the report of the U.S. Interagency Working Group on Digital Data, Harnessing the Power of Digital Data for Science and Society. This fast growth in new professional positions has taken place concurrently with an equally rapidly developing technical language to help data experts communicate among themselves and with the wider communities they serve. Unfortunately, as this professional vocabulary continues to evolve quickly, it can often confuse the very communities that data professionals are seeking to help.

An example of the changing nature of this technical language is the shift from archiving to preserving. Those of us who today are specialists in research data management talk about preserving research data. Two decades ago, we spoke about data archiving. The move from archiving to preserving came about in the late 1990s, when digital preservation established itself as a field encompassing digital content. This change of terms carried an explicit identification of objects that are digital rather than analogue. Preservation became associated with digital content and archiving was largely left with analogue material.

Preserving or archiving. Does this distinction warrant a change in terminology? It can. Our use of common language for technical purposes can cause confusion. Sometimes it is more appropriate to adopt a lesser-used term to introduce a new technical usage. For example, my initial reaction to the use of metadata in the early days of the Web was that we didn’t need a replacement term for cataloguing information. Subsequently, however, I saw the value of this new term. Metadata covers a greater range of descriptive information than a catalogue record and, in the digital context, metadata can be actionable, driving automated processes. Metadata as a concept added new meaning and functionality. This term has even become a household word with Edward Snowden’s revelations about the U.S. National Security Agency’s use of metadata to identify telecommunications for snooping.

Think about how vacuous the term data has become. Everything that is digital is now called data. We have data plans with our telecom providers and Wi-Fi digital cameras that store data in the cloud. There are digital collections of texts, images, sound, and video in our libraries. All of this content is also called data. How do we distinguish research data from everything else that is digital? I prefer to think of research data as information structured by methodology and organized in digital products that are used as evidence in the research process. This leaves all other digital content that is not research data with the potential of becoming research data. From this perspective, research data are a special class of digital data, allowing us to talk about the technical activities of research data management without confusing it with everything else that is digital.

Part of being a professional in research data management is ensuring that the concepts we use in a technical context are consistent. This brings me to a recent exchange between a senior scientist with a federal government department and members of the Canadian Polar Data Network, who were providing advice on research data management infrastructure. The scientist’s training was in biology and whenever we spoke about data preservation, he would smile. Finally, he said, “When you say preserving data, I think of my mother’s preserves … canning data in jars. Wouldn’t it be more appropriate to talk about conserving data?” The question surprised us because we did not have a context for research data conservation. For biologists, the act of conserving is one of protecting something or restoring it to an earlier state. While data preservation does involve processes to protect digital content as well as the contextual and technical information describing this content, the intent is to maintain the materials in their original digital state indefinitely. Activities involved in repairing research data or their supporting metadata may be part of the curation of the data prior to processing the content for preservation. However, the act of preserving research data is one of keeping the digital content in its pristine state.

This example illustrates the communication challenge that can occur across domains when concepts differ. While biologists may more readily use conservation than preservation, we need to stay within the context of our data profession. In describing digital preservation practices within research data management, we need to convey a technical meaning that applies to activities supporting long-term access to research data. Today, the term that carries this meaning happens to be data preservation.