Data: a rose by any other name (part 2)

In an earlier blog entry, I spoke about the importance of having a technical language that allows data curators to talk within their profession about the details of their work. The words they use may be part of society’s everyday vocabulary but carry a meaning specific to data curation. Confusion can arise during conversations between data curators and others outside the profession when a term is used that carries different meanings for each group. For example, I was in a meeting recently with people from a variety of technical backgrounds, including librarians and research administrators. One librarian spoke about sharing resources across libraries. For the librarian, resources meant information tools, such as library guides, while the administrator assumed that resources referred to money. The administrator was confused about why libraries would be exchanging money.

Communication problems can also arise within a campus’s research community. We encountered this with humanities researchers on our campus earlier in the year when our library hosted a week of workshops and talks on research data management. Speakers at this event consisted of researchers from all areas on the campus, including two prominent researchers from the digital humanities. One of the humanists said in reference to the title of the event, Research Data Management Week, that researchers in the humanities don’t see their research as involving data. Rather, they see data as something belonging to the sciences. When the other humanist spoke, she commented on management in the event’s title, saying that in the humanities, management is seen as a topic for discussion in the business school. Of the four words in the event’s title, only research and week were acceptable concepts in the eyes of these humanists.

Subsequent to this event, a few of us in research data management services met with a humanities researcher who has a unique collection of digital video recordings of live musical performances from a Middle Eastern country. His immediate concern was about the survival of the digital content. In addition to his copy of these recordings, only one other person on the globe has a set. As we worked through the options for making secure copies of his research content, I realized that we were talking primarily about organising his research materials, which happen to be in digital format.

Those of us providing research data management services learned an important lesson from these encounters. When talking with researchers from the humanities, we need to talk about organising their digital research materials rather than managing their data. A meeting with the liaison librarians in the humanities library later confirmed this approach. As data curators, we will continue to talk about managing data with most of the researchers on our campus, but with humanists, we now have a way of talking that lowers communication barriers when discussing their digital research content.

Data: a rose by any other name

Specializations in research data management are quickly multiplying. Alma Swan identified data creators or authors, data scientists, data managers, and data librarians in her 2008 report, The Skills, Roles and Career Structure of Data Scientists and Curators: an assessment of current practice and future needs. New experts in data tools, data infrastructures, data sciences, and data management were identified one year later in the report of the U.S. Interagency Working Group on Digital Data, Harnessing the Power of Digital Data for Science and Society. This fast growth in new professional positions has taken place concurrently with an equally rapidly developing technical language to help data experts communicate among themselves and with the wider communities they serve. Unfortunately, as this professional vocabulary continues to evolve quickly, it can often confuse the very communities that data professionals are seeking to help.

An example of the changing nature of this technical language is the shift from archiving to preserving.  Those of us who today are specialists in research data management talk about preserving research data. Two decades ago, we spoke about data archiving. Moving from archiving to preserving came about in the late 1990s when digital preservation established itself as a field encompassing digital content. This change of terms carried an explicit identification of objects that are digital rather than analogue. Preservation became associated with digital content and archiving was largely left with analogue material.

Preserving or archiving. Does this distinction warrant a change in terminology? It can. Our use of common language for technical purposes can cause confusion. Sometimes it is more appropriate to adopt a lesser-used term to introduce a new technical usage. For example, my initial reaction to the use of metadata in the early days of the Web was that we didn’t need a replacement term for cataloguing information. Subsequently, however, I saw the value of this new term. Metadata covers a greater range of descriptive information than a catalogue record and, in the digital context, metadata can be actionable, driving automated processes. Metadata as a concept added new meaning and functionality. This term has even become a household word with Edward Snowden’s revelations about the U.S. National Security Agency’s use of metadata to identify telecommunications for snooping.

Think about how vacuous the term data has become. Everything that is digital is now called data. We have data plans with our telecom providers and Wi-Fi digital cameras that store data in the cloud. There are digital collections of texts, images, sound, and video in our libraries. All of this content is also called data. How do we distinguish research data from everything else that is digital? I prefer to think of research data as information structured by methodology and organized in digital products that are used as evidence in the research process. This leaves all other digital content that is not research data with the potential of becoming research data. From this perspective, research data are a special class of digital data, allowing us to talk about the technical activities of research data management without confusing them with everything else that is digital.

Part of being a professional in research data management is ensuring that the concepts we use in a technical context are consistent. This brings me to a story that took place recently between a senior scientist with a federal government department and members of the Canadian Polar Data Network, who were providing advice on research data management infrastructure. The scientist’s training was in biology and whenever we spoke about data preservation, he would smile. Finally, he said, “When you say preserving data, I think of my mother’s preserves … canning data in jars. Wouldn’t it be more appropriate to talk about conserving data?” The question surprised us because we did not have a context for research data conservation. For biologists, the act of conserving is one of protecting something or restoring it to an earlier state. While data preservation does involve processes to protect digital content as well as the contextual and technical information describing this content, the intent is to maintain the materials in their original digital state indefinitely. Activities involved in repairing research data or its supporting metadata may be part of the curation of the data prior to processing the content for preservation. However, the act of preserving research data is one of keeping the digital content in its pristine state.

This example illustrates the communication challenge that can occur across domains when concepts differ. While biologists may more readily use conservation than preservation, we need to stay within the context of our data profession. In describing digital preservation practices within research data management, we need to convey a technical meaning that applies to activities supporting the long-term access to research data. Today, this happens to be data preservation.