The Data Repository Community Needs To Take Certification Seriously

An editorial in the 2015/01/02 issue of Science notes that the journal will be identifying data repositories to promote among its authors and readers this year. The selection criteria mentioned include repositories that “are well managed, have long-term support, and are responsive to community needs.”

Up to this point, many librarians have been concerned about publishers establishing their own data repositories and then charging for access to these collections of research data. This recent Science editorial presents a new potential concern regarding publisher engagement in research data management. Might a publisher’s endorsement of a data repository be construed as certification for that repository? Will publishers end up setting a de facto bar for trusted status of data repositories? What will be the implications for domain or local data repositories outside Science’s scope? These are topics needing the attention of the data repository community now.

I see the value of journals being able to recommend data repositories to authors who might be unaware of the choices available to them. However, in the absence of a widely supported and independent certification process, the data repository community runs the risk of journals conducting assessments using their own yardsticks. Without a standard set of criteria, comparisons of data repositories across journal ratings become problematic. Not only are common measures necessary, but a sense of fair assessment conducted by entities at arm’s length is desirable. For example, an assessment conducted by a publisher of its own data repository has less face value than one performed by an independent party.

Rather than see Science strike out on its own to assess data repositories, I would prefer that the journal work collaboratively with organisations already engaged in these activities. The Standards and Interoperability Committee of Research Data Canada has a report soon to be released that presents a set of criteria used to assess a number of data repositories. The Research Data Alliance has a working group on the audit and certification of data repositories built on a partnership between the World Data System and the Data Seal of Approval. In Germany, a catalogue of criteria for trusted digital repositories (nestor) has been developed through community involvement. Journal editors and publishers should work with these organisations when preparing a list of data repositories to recommend.

The Value of Data Management Plans

A big news item coming out of the Digital Infrastructure Summit held in Ottawa on January 28-29, 2014 was the announcement that Canada’s federal research councils will introduce policy changes over the next 24 months that will require applicants to include data management plans in their funding proposals. This announcement came quickly on the heels of a Fall 2013 consultation conducted by these same councils on Capitalizing on Big Data. Within the background material prepared for this consultation, these councils were challenged to adopt “agency-based and focused data stewardship plans” (p. 8), of which data management plans (DMPs) were seen as integral. The push toward this policy change will now likely face some opposition, although momentum currently seems to be with those promoting policies in support of a Canadian data stewardship culture.

Some research councils in other countries have already implemented DMPs. For example, a guideline among the data principles of the Research Councils of the United Kingdom (RCUK) specifically encourages its members to develop data management plans:

Institutional and project specific data management policies and plans should be in accordance with relevant standards and community best practice. Data with acknowledged long-term value should be preserved and remain accessible and usable for future research.

These principles serve as an umbrella framework; each of the seven research councils of RCUK is independently responsible for its own data policies. For example, the Economic and Social Research Council (ESRC) describes its reasons for requiring data management plans as:

We believe that a structured approach to data management results in better quality data that is ready to deposit for further sharing.

This single sentence is very revealing about the expected returns on DMPs.  To begin, a DMP is seen to contribute structure to the handling of data within a project.  An outcome of this approach is believed to be higher quality data.  Furthermore, the data will be better prepared for deposit with an organization that will make the data available for others.

On the surface, data management plans appear to be a very straightforward policy tool. They simply lengthen current funding applications by another page or two. However, the purposes they fulfill and the processes they embody will enrich the production and custodial care of research data.  The ESRC anticipation of higher quality data for sharing also implies collaboration with data curation services and with data repositories.  Ultimately, a DMP should engage researchers in conversations with those providing such services.  In this context, a DMP becomes a document of relationships that should be shared, edited, and monitored among those contributing to a project.  From this viewpoint, a DMP functions as a dynamic document of agreements.

To serve the multiple purposes just described, DMPs should be designed for easy digital exchange across a variety of applications. The best way to approach this in today’s complex world of information technology is through a metadata standard describing a data model of the elements constituting a DMP. CASRAI, a community-based standards body for research administrative information, is well positioned to do this. In fact, the U.K. chapter of CASRAI has already begun work on a set of elements for a DMP data model. In conjunction with this, it would be helpful if the Standards and Interoperability Committee of Research Data Canada would develop a fundamental flowchart representing the interplay of purposes, uses, and relationships expressed in a DMP. This would both inform the CASRAI working group developing specifications for DMPs and help validate the completeness of a DMP data model.
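
To make the idea of a DMP data model a little more concrete, here is a minimal sketch in Python of a plan expressed as a structured record that can be serialized, passed to another application, and read back unchanged. Every element name below (project_title, agreements, and so on) is a hypothetical placeholder chosen for illustration; none is drawn from CASRAI’s work or from any published standard, and JSON is simply one assumed exchange format among many.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Agreement:
    """One relationship recorded in the plan (hypothetical elements)."""
    party: str                # e.g. a data curation service or repository
    role: str                 # what that party contributes to the project
    status: str = "proposed"  # agreements may change as the plan is edited


@dataclass
class DataManagementPlan:
    """A toy DMP record; all element names are assumptions."""
    project_title: str
    principal_investigator: str
    data_description: str
    preservation_repository: str
    agreements: List[Agreement] = field(default_factory=list)
    version: int = 1

    def to_json(self) -> str:
        # asdict() converts nested dataclasses into plain dictionaries.
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "DataManagementPlan":
        raw = json.loads(text)
        # Rebuild the nested Agreement records from their dictionaries.
        raw["agreements"] = [Agreement(**a) for a in raw.get("agreements", [])]
        return cls(**raw)


plan = DataManagementPlan(
    project_title="Example survey project",
    principal_investigator="A. Researcher",
    data_description="Anonymized survey responses",
    preservation_repository="Example institutional repository",
    agreements=[Agreement(party="Library data service", role="curation")],
)

# Simulate an exchange between two applications.
exchanged = DataManagementPlan.from_json(plan.to_json())
```

Because the record survives the round trip intact, a funder’s application system, a library’s curation service, and a repository could in principle all read and update the same plan, which is the kind of shared, monitored document of agreements described above.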

Community Actions to Preserve Research Data in Canada

It takes a research community to preserve its data.

Without leadership from a national institution for research data management and preservation (see this Blog’s Introduction), communities that have interests in research data in Canada have become essential in moving forward an agenda to build this needed infrastructure.  At this stage, the strategy for data in Canada has become reliant on community-level actions.  The diversity of domains, sectors, and jurisdictions with stakes in research data complicates efforts to mobilize a grassroots, bottom-up plan for action.  There have, however, been some recent community activities around data that are encouraging.

Library and Archives Canada (LAC) tapped into the cultural, heritage, and academic sectors to achieve community engagement in identifying basic principles and goals for a national digital information strategy. The Canadian Association of Research Libraries (CARL) undertook the drafting of an application to the Canada Foundation for Innovation (CFI) for research data management infrastructure. As background support for its application, the steering committee for this initiative consulted widely across the scholarly research community, bringing together data interests from diverse research domains. The Research Data Strategy Working Group (RDSWG), composed of representatives from organizations and agencies concerned about research data in Canada, sought ways to implement recommendations from the deadlocked National Consultation on Access to Scientific Research Data (NCASRD). The work of this group contributed to the successful 2011 National Data Summit that attracted the participation of over 160 senior officials with interests in research data across sectors.

The Canadian Digital Information Strategy

LAC undertook a community-based consultation beginning in 2005 to develop a national digital information strategy. Working with over 200 organizations from the public, private, and academic sectors, LAC held a National Summit in December 2006 that brought together representatives from these stakeholders to identify key components of a digital strategy. Early in 2007, the Strategic Development Committee (SDC) was struck to synthesize the output of the Summit and to provide substantive input into a draft strategy. Three sub-groups (Science and Research; Cultural Heritage; and Government Information) were formed to tackle this Committee’s workload. The contributions of the Science and Research sub-group made the draft version of the digital strategy a valuable statement for, and relevant to, research data. The resulting draft digital information strategy was released in the fall of 2007, launching a public review that ran until early 2008. In March 2010, stakeholders who contributed to the consultation were sent a copy of the final report entitled Canadian Digital Information Strategy: Final Report of Consultations with Stakeholder Communities 2005–2008, bringing closure to the process.

Federal inter-departmental politics intervened between 2008 and 2010, undermining the important community involvement that went into forming this strategy. Industry Canada laid claim to the digital economy and perceived the national digital information strategy as treading on its turf. The outcome of this internal political struggle surfaced in the May 10, 2010 announcement of National Consultations on a Digital Economy Strategy, made jointly by the then Minister of Industry (Tony Clement), the Minister of Canadian Heritage and Official Languages (James Moore), and the Minister of Human Resources and Skills Development (Diane Finley). This consultation was conducted online for only one month, drawing upon only a fraction of the community involvement in the digital information strategy.

While the politics over strategies between digital information and a digital economy undermined the eventual adoption of the Canadian Digital Information Strategy, the engagement of the community in shaping the LAC-developed strategy was very successful and demonstrated common ground among the diverse group of stakeholders that have interests in managing, providing access to, and preserving digital content.

A Proposal for a National Collaborative Research Data Infrastructure

The CARL Directors launched an initiative in June 2010 to prepare a proposal for a national collaborative research data infrastructure. While the Canada Foundation for Innovation had yet to announce the program envelope for its next funding round, there was anticipation of a program that might support applications for a national platform. The vision was to finance the development of a national network of research data services at contributing CARL member institutions, including ingest centres to work with researchers in receiving their data, staging repositories to assist researchers with the management of their data over the life of a project, and data repositories responsible for the long-term preservation of digital research data. From the beginning of this initiative, the CARL Directors met with many stakeholders, seeking their endorsement for the proposal. They achieved support from organizations representing other components of Canada’s research infrastructure: Canada’s high-speed optical research network (CANARIE), Canada’s high performance computing grid (Compute Canada), and the Canadian University Council of Chief Information Officers (CUCCIO). They also held a meeting with researchers from several domains to identify their requirements of a national research data infrastructure. Out of these discussions with fellow stakeholders, CARL built a network of supporters within the research community.

When CFI announced its funding program, it did not include a national platform competition. This complicated the logistics of the CARL proposal. The CFI program that was being run would require each university to make the CARL proposal a high priority among the other proposals on its campus. In the end, not enough support could be garnered from campuses to compete in this funding round. The politics of funding envelopes, rather than inter-departmental turf, prematurely ended this effort at building national research data services. Nevertheless, the CARL Directors were successful in communicating their ideas and in building community support for this vision.

The Research Data Strategy Working Group’s National Data Summit

The phoenix rising out of the ashes of the National Data Archive consultation and NCASRD was the Research Data Strategy Working Group.  This informal group, without having financial backing, seeks to find ways of advancing the recommendations of the two earlier consultations.  With roots in a number of organizations and agencies for which research data are important, members in the RDSWG strive to keep one another informed about projects and opportunities that will push the research data agenda forward.  When the government signaled its support for open data in 2010, the RDSWG capitalized on this new direction by proposing to host a National Data Summit that would bring together senior officials to discuss the challenges around research data in Canada.  Funding doors opened as the RDSWG promoted the idea of such a Summit and by the spring of 2011 a program was put in place for September 2011.

The outcome of the National Data Summit was the widespread recognition that research data activities need to be coordinated in Canada.  The discussions in the Summit revealed many common issues across research domains and sectors, demonstrating the value of a forum for sharing and debating data issues.  The Summit participants recommended holding a similar event within eighteen months and endorsed formalizing a secretariat to support such a forum.  In the fall of 2012, the RDSWG reorganized itself into Research Data Canada and continues to develop its role as a national forum for data stewardship issues.

Lessons for the Research Data Community

The experiences of the Canadian Digital Information Strategy, caught in the crosshairs of inter-departmental politics, and of CARL’s withdrawn CFI application both provide important lessons. Neither the level of community engagement in defining strategic directions nor the community’s endorsement of such a course was exempt from an inter-departmental power grab. Some political battles are difficult to anticipate; others fall into a consistent pattern. After all, the Federal Minister who buried the Canadian Digital Information Strategy also dealt the deathblow to the 2011 Census mandatory long form, which would have produced one of Canada’s highly valuable digital information assets. One lesson from this experience is to avoid turf battles between federal departments, unless Treasury Board is on your side.

Similarly, one cannot assume that innovative ideas, even ones that could accelerate Canada to the forefront of research data infrastructure, will trump local interests.  A lesson from the CARL experience is that funding to develop nationally shared services will face stiff competition from local interests, even though the national services may benefit those locally.  This is one situation where strong community intervention may be able to persuade local interests that national gains outweigh any perceived local loss.

The response to the National Data Summit and Research Data Canada shows that Canada’s research community is willing and eager to engage in activities that may shape strategies and plans around data management and preservation.  This undercurrent of support needs to be nurtured and channeled to achieve a national collaborative research data infrastructure.

The next essay looks at the strategic shift from a national institution to national infrastructure for research data in Canada.

[The opinions expressed in this Blog are my own and do not necessarily represent those of my institution.]