Community Actions to Preserve Research Data in Canada

It takes a research community to preserve its data.

Without leadership from a national institution for research data management and preservation (see this Blog’s Introduction), communities that have interests in research data in Canada have become essential in moving forward an agenda to build this needed infrastructure.  At this stage, the strategy for data in Canada has become reliant on community-level actions.  The diversity of domains, sectors, and jurisdictions with stakes in research data complicates efforts to mobilize a grassroots, bottom-up plan for action.  There have, however, been some recent community activities around data that are encouraging.

Library and Archives Canada (LAC) tapped into the cultural, heritage, and academic sectors to achieve community engagement in identifying basic principles and goals for a national digital information strategy.  The Canadian Association of Research Libraries (CARL) undertook the drafting of an application to the Canada Foundation for Innovation (CFI) for research data management infrastructure.  As background support for its application, the steering committee for this initiative consulted widely across the scholarly research community, bringing together data interests from diverse research domains.  The Research Data Strategy Working Group (RDSWG), composed of representatives from organizations and agencies concerned about research data in Canada, sought ways to implement recommendations from the deadlocked National Consultation on Access to Scientific Research Data (NCASRD).  The work of this group contributed to the successful 2011 National Data Summit that attracted the participation of over 160 senior officials with interests in research data across sectors.

The Canadian Digital Information Strategy

LAC undertook a community-based consultation beginning in 2005 to develop a national digital information strategy.  Working with over 200 organizations from the public, private, and academic sectors, LAC held a National Summit in December 2006 that brought together representatives from these stakeholders to identify key components of a digital strategy.  Early in 2007, the Strategic Development Committee (SDC) was struck to synthesize the output of the Summit and to provide substantive input into a draft strategy.  Three sub-groups (Science and Research; Cultural Heritage; and Government Information) were formed to tackle this Committee’s workload.  The contributions of the Science and Research sub-group made the draft version of the digital strategy a valuable statement for and relevant to research data.  The resulting draft digital information strategy was released in the fall of 2007, launching a public review that ran until early 2008.  In March 2010, stakeholders who contributed to the consultation were sent a copy of the final report entitled Canadian Digital Information Strategy: Final Report of Consultations with Stakeholder Communities 2005–2008, bringing closure to the process.

Federal inter-departmental politics intervened between 2008 and 2010, undermining the important community involvement that went into forming this strategy.  Industry Canada laid claim to the digital economy and perceived the national digital information strategy as treading on its turf.  The outcome of this internal political struggle surfaced in the May 10, 2010 announcement of National Consultations on a Digital Economy Strategy, made jointly by the then Minister of Industry (Tony Clement), the Minister of Canadian Heritage and Official Languages (James Moore), and the Minister of Human Resources and Skills Development (Diane Finley).  This consultation was conducted online for only one month, drawing upon only a fraction of the community involvement in the digital information strategy.

While the politics over strategies between digital information and a digital economy undermined the eventual adoption of the Canadian Digital Information Strategy, the engagement of the community in shaping the LAC-developed strategy was very successful and demonstrated common ground among the diverse group of stakeholders that have interests in managing, providing access to, and preserving digital content.

A Proposal for a National Collaborative Research Data Infrastructure

The CARL Directors launched an initiative in June 2010 to prepare a proposal for a national collaborative research data infrastructure.  While the Canada Foundation for Innovation had yet to announce the program envelope for its next funding round, there was anticipation of a program that might support applications for a national platform.  The vision was to finance the development of a national network of research data services at contributing CARL member institutions, including ingest centres to work with researchers in receiving their data, staging repositories to assist researchers with the management of their data over the life of a project, and data repositories responsible for the long-term preservation of digital research data.  From the beginning of this initiative, the CARL Directors met with many stakeholders, seeking their endorsement for the proposal.  They achieved support from organizations representing other components of Canada’s research infrastructure: Canada’s high-speed optical research network (CANARIE), Canada’s high performance computing grid (Compute Canada), and the Canadian University Council of Chief Information Officers (CUCCIO).  They also held a meeting with researchers from several domains to identify their requirements of a national research data infrastructure.  Out of these discussions with fellow stakeholders, CARL built a network of supporters within the research community.

When CFI announced its funding program, it did not include a national platform competition.  This complicated the logistics of the CARL proposal.  The CFI program that was being run would require each university to make the CARL proposal a high priority among the other proposals on its campus.  In the end, not enough support could be garnered from campuses to compete in this funding round.  The politics of funding envelopes rather than inter-departmental turf prematurely ended this effort at building national research data services.  Nevertheless, the CARL Directors were successful in communicating their ideas and in building community support for this vision.

The Research Data Strategy Working Group’s National Data Summit

The phoenix rising out of the ashes of the National Data Archive consultation and NCASRD was the Research Data Strategy Working Group.  This informal group, without having financial backing, seeks to find ways of advancing the recommendations of the two earlier consultations.  With roots in a number of organizations and agencies for which research data are important, members in the RDSWG strive to keep one another informed about projects and opportunities that will push the research data agenda forward.  When the government signaled its support for open data in 2010, the RDSWG capitalized on this new direction by proposing to host a National Data Summit that would bring together senior officials to discuss the challenges around research data in Canada.  Funding doors opened as the RDSWG promoted the idea of such a Summit and by the spring of 2011 a program was put in place for September 2011.

The outcome of the National Data Summit was the widespread recognition that research data activities need to be coordinated in Canada.  The discussions in the Summit revealed many common issues across research domains and sectors, demonstrating the value of a forum for sharing and debating data issues.  The Summit participants recommended holding a similar event within eighteen months and endorsed formalizing a secretariat to support such a forum.  In the fall of 2012, the RDSWG reorganized itself into Research Data Canada and continues to develop its role as a national forum for data stewardship issues.

Lessons for the Research Data Community

The experiences of the Canadian Digital Information Strategy, caught in the crosshairs of inter-departmental politics, and of CARL’s withdrawn CFI application both provide important lessons.  Neither the level of community engagement in defining strategic directions nor its endorsement of such a course was exempt from an inter-departmental power grab.  Some political battles are difficult to anticipate; others fall into a consistent pattern.  After all, the Federal Minister who buried the Canadian Digital Information Strategy also dealt the deathblow to the 2011 Census mandatory long form, which would have produced one of Canada’s highly valuable digital information assets.  One lesson from this experience is to avoid turf battles between federal departments, unless Treasury Board is on your side.

Similarly, one cannot assume that innovative ideas, even ones that could accelerate Canada to the forefront of research data infrastructure, will trump local interests.  A lesson from the CARL experience is that funding to develop nationally shared services will face stiff competition from local interests, even though the national services may benefit those locally.  This is one situation where strong community intervention may be able to persuade local interests that national gains outweigh any perceived local loss.

The response to the National Data Summit and Research Data Canada shows that Canada’s research community is willing and eager to engage in activities that may shape strategies and plans around data management and preservation.  This undercurrent of support needs to be nurtured and channeled to achieve a national collaborative research data infrastructure.

The next essay looks at the strategic shift from a national institution to national infrastructure for research data in Canada.

[The opinions expressed in this Blog are my own and do not necessarily represent those of my institution.]

Canada’s Long Tale of Data

The challenges around preserving research data in Canada have reached a point where we can no longer wait for a solution to be handed down from on high. If we are to save data produced from today’s research, we are going to have to work together with “memory” institutions in Canada willing to incorporate research data into their mandates for preservation.  The essays in this Blog address different issues around preserving research data in Canada.

The “Long Tale of Data” subtitle for this Blog is a play on the word, tail.
“The Long Tail of Data” describes the distribution of the number of datasets by their file storage size.  The curve for this distribution shows relatively few very large datasets compared to the tens of thousands of smaller datasets.  Currently, “big data”, i.e., the very large datasets, are receiving a lot of attention, while the myriad of smaller datasets pose their own daunting challenges for management and preservation.  This story about the data tail is only one of many tales about research data today.

Early Attempts

The focus here is to share several data tales, beginning with the story behind earlier attempts to establish a national data archive in Canada.  These efforts span four decades.  During the 1960s and early 1970s, several countries, including the United States, United Kingdom, Australia, and Germany, established social science data archives (these same nations are often viewed as Canada’s research peers, which is discussed further in two following Blog entries: RDMI 3 and Data Peers).  The social sciences became the domain of early national developments in data access and preservation.

In the late 1970s, a Canadian initiative established a catalogue for social science research data.  Known as the Data Clearing House for the Social Sciences, this organization did not hold any data files but did produce at least one printed catalogue of social science research data before shutting down its operations rather hastily.  After the demise of the short-lived Data Clearing House, it was difficult to find a Canadian funding agency willing to take a risk on this type of national research data infrastructure.

In 1973, the Public Archives of Canada established the Machine Readable Archives Division that provided research data preservation services for federal government departments and agencies.  Unfortunately, a reorganization within the Archives in 1987 resulted in this division being disbanded and its staff being dispersed among the remaining divisions within the Archives.  As a consequence, no coordinated effort was made to replace the services provided by the Machine Readable Archives and the gap between what had been collected and what failed to be collected grew rapidly.

The demise of the Machine Readable Archives Division hurt Canada in several significant ways.  No longer was there a national proponent for data preservation in Canada.  Stakeholders were without a formal body to whom they could express their concerns about research data, i.e., no national forum for data existed.  No formal structure existed to develop standards or best practices in data management and preservation.  Without a formal, national body for data, Canada was without a unified voice in the international data arena.  All of these factors have contributed to stunting the stewardship of research data in Canada.

There have been a few stopgap efforts to archive research data in the social sciences in the absence of a national institution.  Most notably, the Data Library at the University of British Columbia under the leadership of Laine Ruus began collecting data in the 1970s.  A dozen university data libraries across Canada agreed to receive data from SSHRC-funded projects beginning in 1989, but only until a national preservation service could be established.  This group became known as Appendix J, which was the appendix in the SSHRC application guide where they were listed.  Without strong incentives to submit datasets to a member of the Appendix J group, very few researchers deposited any data.

A Body of Evidence

Several studies have documented the need for a national data archive or a national institution providing data management and preservation services.  One of the earliest cases appears in the report, Survey research: report of the Consultative Group on Survey Research in 1976.  While the report emphasizes access to survey data without directly tackling preservation, the authors do recommend that “the initial preparation of the data should be done not only for immediate use but also in view of ultimate storage in a data bank [emphasis added, archaic] (p. 1.21).”  Providing long-term access without specifically naming preservation is common even today among elements of the research community.

In 1996, the Data and Information Systems Panel of the Canadian Global Change Program released a report that now serves as a benchmark against which progress in research data management in Canada can be assessed.  Data Policy and Barriers to Data Access in Canada: Issues for Global Change Research contains ten recommendations under five categories: Infrastructure, Archiving, Documentation, Access, and Standards.  Regarding the preservation of research data, this report states: “There is a lack of focus for archival standards and processes in Canada (p. 51).”  This absence of focus has been a significant obstacle in getting the appropriate attention of senior officials in Canada to address research data management and preservation.  The 2011 National Data Summit (see below) was an important, recent step in gaining the focus of a group of senior administrators.

The Canadian Association for Public Data Use (CAPDU) called for a national data archive in a submission to John English’s review of the National Library and Archives of Canada in 1998.  The final report, The Role of the National Archives of Canada and the National Library of Canada, included a recommendation calling for action to preserve research data.  An outcome of the English report was the striking of the National Data Archive Consultation (NDAC) in 2001 and 2002, which the National Archives of Canada and the Social Sciences and Humanities Research Council jointly sponsored.  This consultation produced two reports.  The first volume, Phase One: Needs Assessment Report, documented the case for national data archive services, while the second volume, Building Infrastructure for Access to and Preservation of Research Data, described various models for such services.  The momentum from these two publications was lost when the search for a senior official to champion the consultation’s findings within Government did not succeed within a year and a half of the final report’s release.

In 2004, the National Consultation on Access to Scientific Research Data (NCASRD) was launched to address the issues of data access in the physical and life sciences.  This consultation was directed to build and expand upon the work completed two years earlier in the humanities and social sciences.  The growing interest in e-Science and the OECD Principles and Guidelines for Access to Research Data from Public Funding were instrumental in the timing of this consultation.  The Final Report of the National Consultation on Access to Scientific Research Data was released in June 2005 and called for the establishment of a national steering body, Data Canada, to help coordinate data management and preservation services.  Again a champion was sought to advance this study’s findings but no one was found within a reasonable period of time, leaving this study’s agenda sidelined like the previous efforts.

In 2008, a working group, under the guidance of Pam Bjornson, Executive Director of CISTI, began to explore ways of implementing some of the recommendations in the NCASRD final report in the absence of a national research data steering body.  Known as the Research Data Strategy Working Group (RDSWG), the group conducted a study in 2008 assessing the gaps in data stewardship in Canada.  This analysis provided an update to the NDAC needs assessment from earlier in the decade.  In 2011, the gap analysis was brought up to date and incorporated into the backgrounder information disseminated in advance of the September 2011 National Data Summit organized by the RDSWG.  Approximately 160 senior managers concerned about the management of research data in Canada attended this event.  The Summit’s final report, Mapping the Data Landscape: Report of the 2011 Canadian Research Data Summit, included a set of recommendations to develop stronger community involvement in research data management and preservation (this is discussed further in the next Blog entry).

Moving Forward

By 2006, the possibility of a new national institution established specifically for research data management and preservation was clearly not in the cards for Canada.  The failure to find a senior official to champion the recommendations from either the NDAC or NCASRD studies was a clear indicator that this was not going to happen.  The requirement for such an institution had been demonstrated multiple times; however, the political will essential to make it happen could not be mobilized.  At the same time that this quest was dead-ending, new developments in e-Science and Cyberinfrastructure were taking shape internationally that opened a new strategy for research data in Canada.  This is discussed in more detail in the Blog entry, From National Institution to National Infrastructure.

Was the pursuit of a national institution dedicated to research data in Canada a foolhardy idea?  Comparing Canada to its usual peers, one finds institutions dedicated to social science research data that are now approaching sixty years of operation.  Given this, the quest for a Canadian national data archive made a great deal of sense.  Canada simply seemed to be lagging behind its peers and was in need of quickly catching up.  More recent evidence, however, suggests that many of us in Canada were mistaken about which countries we should consider as our peers in research data infrastructure, especially in light of the absence of a political will in Canada to establish a new institution for this purpose.  This topic is discussed in another Blog entry.

There are several other reasons why a national institution was far from being foolish.  A national data archive would help Canada address several important issues that require a national focus.  There is the need to identify clear mandates that define the data stewardship roles of various organizations.  These mandates span federal and provincial jurisdictions as well as the public and private sectors.  The complexity of multiple mandates in such an environment would be best handled through a national forum.  Legislation directed at the general management and use of sensitive or confidential data often conflicts with valid research uses of such data.  A national data archive could facilitate the resolution of data issues around sensitive or confidential data.  Canada is in need of national leadership to build standards and best practices for the management and preservation of research data.  Finally, without a formal institutional voice for data, Canada is disadvantaged internationally.  Representation in international data agreements and initiatives is critical for Canada’s researchers to stay competitive.  Even without a national institution for research data management and preservation, the need still exists to coordinate and manage these national data issues.

The next essay looks at a growing community of support around research data management in Canada.
