Who Are Canada’s Research Data Peers?

In October 2007, Kevin Schurer, who was then the Director of the U.K. Data Archive, made a presentation at the ICPSR Official Representatives Meeting at the University of Michigan about establishing a data world wide web.  He used a graphic (Kevin's World Data View) to illustrate the current status of social science data curation around the globe.  Each country has been crudely scaled according to the level of its social science data services.  He noted that the U.S. and U.K. are disproportionately larger in this projection than their actual physical size because of the large volume of social science data curated in these two countries.  He went on to say that Canada in his map is much smaller than its size because, “Canada can’t get its act together [regarding research data].”  While this was a rather dismaying statement to have proclaimed about my home country in an international meeting, grounds exist for his coming to such a conclusion (see the introductory Blog item for evidence).  This observation about Canada raises two important questions:

  1. Who are Canada’s international peers in research data?
  2. How far behind is Canada in research data management infrastructure?

Canada’s International Data Peers

The Introduction to this Blog touched upon this topic.  Canadians typically view their international research peers as the United States, United Kingdom, Australia, and Germany.  In many fields of research and in some areas of research infrastructure, this is the case.  For example, CANARIE is a world-class research network that is comparable with Europe’s research network, GÉANT.  Contributing to the validity of this comparison is the level of top-down impetus both receive through government policy, programs, and funding for these networks.

Research Data Management Infrastructure (RDMI) in Canada, however, does not compare with the developments in data infrastructure in these four countries.  As mentioned previously, bottom-up actions by higher education institutions willing to collaborate with one another around cost-sharing initiatives are the driving force for RDMI in Canada, which by comparison is a very different environment.

Who then are Canada’s data peers?  Looking at Schurer’s 2007 map, Canada appears to be grouped with the rest of the world outside the United States, Europe, and Australia.  I had an opportunity to observe firsthand a few of Canada’s peers at a European Commission-sponsored workshop on “Global Research Data Infrastructures: The Big Data Challenges,” held in Brussels in October 2011.  The objective of this workshop was to further the development of a 2020 roadmap for global research data infrastructure.  There were representatives from Africa, Asia, Australia, Canada, Europe, South America, and the United States, each asked to speak about data infrastructure in their country.  I was asked to talk about data infrastructure in Canada.

The presenters from Brazil and Taiwan spoke about having to build data infrastructure from the bottom up without the top-down guidance or incentives common in the U.S., Europe, or Australia.  I was struck by how similar data infrastructure development in Brazil and Taiwan is to that in Canada.  Who are Canada’s data peers?  Nations building their RDMI from the bottom up.

How Far Behind Is Canada From the Frontrunners on the Planet?

Internationally, RDMI consists of a real patchwork of activities regardless of whether the development is top-down or bottom-up.  Looking at the various parts of the patchwork can provide different perspectives about where a country is positioned globally.  This patchwork has been characterized as a Digital Science Ecosystem in the Global Research Data Infrastructure 2020 Roadmap (GRDI2020).  Thinking of research data infrastructure as an ecosystem focuses attention on the complex relationships among important components of scientific research.  To understand these complex relationships in an environment of data-intensive, multidisciplinary research is as challenging as it is to comprehend the interdependency among species in a biological ecosystem.  The authors feel that the broader research environment is as much of a contributor to advances and transformations in scientific fields as technological progress (see p. 17).

The GRDI2020 report describes the Digital Science Ecosystem as being composed of Digital Data Libraries, Digital Data Archives, Digital Research Libraries, and Communities of Research. The relationships among these four components make up the patchwork environment in which this report envisions future scientific research to be conducted.  From both a technical and organizational standpoint, relationships in a digital ecosystem are established and maintained through interoperability mechanisms among these four components.  An earlier entry to this Blog highlighted the importance of institutions in preserving research data.  Three of the GRDI2020 components are based on institutions: digital data libraries, digital data archives, and digital research libraries.  The earlier Blog entry argued that these institutions do not have to be national, central services but can be distributed across existing institutions with a mandate to preserve research data.  The success of such a distributed inter-institutional preservation network will depend on its interoperability across the network and with the wider research environment.

This digital science ecosystem model can be used to assess the current state of research data infrastructure in a country.  Putting aside the various challenges of top-down or bottom-up development, what aspects of the four components of the GRDI2020 ecosystem does a country have?  Furthermore, what interoperability relationships have been established among these components?  Looking specifically at Canada, a strong network of data libraries exists on campuses across the country because of the Data Liberation Initiative (DLI).  Since 1996, academic libraries have provided data services to support the dissemination of standard data products from Statistics Canada.  In addition to providing access to data, DLI also conducts annual training regionally in Canada, constantly upgrading the skills of those who provide data services on their local campus.  Compared to Europe, Canada is much farther along in developing a network of data libraries that support local access to data.  Canada also has a strong network of research libraries with large and growing digital collections, including repository services for research results.  The Achilles heel for Canada is digital data archives.  This is the ecosystem component for which Canada lags far behind the U.S., U.K., Australia, and Germany, although a few research libraries are beginning developments in this area that hopefully will begin to close the gap.  The Canadian Polar Data Network is an example of a new Canadian collaborative, inter-institutional, cross-sectoral, distributed data archive that serves as a model for other Canadian institutions to emulate.

With strategic top-down investment in data preservation services, Canada could have leapfrogged to be among the frontrunners in the digital science ecosystem.  In the absence of top-down development, research libraries working collaboratively with research communities must build from the bottom up to establish data preservation services.  The engagement of senior administrators at Canadian universities in the development of research data infrastructure is critical to a bottom-up strategy.  There is a need for university policies that establish an institutional mandate to preserve research records and that identify institutional data stewardship responsibilities covering the research lifecycle.

Finally, taking on these tasks at the institutional level will help begin the conversation between universities and national funding agencies around the bigger question of who should be doing what regarding data.  Currently, both parties are at loggerheads on this topic.

[The views expressed in this Blog are my own and do not necessarily represent those of my institution.]

Research Data Management Infrastructure II

In the previous entry, Research Data Management Infrastructure (RDMI) was defined as the mix of technology, services, and expertise organized locally or globally to support research data activities across the research lifecycle.  The context for RDMI within the research lifecycle was described and the importance of institutional-level engagement in data stewardship was emphasized.  Finally, the position was taken that cross-institutional collaboration would enable building collectively the national RDMI that has eluded Canada without top-down design or resources.  How does this context compare with the two other pillars of Canada’s research infrastructure?

Research Infrastructure: The Three Pillars

The Canadian University Council of Chief Information Officers (CUCCIO) hosted the Digital Infrastructure Summit in June 2012 in Saskatoon to address the unclear future of research infrastructure in Canada today.  Concerns have been expressed about the lack of a vision for research infrastructure in Canada and the need for more coordinated planning.  For example, the current business models for CANARIE, the coordinating agency for Canada’s high-speed optical research network, and for Compute Canada, the organization for high performance computing, operate on funding cycles that are less than optimal and on brinksmanship review processes that seem to threaten the very existence of this critical infrastructure.  Borrowing from the National Data Summit format, the CUCCIO Summit invited around sixty leaders in research infrastructure to discuss how best to approach these concerns.  Coming out of this forum was the establishment of a Leadership Council with a mission to articulate a vision for research infrastructure and to organize a follow-up summit.

While Canada does not have a formally recognized national organization for RDMI (Research Data Canada and CARL are working to fill part of this void), CUCCIO recognizes data infrastructure as one of three pillars constituting Canada’s research infrastructure, along with a high speed research network and high performance computing. There are some important differences between the formal support for these latter two infrastructure pillars and RDMI.  First, different forces drive these three infrastructure pillars.

  1. CANARIE provides top-down coordination and incentives, working with a group of Optical Regional Advanced Networks (ORANs) across the country.   The ORANs keep the operational delivery of the high speed network close to the researchers in their areas, while CANARIE works to weave the regional communication networks into a national research service.
  2. High Performance Computing (HPC) in Canada has a similar organizational structure of regional services (WestGrid, Compute Ontario, Calcul Quebec, Compute Atlantic) with national governance provided through Compute Canada, although the regional services tend to operate with a tradition of independence.  Nevertheless, HPC has received top-down incentives, including financial support through the Canada Foundation for Innovation.
  3. As already stated, RDMI does not have a formal national organization to represent its interests, although there are national coordinating roles for both Research Data Canada and CARL to play in data curation and infrastructure within their communities.  Unfortunately, no regional organizations for data infrastructure exist.

While RDMI has been embraced as an equal infrastructure partner by leaders in CANARIE and Compute Canada, the playing field is clearly unequal at this stage.  The good news is that Research Data Canada and CARL continue to be invited to participate in events organized by the other two infrastructure partners.

Second, the voice for RDMI is often ad hoc and diluted.  CANARIE and Compute Canada serve as single points of contact for their infrastructure.  Typically, individual researchers are called to speak on behalf of data infrastructure, even though they may represent only a narrow perspective on data management infrastructure.  A consequence is that the voice for research data often becomes haphazard.  The risks are that a data advocate may not be present at an important research infrastructure event or that the message is too narrow for today’s range of research data issues.

Third, RDMI is dependent on bottom-up initiatives, requiring a great deal of coordination and cooperation to be successful.  The organization of top-down initiatives typically depends on control and governance.  With bottom-up projects, the most important organizational factors are trust, collaboration, and cooperation.  These two different organizational structures also tend to result in different styles of internal politics.

Finally, the international peers for each of Canada’s infrastructure pillars are different.  Both CANARIE and Compute Canada see their counterpart organizations in the United States, Australia, United Kingdom, and the rest of Europe as their peers.  The models and practices for funding and planning are also similar among these peers.  Look at what is happening to RDMI within this same group of countries: the National Science Foundation in the U.S. provides grants for data curation projects through its DataNet program; the European Union supported the Global Research Data Infrastructures 2020 project to help chart the course for developing a global data ecosystem; Australia established the Australian National Data Service to support researchers with their data curation needs; in the U.K. JISC offers its Managing Research Data program, which funds projects in RDMI.  These examples are all top-down driven and involve incentive programs for data infrastructure.  At this stage, the development of RDMI in Canada has very little in common with CANARIE and Compute Canada’s international peers.  A subsequent Blog entry will address who the international peers currently are for Canada’s RDMI.

The next entry discusses RDMI components of technology, services and expertise and how they are organized locally or globally.


Research Data Management Infrastructure I

Beginning in 2010, the authors of the CARL application to the Canada Foundation for Innovation (see Community Actions to Preserve Research Data in Canada) used the term, Research Data Management Infrastructure (RDMI), to identify the confluence of e-Science, Cyberinfrastructure, and data-intensive science (see From National Institution to National Infrastructure).  We like to believe that we coined this concept, although in the U.K. a JISC funding envelope used the identical terminology at approximately the same time.  The JISC program description mentions several drivers that shaped the purpose of this specific funding envelope, many of which are just as relevant in Canada as in the U.K.

Higher education institutions are under increasing pressure to provide services and infrastructure for research data management. These pressures come from a variety of sources: the opportunities of more data intensive and more open, collaborative research; the requirements of research funders; the increasing concern for research transparency and integrity; institutions concern to avoid [reputational] damage caused by poor responses to FoI requests or by data loss.  (JISC website)

The definition I use for RDMI builds on the context described in the JISC description:

RDMI is the mix of technology, services, and expertise organized locally or globally to support research data activities across the research lifecycle.

This discussion will focus specifically on the context served by RDMI, namely, data activities across the research lifecycle.

The Research Lifecycle

The research process is made up of a large set of activities that tend to be grouped into a series of fairly discrete stages.  Each stage typically consists of a set of related activities to accomplish a primary task, the outcomes of which are then passed to the next stage. For example, a survey’s design stage will result in the selection of a sample and an instrument for collecting data. The completion of these activities flows to a data collection stage where interviews are conducted and information is gathered from the sample.  While not all stages are necessarily linear, many of them do have logical dependencies that require sequential ordering.  For example, a research proposal is typically prepared before a grant application is submitted.

As with any project management operation, the granularity at which activities are described presents different views of a project. Similarly, the stages in the research lifecycle can be aggregated or disaggregated into larger or smaller groupings. Nevertheless, there is a level at which a primary task will be accomplished and its outcomes passed to another stage.  In a survey, for example, there is a point where data processing is completed and a data product is passed along for analysis.

Research workflow of a typical scholar showing the nonlinear development of research projects and the multiple stages at which data are collected

The Jahnke and Asher diagram of the workflow of a typical researcher is intended to show the nonlinear nature of the research process.  I feel that the more important message depicted in this workflow is the connection to data throughout the various stages.  Many of the activities in the research lifecycle indirectly or directly involve aspects of data management.  The above diagram shows examples of data-related tasks in the feasibility research, project design, and active research stages.

The Research Lifecycle

There is a second lifecycle that is closely interrelated with the research lifecycle.  This is the data lifecycle, which overlaps stages in the research lifecycle but also consists of some important stages independent of project-based research, including stages dealing with data dissemination, preservation, discovery, and repurposing.  While the Humphrey diagram does not identify stages specific to the data lifecycle, it does consist of more stages outside the project level shown in the Jahnke and Asher diagram, including references to knowledge transfer and repositories for data and research outputs.

Research Data Management

Research data management comprises the practices and activities across the research lifecycle that provide operational support for data through design, production, processing, documentation, analysis, preservation, discovery and reuse.  Collectively, these data-related activities span the stages of project-based research as well as the extended stages that tend to be institutionally based.  The activities are about the “what” and “how” of research data.

RDMI is the configuration of staff, services, and tools assembled to support data management across the research lifecycle and more specifically to provide comprehensive coverage of the stages making up the data lifecycle.

Data Stewardship

In contrast to Data Management, Data Stewardship is about the identity of those responsible for ensuring data management activities are performed to best practice levels and standards across the lifecycle.  Stewardship addresses “who” is responsible for a specific data activity (I’d like to acknowledge Wendy Watkins’ contribution in making this distinction between responsibility and activity).  Data policies, institutional norms, granting council requirements, and domain practices all contribute to defining the roles of those who are responsible for data at the various lifecycle stages.  Ideally, a comprehensive plan at the beginning of a research project would identify the supporting parties across the data lifecycle.  If a data management plan fails to identify who is responsible for specific data-related activities, the risk that not all activities will be completed is heightened.  Data Management Plans should be broadened to become Data Management and Stewardship plans.

The design of RDMI needs to enable data stewards across the data lifecycle to fulfill their responsibilities.

Project-level and Institutional-level Stewardship

A clarification needs to be made about the parties responsible for the various stages of the data lifecycle.  We are currently in a period during which data stewardship roles are under scrutiny.  Clearly, there are stages for which the researcher is the data steward.  The model for conducting research has traditionally been at the project level.  In this context, the researcher is responsible for both defining and conducting the work.  They are also often responsible for securing the funds to do the research.

However, as noted in the JISC quote above, institutions are increasingly discovering a need to take on new responsibilities dealing with research data management, which often entails providing services and infrastructure.  University administrators are much more aware of the value of data to their institution than they were in the past.  Both operational and research data are now being treated as digital assets that need policies, practices, services, and infrastructure to secure their future.  One consequence is the willingness of some institutions to support stages in the data lifecycle that previously had fallen between the cracks.  Some of these new responsibilities for data require additional investments in services and infrastructure, while others will involve the redeployment of staff or reconfiguration of services to fulfill newly accepted data responsibilities.

Research Data Management Interventions

The Jeffreys diagram shows stages in the institutional model from a JISC-funded project to develop research data management infrastructure at the University of Oxford.  While this graphic was not necessarily intended to depict the shared responsibilities between project-level research and the institution, one can see the interplay between both.  The left-hand stages of Project Planning, Project Setup, Data Creation, Documentation, and aspects of Local Storage largely are the researcher’s responsibilities, while the institution assumes responsibility for aspects of Local Storage, Institutional Storage, Rediscovery Mechanism, and Retrieval Mechanism.  Oxford has chosen to provide a mix of services across all of these stages even though the researcher is the primary data steward in half of the stages.  The infrastructure in these stages is to help researchers accomplish their data management tasks without yielding their control over them.

Working with researchers on their campus, university senior administrators have an important leadership role in developing ground-level RDMI.

Institutional Collaboration

The most innovative nations in the future will be those that best manage their research data today.  This is a meaningful incentive for institutions in Canada to collaborate in the development of RDMI, and all the more important in the absence of top-down national support.  No single institution can on its own manage the problem posed by research data.  But collectively, institutions working together can build the shared infrastructure needed by the research community.

Several successful models of collaboration across institutions attest to the viability of building national RDMI through a shared approach.  The Canadian Polar Data Network is one example of a cross-sector collaboration between the higher education and federal government sectors that provided data curation and preservation services for Canadian-funded research in the recent International Polar Year.  Collectively, this network of institutions is able to provide a greater service than any one could offer individually.

Institutional engagement in data stewardship becomes an important step in developing bottom-up national RDMI.

The next essay addresses the three pillars making up research infrastructure in Canada and compares RDMI with the support for high speed research networks and high performance computing.


From National Institution to National Infrastructure

The idea of a distributed network providing data archive functions was presented as one of three models in the 2002 report of the National Data Archive Consultation (NDAC). This was a radical departure from the concept of a national institution supporting research data. After all, preservation requires the longevity of a trusted, enduring institution. Individuals and technology come and go but an institution is needed to span the centuries. In comparison, the notion of a series of nodes connected to a network configuration seemed very ephemeral. We all know that technology is anything but static. How could a national data archive be based simply on one of today’s technology platforms?  This perception, however, was a misunderstanding about how a distributed network for digital preservation could be organized.

At the time of the NDAC final report, digital preservation as a field was starting to come into its own, having only seriously taken root in the latter part of the 1990s.  Much of the initial focus within digital preservation was on individual institutions developing practices and building infrastructure to preserve local digital collections of texts and images.  With the development of computing platforms to support institutional repositories and with the popularization of open access publishing, activity in digital preservation accelerated.  While these developments tended to focus on single institutional initiatives, the underlying infrastructure was capable of supporting a nationally distributed research data preservation network consisting of institutions collaboratively committed to the longevity of the service.

This became the backdrop to a fundamental shift in the way national research data preservation services in Canada might be established.  The introductory essay to this Blog indicated that several studies over the years proposed building a new national institution for this purpose.  This was the dominant model until approximately 2006.  Until then, implementing a national data archive was seen primarily to depend on a champion to stir up the necessary political will to build the new institution.  In addition, this vision was very much a top-down approach to accomplishing this mission.

At the time that NCASRD was underway in Canada, e-Science had established itself in Europe, while equivalent activities in the United States were called Cyberinfrastructure.  Both e-Science and Cyberinfrastructure have their origins in national funding programs supporting computationally intensive infrastructure for the management and processing of very large datasets (now commonly known as “big data”).  Of course, this included high-speed optical research networks and high-performance computing (HPC).  Around 2007, Jim Gray broadened the understanding of e-Science through his work on data-intensive science, which he characterized as data capture, data curation, and data analysis (see The Fourth Paradigm, which was dedicated to his memory).  Data-intensive research quickly unveiled the need for data interoperability across scientific domains. In fact, data interoperability has become an integral part of e-Science and Cyberinfrastructure.  The net result of e-Science, Cyberinfrastructure, and data-intensive science has been an investment in and development of new computational services built around research data.

The CARL application to the Canada Foundation for Innovation called these new computational services Research Data Management Infrastructure (RDMI). It represents the confluence of technology, services, and expertise organized locally or globally to support research data activities across the research lifecycle.  Understanding infrastructure for research data from this perspective changes the focus from a dependence on top-down initiatives to the potential for bottom-up organization.  The CARL contribution also consisted of persistent institutions dedicated to digital preservation.  Several research libraries committed to the long-term, collaborative operation of digital preservation infrastructure could replace the model of a single, national data archive.  Instead of a national institution, there is now a viable alternative of national infrastructure to support the management and preservation of data.

The next essay goes more deeply into research data management infrastructure.


Community Actions to Preserve Research Data in Canada

It takes a research community to preserve its data.

Without leadership from a national institution for research data management and preservation (see this Blog’s Introduction), communities that have interests in research data in Canada have become essential in moving forward an agenda to build this needed infrastructure.  At this stage, the strategy for data in Canada has become reliant on community-level actions.  The diversity of domains, sectors, and jurisdictions with stakes in research data complicates efforts to mobilize a grassroots, bottom-up plan for action.  There have, however, been some recent community activities around data that are encouraging.

Library and Archives Canada (LAC) tapped into the cultural, heritage, and academic sectors to achieve community engagement in identifying basic principles and goals for a national digital information strategy.  The Canadian Association of Research Libraries (CARL) undertook the drafting of an application to the Canada Foundation for Innovation (CFI) for research data management infrastructure.  As background support for its application, the steering committee for this initiative consulted widely across the scholarly research community, bringing together data interests from diverse research domains.  The Research Data Strategy Working Group (RDSWG), composed of representatives from organizations and agencies concerned about research data in Canada, sought ways to implement recommendations from the deadlocked National Consultation on Access to Scientific Research Data (NCASRD).  The work of this group contributed to the successful 2011 National Data Summit that attracted the participation of over 160 senior officials with interests in research data across sectors.

The Canadian Digital Information Strategy

LAC undertook a community-based consultation beginning in 2005 to develop a national digital information strategy.  Working with over 200 organizations from the public, private, and academic sectors, LAC held a National Summit in December 2006 that brought together representatives from these stakeholders to identify key components of a digital strategy.  Early in 2007, the Strategic Development Committee (SDC) was struck to synthesize the output of the Summit and to provide substantive input into a draft strategy.  Three sub-groups (Science and Research; Cultural Heritage; and Government Information) were formed to tackle this Committee’s workload.  The contributions of the Science and Research sub-group made the draft version of the digital strategy a valuable statement for and relevant to research data.  The resulting draft digital information strategy was released in the fall of 2007, which launched a public review that was conducted until early 2008.  In March 2010, stakeholders who contributed to the consultation were sent a copy of the final report entitled Canadian Digital Information Strategy: Final Report of Consultations with Stakeholder Communities 2005–2008, bringing closure to the process.

Federal inter-departmental politics intervened between 2008 and 2010, undermining the important community involvement that went into forming this strategy.  Industry Canada laid claim to the digital economy and perceived the national digital information strategy as treading on its turf.  The outcome of this internal political struggle surfaced in the May 10, 2010 announcement of National Consultations on a Digital Economy Strategy, made jointly by the then Minister of Industry (Tony Clement), the Minister of Canadian Heritage and Official Languages (James Moore), and the Minister of Human Resources and Skills Development (Diane Finley).  This consultation was conducted online for only one month, drawing upon only a fraction of the community involvement in the digital information strategy.

While the politics over strategies between digital information and a digital economy undermined the eventual adoption of the Canadian Digital Information Strategy, the engagement of the community in shaping the LAC-developed strategy was very successful and demonstrated common ground among the diverse group of stakeholders that have interests in managing, providing access to, and preserving digital content.

A Proposal for a National Collaborative Research Data Infrastructure

The CARL Directors launched an initiative in June 2010 to prepare a proposal for a national collaborative research data infrastructure.  While the Canada Foundation for Innovation had yet to announce the program envelope for its next funding round, there was anticipation of a program that might support applications for a national platform.  The vision was to finance the development of a national network of research data services at contributing CARL member institutions, including ingest centres to work with researchers in receiving their data, staging repositories to assist researchers with the management of their data over the life of a project, and data repositories responsible for the long-term preservation of digital research data.  From the beginning of this initiative, the CARL Directors met with many stakeholders, seeking their endorsement for the proposal.  They achieved support from organizations representing other components of Canada’s research infrastructure: Canada’s high-speed optical research network (CANARIE), Canada’s high performance computing grid (Compute Canada), and the Canadian University Council of Chief Information Officers (CUCCIO).  They also held a meeting with researchers from several domains to identify their requirements of a national research data infrastructure.  Out of these discussions with fellow stakeholders, CARL built a network of supporters within the research community.

When CFI announced its funding program, it did not include a national platform competition.  This complicated the logistics of the CARL proposal.  The CFI program that was being run would require each university to make the CARL proposal a high priority among the other proposals on its campus.  In the end, not enough support could be garnered from campuses to compete in this funding round.  The politics of funding envelopes, rather than inter-departmental turf, prematurely ended this effort at building national research data services.  Nevertheless, the CARL Directors were successful in communicating their ideas and in building community support for this vision.

The Research Data Strategy Working Group’s National Data Summit

The phoenix rising out of the ashes of the National Data Archive consultation and NCASRD was the Research Data Strategy Working Group.  This informal group, operating without financial backing, seeks ways of advancing the recommendations of the two earlier consultations.  With roots in a number of organizations and agencies for which research data are important, members of the RDSWG strive to keep one another informed about projects and opportunities that will push the research data agenda forward.  When the government signaled its support for open data in 2010, the RDSWG capitalized on this new direction by proposing to host a National Data Summit that would bring together senior officials to discuss the challenges around research data in Canada.  Funding doors opened as the RDSWG promoted the idea of such a Summit, and by the spring of 2011 a program was put in place for September 2011.

The outcome of the National Data Summit was the widespread recognition that research data activities need to be coordinated in Canada.  The discussions in the Summit revealed many common issues across research domains and sectors, demonstrating the value of a forum for sharing and debating data issues.  The Summit participants recommended holding a similar event within eighteen months and endorsed formalizing a secretariat to support such a forum.  In the fall of 2012, the RDSWG reorganized itself into Research Data Canada and continues to develop its role as a national forum for data stewardship issues.

Lessons for the Research Data Community

The experience of the Canadian Digital Information Strategy, caught in the crosshairs of inter-departmental politics, and that of CARL’s withdrawn CFI application both provide important lessons.  Neither the level of community engagement in defining strategic directions nor the community’s endorsement of such a course was exempt from an inter-departmental power grab.  Some political battles are difficult to anticipate; others fall into a consistent pattern.  After all, the Federal Minister who buried the Canadian Digital Information Strategy also dealt the deathblow to the 2011 Census mandatory long form, which would have produced one of Canada’s highly valuable digital information assets.  One lesson from this experience is to avoid turf battles between federal departments, unless Treasury Board is on your side.

Similarly, one cannot assume that innovative ideas, even ones that could accelerate Canada to the forefront of research data infrastructure, will trump local interests.  A lesson from the CARL experience is that funding to develop nationally shared services will face stiff competition from local interests, even though the national services may benefit those same local interests.  This is one situation where strong community intervention may be able to persuade local interests that national gains outweigh any perceived local loss.

The response to the National Data Summit and Research Data Canada shows that Canada’s research community is willing and eager to engage in activities that may shape strategies and plans around data management and preservation.  This undercurrent of support needs to be nurtured and channeled to achieve a national collaborative research data infrastructure.

The next essay looks at the strategic shift from a national institution to national infrastructure for research data in Canada.

[The opinions expressed in this Blog are my own and do not necessarily represent those of my institution.]