From National Institution to National Infrastructure

The idea of a distributed network providing data archive functions was presented as one of three models in the 2002 report of the National Data Archive Consultation (NDAC). This was a radical departure from the concept of a national institution supporting research data. After all, preservation requires the longevity of a trusted, enduring institution. Individuals and technology come and go, but an institution is needed to span the centuries. In comparison, the notion of a series of nodes connected in a network configuration seemed very ephemeral. We all know that technology is anything but static. How could a national data archive be based simply on one of today’s technology platforms? This perception, however, rested on a misunderstanding of how a distributed network for digital preservation could be organized.

At the time of the NDAC final report, digital preservation as a field was starting to come into its own, having only seriously taken root in the latter part of the 1990s.  Much of the initial focus within digital preservation was on individual institutions developing practices and building infrastructure to preserve local digital collections of texts and images.  With the development of computing platforms to support institutional repositories and with the popularization of open access publishing, activity in digital preservation accelerated.  While these developments tended to focus on single institutional initiatives, the underlying infrastructure was capable of supporting a nationally distributed research data preservation network consisting of institutions collaboratively committed to the longevity of the service.

This became the backdrop to a fundamental shift in the way national research data preservation services in Canada might be established. The introductory essay to this Blog indicated that several studies over the years proposed building a new national institution for this purpose. This was the dominant model until approximately 2006. Until then, implementing a national data archive was seen as depending primarily on a champion to stir up the necessary political will to build the new institution. In addition, this vision was very much a top-down approach to accomplishing this mission.

At the time that NCASRD was underway in Canada, e-Science had established itself in Europe, while equivalent activities in the United States were called Cyberinfrastructure. Both e-Science and Cyberinfrastructure have their origins in national funding programs supporting computationally intensive infrastructure for the management and processing of very large datasets (now commonly known as “big data”). Of course, this included high-speed optical research networks and high-performance computing (HPC). Around 2007, Jim Gray broadened the understanding of e-Science through his work on data-intensive science, which he characterized as data capture, data curation, and data analysis (see The Fourth Paradigm, which was dedicated to his memory). Data-intensive research quickly revealed the need for data interoperability across scientific domains. In fact, data interoperability has become an integral part of e-Science and Cyberinfrastructure. The net result of e-Science, Cyberinfrastructure, and data-intensive science has been investment in and development of new computational services built around research data.

The CARL application to the Canada Foundation for Innovation called these new computational services Research Data Management Infrastructure (RDMI). RDMI represents the confluence of technology, services, and expertise organized locally or globally to support research data activities across the research lifecycle. Understanding infrastructure for research data from this perspective changes the focus from a dependence on top-down initiatives to the potential for bottom-up organization. The CARL contribution also consisted of persistent institutions dedicated to digital preservation. Several research libraries committed to the long-term, collaborative operation of digital preservation infrastructure could replace the model of a single, national data archive. Instead of a national institution, there is now a viable alternative: national infrastructure to support the management and preservation of data.

The next essay goes more deeply into research data management infrastructure.

[The opinions expressed in this Blog are my own and do not necessarily represent those of my institution.]