Research Data Management Infrastructure III

In earlier entries in this Blog, Research Data Management Infrastructure (RDMI) was defined as the mix of technology, services, and expertise organized locally or globally to support research data activities across the research lifecycle.  The context for RDMI has already been discussed in terms of the research lifecycle and of the two additional components making up research infrastructure: Canada’s high-speed research network and high-performance computing services.  This essay addresses the elements of data infrastructure and how they are organized.

In developing its Cyberinfrastructure program, the U.S. National Science Foundation funded a project to investigate how best to build successful infrastructure.  Coming out of this study was the report, Understanding Infrastructure.  The authors establish early in their work the significant connection between social organization and the use of communication technology.  Regarding cyberinfrastructure, they stress that it “is about more than just pipes and machines” (p. 5) and emphasize the importance of social organizational factors in shaping solutions.  They note that in developing cyberinfrastructure, solutions can be social, technical, or a combination.  They argue that the distribution of solutions is central to building infrastructure.  In the diagram by Millerand, solutions are portrayed as being distributed across two dimensions: technical-social and local-global.

[C]yberinfrastructure is the set of organizational practices, technical infrastructure and social norms that collectively provide for the smooth operation of scientific work at a distance. All three are objects of design and engineering; a cyberinfrastructure will fail if any one is ignored. Understanding Infrastructure (p. 6)

A Textbook Example

Earlier this year I experienced a textbook example of this conceptual model of infrastructure while visiting Bryn Mawr University just as they were changing the way they provide campus wireless services to guests.  When I arrived on campus, I was given a sheet of paper containing the name of the campus wireless service, an account ID and password to log into this service, and a set of instructions for different devices and operating systems.  I was required to obtain a separate account for each device on which I wished to use campus wireless services.

This approach to providing guests with wireless access to the campus network and the Internet falls under the social-local set of solutions in the above infrastructure model.  The procedures were organized around human intervention, i.e., having to find and speak with a person who could provide me with the information sheet, and around social norms requiring me to sign an agreement statement confirming my acceptance of the rules for using their wireless.  The wireless technology, however, was typical industry-standard Wi-Fi.

On the second day of my visit, a new wireless service was launched for guests on their campus: Eduroam.  This is the international service that allows academic guests from university members of Eduroam to gain access to secure wireless networking while visiting another Eduroam site.  Because my home institution is an Eduroam member and can authenticate my credentials through this service, I simply open my wireless device, go to the list of available wireless services wherever I am, and select Eduroam if it is among them.  The system behind the scenes allows the local Eduroam host to verify my credentials with my home institution and to provide me with selective network services on their campus.  For example, if the Library has a license for a database that does not allow guest access, the local implementation of Eduroam can hide this database from my guest access.
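The flow described above, in which the host campus defers to the guest's home institution and then decides which local services to expose, can be illustrated with a small Python sketch.  This is only a toy model under stated assumptions: the class names, the realm "example-home.ca", the account, and the resource list are all hypothetical, and the real Eduroam service relies on 802.1X and a hierarchy of RADIUS servers rather than anything resembling this code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class HomeInstitution:
    """Hypothetical home institution that vouches for its own users."""
    realm: str
    # Plain-text passwords are an illustration-only simplification.
    accounts: dict[str, str] = field(default_factory=dict)

    def verify(self, user: str, password: str) -> bool:
        return self.accounts.get(user) == password


@dataclass
class VisitedCampus:
    """Hypothetical host site that relies on home institutions for authentication."""
    name: str
    federation: dict[str, HomeInstitution]  # realm -> home institution
    resources: dict[str, bool]              # resource name -> open to guests?

    def connect(self, identity: str, password: str) -> list[str] | None:
        # The realm after "@" tells the host which home institution to ask.
        user, _, realm = identity.partition("@")
        home = self.federation.get(realm)
        if home is None or not home.verify(user, password):
            return None  # unknown realm or failed verification: no access
        # Local policy: expose only the resources that guests may use.
        return sorted(name for name, guest_ok in self.resources.items() if guest_ok)


home = HomeInstitution("example-home.ca", {"researcher": "s3cret"})
campus = VisitedCampus(
    name="Host campus",
    federation={home.realm: home},
    resources={"internet": True, "licensed-database": False, "library-catalogue": True},
)
print(campus.connect("researcher@example-home.ca", "s3cret"))
# prints ['internet', 'library-catalogue']
```

The point of the sketch is the division of labour: the home institution answers only the question of identity, while the host campus keeps control over what a verified guest may see, such as hiding a licensed database.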

This service approach falls under the technology-global set of solutions.  My credentials are validated through my home institution using technology, allowing me to connect to wireless services at a member Eduroam campus without having to go through another person or obtain temporary authentication credentials.  Eduroam has given me easy guest access to wireless services in the United States, Germany, and Canada.  Higher education institutions in over fifty-five nations now support Eduroam.  It truly is a global solution to providing guest access to secure wireless networking.

Cyberinfrastructure and RDMI

How does this particular Cyberinfrastructure (CI) model relate to Research Data Management Infrastructure?  First, the CI model provides a conceptual framework for the definition of RDMI.  The RDMI elements of technology, services, and expertise are part of CI, although not expressed in exactly the same terms.  Applied to RDMI, organizational practices and social norms are aspects of the services supporting data management across the research lifecycle.  Services embody organizational responses to data management.  For example, offering researchers assistance with data management plans requires organizing resources to deliver such a service.  Social norms and expectations are also expressed in services.  A funding agency may require data management plans to get researchers to describe how they will share the data from their project, setting an expectation to share data.  In the context of RDMI, then, services combine the CI characteristics of social norms and organizational practices.

Expertise is another component of CI and RDMI.  Data management activities span the research lifecycle and involve many different skills, drawing upon a variety of expertise.  The demands for data management expertise depend on the scale of the research project.  A small project may involve only a couple of people, who can manage with a general set of skills.  A much larger project may require a team of experts with each team member responsible for a specific specialization.  Expertise is also aligned with responsibilities for data management activities, which were identified as aspects of data stewardship in a previous Blog discussion.

Place is significant in CI and RDMI.  Research is increasingly conducted in collaborative, inter-institutional teams that span nations.  High-speed optical research networks are vital for researchers who work at a distance from one another.  Whether working together in real time or asynchronously in different places, the network allows them to organize their workflow so each can contribute.  Similarly, researchers may require access to high-performance computing (HPC) without being located at an HPC site.  Over a research network they may gain access to the computing resources they require.  Distance also comes into play with RDMI.  Data may be gathered in one location, processed at another site, analyzed at yet another place, and preserved in an institution separate from these other locations.  Through a collaborative initiative, such as the Canadian Polar Data Network, an institution may offer preservation services for research data that behind the scenes consist of a distributed dark archive shared among several institutions.  The scope of some research data infrastructure requires global solutions.  One example is the need for infrastructure that will overcome barriers to the free exchange of scientific data across national borders.

The implementations of RDMI will vary from institution to institution but the set of solutions will be distributed locally or globally across technology, services, and expertise.

The next Blog entry will focus on the question:  Who are Canada’s international peers in Research Data Management Infrastructure?

[The views expressed in this Blog are my own and do not necessarily represent those of my institution.]

From National Institution to National Infrastructure

The idea of a distributed network providing data archive functions was presented as one of three models in the 2002 report of the National Data Archive Consultation (NDAC). This was a radical departure from the concept of a national institution supporting research data. After all, preservation requires the longevity of a trusted, enduring institution. Individuals and technology come and go but an institution is needed to span the centuries. In comparison, the notion of a series of nodes connected to a network configuration seemed very ephemeral. We all know that technology is anything but static. How could a national data archive be based simply on one of today’s technology platforms?  This perception, however, was a misunderstanding about how a distributed network for digital preservation could be organized.

At the time of the NDAC final report, digital preservation as a field was starting to come into its own, having only seriously taken root in the latter part of the 1990s.  Much of the initial focus within digital preservation was on individual institutions developing practices and building infrastructure to preserve local digital collections of texts and images.  With the development of computing platforms to support institutional repositories and with the popularization of open access publishing, activity in digital preservation accelerated.  While these developments tended to focus on single institutional initiatives, the underlying infrastructure was capable of supporting a nationally distributed research data preservation network consisting of institutions collaboratively committed to the longevity of the service.

This became the backdrop to a fundamental shift in the way national research data preservation services in Canada might be established.  The introductory essay to this Blog indicated that several studies over the years proposed building a new national institution for this purpose.  This was the dominant model until approximately 2006.  Until then, implementing a national data archive was seen to depend primarily on a champion to stir up the necessary political will to build the new institution.  In addition, this vision was very much a top-down approach to accomplishing this mission.

At the time that the National Consultation on Access to Scientific Research Data (NCASRD) was underway in Canada, e-Science had established itself in Europe, while equivalent activities in the United States were called Cyberinfrastructure.  Both e-Science and Cyberinfrastructure have their origins in national funding programs supporting computationally intensive infrastructure for the management and processing of very large datasets (now commonly known as “big data”).  Of course, this included high-speed optical research networks and high-performance computing (HPC).  Around 2007, Jim Gray broadened the understanding of e-Science through his work on data-intensive science, which he characterized as data capture, data curation, and data analysis (see The Fourth Paradigm, which was dedicated to his memory).  Data-intensive research quickly unveiled the need for data interoperability across scientific domains.  In fact, data interoperability has become an integral part of e-Science and Cyberinfrastructure.  The net result of e-Science, Cyberinfrastructure, and data-intensive science has been an investment in and development of new computational services built around research data.

The CARL application to the Canada Foundation for Innovation called these new computational services Research Data Management Infrastructure (RDMI).  It represents the confluence of technology, services, and expertise organized locally or globally to support research data activities across the research lifecycle.  Understanding infrastructure for research data from this perspective changes the focus from a dependence on top-down initiatives to the potential for bottom-up organization.  The CARL contribution also consisted of persistent institutions dedicated to digital preservation.  Several research libraries committed to the long-term, collaborative operation of digital preservation infrastructure could replace the model of a single, national data archive.  Instead of a national institution, there is now a viable alternative of national infrastructure to support the management and preservation of data.

The next essay goes more deeply into research data management infrastructure.

[The opinions expressed in this Blog are my own and do not necessarily represent those of my institution.]