The Next Frontier: Federal Librarians and Data
by Richard Huffine
Data is a mainstream topic for librarians today. The term “Big Data” is popular at conferences, and academic libraries are increasingly taking a role in managing data produced by researchers. The challenges and opportunities of managing data are impacting librarians in the U.S. federal government. In all kinds of government agencies, librarians are taking on roles that support both researchers within their agencies and public users of the data that the agencies create.
The U.S. federal government has been preparing for these new roles. In 2011, the Library of Congress’ Federal Library and Information Center Committee (FLICC) released an updated document on competencies for federal librarians that addresses data. “FLICC Competencies for Federal Librarians” defines the knowledge, skills, and abilities federal librarians need to possess. The document is used to develop job descriptions, train employees, and align the librarian’s role with an agency’s strategic goals. It encourages the development of skills to “interpret, explain, and apply standards for data collection, management, curation, and accessibility.” It also encourages applying those skills in evaluating data management plans.
In February 2013, the Obama administration issued guidance for agencies with large investments in research to develop public access plans for their publications and data. In the fiscal year 2014 federal budget, Congress put similar requirements in place for the U.S. Department of Health and Human Services and the U.S. Department of Education. In March 2014, the administration reported progress within 23 agencies on drafting those plans. Throughout this process, librarians in several of those agencies have been involved to ensure the plans will meet the needs of their users.
Delivering public access to federally funded research is just one way that federal librarians are getting involved with data today. Here are a few examples of how librarians across the federal government are engaged in data management within their agencies and departments.
Board of Governors of the Federal Reserve
The research library within the Board of Governors of the Federal Reserve has taken the FLICC document to heart. It has developed new positions within the library to support data acquisition and licensing. The Federal Reserve System’s economists are supported by research assistants who apply their quantitative skills to real-world policy issues and to research projects.
The Federal Reserve System acquires hundreds of datasets annually, covering topics such as microlevel banking and aggregate macroeconomic time series statistics. While some of the data is accessed through commercial providers, most
Financial and market data is expensive, and how much of it can be presented within a research paper or report is routinely regulated. Data acquisition librarians not only work to negotiate rights for what their researchers need, but they review draft reports to ensure compliance with the licensing terms of the data sources used. Licensing of specific datasets can range from acquiring a narrow slice of data for a single researcher to securing rights and access for everyone within the Board of Governors and, at times, all 12 of the independent Federal Reserve Banks across the U.S. In order to specialize in the unique requirements of the users, each data acquisition librarian is responsible for several types of research data. Examples of their specialties include banking, consumer credit, real estate, and specific industry sectors.
The cornerstone of this program is the Library Information and Data Access (LIDA) catalog of data sources and their license terms and availability. The catalog is available internally to all researchers and provides information on datasets, databases, ebooks, and industry news sources. It offers categorized descriptions to help everyone from experts to novices navigate the complicated maze of sources available for understanding financial markets and their regulation.
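The kind of record such a catalog might hold, and the checks a data acquisition librarian might run against it, can be sketched in a few lines of Python. This is an illustration only: the article does not describe LIDA's actual schema, so every name here (`CatalogEntry`, `Scope`, `can_access`, `flag_for_review`, and the sample entries) is hypothetical. The three license scopes simply mirror the range described above, from a single researcher to the Board to the whole Federal Reserve System.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    """Hypothetical license scopes, from narrowest to broadest."""
    SINGLE_RESEARCHER = 1
    BOARD = 2
    SYSTEM = 3  # the Board plus the Federal Reserve Banks


@dataclass(frozen=True)
class CatalogEntry:
    """One illustrative catalog record; not LIDA's real schema."""
    name: str
    category: str                        # e.g. "banking", "real estate"
    scope: Scope
    licensed_to: frozenset = frozenset() # only used for SINGLE_RESEARCHER
    may_publish_excerpts: bool = False   # assumed license term


def can_access(entry: CatalogEntry, user: str, at_board: bool) -> bool:
    """Return True if the (assumed) license scope covers this user."""
    if entry.scope is Scope.SYSTEM:
        return True
    if entry.scope is Scope.BOARD:
        return at_board
    return user in entry.licensed_to


def flag_for_review(entries_used: list) -> list:
    """Name the sources in a draft report whose terms restrict publication."""
    return [e.name for e in entries_used if not e.may_publish_excerpts]


# Made-up sample entries for the sketch.
credit = CatalogEntry("Consumer credit panel", "consumer credit",
                      Scope.SINGLE_RESEARCHER, frozenset({"analyst_a"}))
housing = CatalogEntry("Housing price index", "real estate",
                       Scope.SYSTEM, may_publish_excerpts=True)

print(can_access(credit, "analyst_b", at_board=True))
print(flag_for_review([credit, housing]))
```

Keeping the license terms next to the dataset description is the design point: the same record answers both "who may use this?" and "what may appear in a published report?"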
As a whole, the Federal Reserve System conducts monthly meetings with data librarians across its network to discuss issues, opportunities, and strategies for supporting data within the organization. The data acquisition librarians also participate in a larger group of financial data professionals within the U.S. federal government under the Financial Stability Oversight Council (FSOC).
Making Government Data Accessible: The USGS Story
Another way in which federal librarians are getting involved with data is by taking part in communities of practice within their agencies and departments. At the USGS (U.S. Geological Survey), librarians have been participating in the Community for Data Integration (CDI; usgs.gov/cdi) since 2011. The community is an open forum comprising scientists, data managers, librarians, and program managers. They share the goal of developing tools and best practices that make it easier to integrate earth science data.
Access to federal government research data used in publications is seen as a critical element of reproducibility, usability, and scientific integrity. Even prior to the series of open data initiatives introduced by the Obama administration in 2013, the USGS was working toward making its data broadly accessible for other researchers to find, access, and use.
The challenge of integrating earth science data is an ongoing issue for researchers. Data across multiple disciplines (geology, geography, biology, hydrology, and others) is collected, managed, and documented in very different ways. These differences were not created to limit integration. They emerged from decades of solitary and divergent practices for gathering data and using it in the individual disciplines.
One strategy for addressing the challenges at the USGS was to establish the Data Management Working Group (DMWG). This organization works with the USGS Core Science Analytics, Synthesis, and Libraries, a program focused on data management and high-performance computing. That partnership has been a cornerstone of pushing cultural change for how USGS data is managed throughout its life cycle. DMWG launched a number of concurrent activities to support increased access to data-management tools and best practices. One of its products is a data life cycle for the USGS, described in the document, “The United States Geological Survey Science Data Lifecycle Model.” Another product is the USGS Data Management website (usgs.gov/datamanagement), which is designed around the data life cycle to provide best practices and tools for data management.
CDI is also working with the USGS Office of Science Quality and Integrity to develop foundational policies that support managing data for access and reuse. The policies address data management, metadata for describing data, software releases, and review and approval for data release. They are scheduled to be introduced early this year.
But managing data within the federal government is more than an exercise in documentation. Other efforts critical to increased data access and reusability revolve around tools and infrastructure. The USGS is deploying several innovative applications that help scientists better manage their data. ScienceBase (sciencebase.gov/catalog) is a data-management platform that allows the uploading and cataloging of data as well as data sharing across communities. USGS is developing ScienceBase as a mechanism for the release of its official data and is working toward introducing a data-management dashboard that will incorporate a variety of tools for managing data.
Some of those tools include the Online Metadata Editor (OME; mercury-ops2.ornl.gov/OME), which was developed in partnership with the Oak Ridge National Laboratory Mercury Consortium. The OME allows researchers to create metadata records following the Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata (CSDGM). The interface is designed to reduce the burden of knowing about the standard by asking researchers common questions about their data. Another tool, the USGS Science Data Catalog (data.usgs.gov/datacatalog), launched in 2014 and provides a single point of access to USGS data. USGS researchers can contribute to the catalog through a dashboard that registers a harvest point for their metadata and provides reports on the status of records in the system. The Science Data Catalog is the USGS response to open data requirements for federal agencies and serves as a conduit to the U.S. Department of the Interior and Data.gov catalogs.
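To make the idea behind a question-driven metadata editor concrete, here is a minimal sketch in Python that turns a handful of plain-language answers into a fragment of a CSDGM-style XML record. It is not the OME's implementation: the question keys and the `build_record` function are hypothetical, and the output covers only a small part of the identification section (`idinfo`), so it would not pass as a complete record under the standard.

```python
import xml.etree.ElementTree as ET


def build_record(answers: dict) -> ET.Element:
    """Map plain-language answers onto a few CSDGM identification elements.

    The answer keys are invented for this sketch; the element names
    (metadata, idinfo, citation, citeinfo, origin, pubdate, title,
    descript, abstract, purpose) follow the CSDGM short names.
    """
    metadata = ET.Element("metadata")
    idinfo = ET.SubElement(metadata, "idinfo")

    # Citation details: who produced the data, when, and under what title.
    citation = ET.SubElement(idinfo, "citation")
    citeinfo = ET.SubElement(citation, "citeinfo")
    ET.SubElement(citeinfo, "origin").text = answers["who_created_it"]
    ET.SubElement(citeinfo, "pubdate").text = answers["when_published"]
    ET.SubElement(citeinfo, "title").text = answers["what_is_it_called"]

    # Description: what the data contains and why it was collected.
    descript = ET.SubElement(idinfo, "descript")
    ET.SubElement(descript, "abstract").text = answers["what_does_it_contain"]
    ET.SubElement(descript, "purpose").text = answers["why_was_it_collected"]
    return metadata


# Made-up answers, standing in for what a researcher might type.
record = build_record({
    "who_created_it": "Example Science Center",
    "when_published": "2014",
    "what_is_it_called": "Example streamflow measurements",
    "what_does_it_contain": "Daily streamflow values for one gauge.",
    "why_was_it_collected": "To support a regional water study.",
})
print(ET.tostring(record, encoding="unicode"))
```

The researcher never sees an element name such as `citeinfo`; the editor carries that knowledge, which is the burden the interface is meant to remove.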
Providing Ways to Cite Data in Publications
The final example of federal librarians taking on new roles with data comes from the U.S. Department of Energy (DOE). The DOE’s OSTI (Office of Scientific and Technical Information) employs librarians and other information scientists who make DOE research accessible to broader research communities. In 2010, it piloted a program to provide a registration service that assigns digital object identifiers (DOIs) for data produced by DOE employees, contractors, and grantees.
OSTI provides data registration as a free service to DOE organizations and projects, but also offers it to other U.S. federal agencies through interagency agreements. Member agencies using the service submit metadata either manually or by using an automated web service. Based on that submission, a DOI is created and associated with the record. The DOI and a subset of the metadata are sent automatically to DataCite, where the DOI is entered into its registry and becomes live so users around the world can cite the source data they are using in their papers. The DataCite DOI can also be used as a permanent link to the data, wherever it resides, through link-resolution services.
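The registration flow described above (submit metadata, receive a DOI, forward a subset of the metadata to the registry, then cite or link) can be sketched in Python. This is a simplified illustration under stated assumptions, not OSTI's or DataCite's actual interface: the function names, the list of forwarded fields, and the DOI prefix `10.99999` are all hypothetical, and nothing is sent over the network. The one real detail is that a DOI can be turned into a link by appending it to the `https://doi.org/` resolver.

```python
from itertools import count

# Hypothetical prefix for illustration; a real registrant is assigned its own.
EXAMPLE_PREFIX = "10.99999"

# Fields this sketch assumes are forwarded to the DOI registry.
REGISTRY_FIELDS = ("title", "creators", "publisher", "publication_year", "url")

_suffixes = count(1)  # simple in-memory counter standing in for a real minting service


def mint_doi(prefix: str = EXAMPLE_PREFIX) -> str:
    """Create the next identifier under the given prefix."""
    return f"{prefix}/{next(_suffixes):07d}"


def register_dataset(metadata: dict) -> tuple:
    """Validate a submission, mint a DOI, and build the registry subset."""
    missing = [f for f in REGISTRY_FIELDS if not metadata.get(f)]
    if missing:
        raise ValueError(f"missing required metadata: {', '.join(missing)}")
    doi = mint_doi()
    # Only the subset goes to the registry; other fields stay with the submitter.
    registry_record = {f: metadata[f] for f in REGISTRY_FIELDS}
    registry_record["doi"] = doi
    return doi, registry_record


def resolver_link(doi: str) -> str:
    """Build a persistent link that resolves to wherever the data lives."""
    return f"https://doi.org/{doi}"


# Made-up submission from a member agency.
submission = {
    "title": "Example sensor dataset",
    "creators": ["Example, A."],
    "publisher": "Example Laboratory",
    "publication_year": 2014,
    "url": "https://data.example.gov/sensor-dataset",
    "internal_notes": "not forwarded to the registry",
}
doi, registry_record = register_dataset(submission)
print(doi, resolver_link(doi))
```

Because the link points at the resolver rather than at the hosting site, the landing URL stored in the registry can change without breaking citations that use the DOI.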
The adoption of systems and procedures for making research data accessible has taken time but has seen steady growth since the DOE started its pilot in 2010. The team at OSTI has seen a wider variety of projects involving data registration. In addition to DOE programs, other federal agencies are beginning to adopt the data-resolution service. One of its first non-DOE adoptions was by the U.S. Department of the Treasury to register identifiers for financial reports and data.
The National Institute of Mental Health (NIMH) is planning to register clinical data from its various databases, and the U.S. Department of Agriculture’s Agricultural Research Service will register datasets deposited to an online repository from its many branches and projects. The USGS, in partnership with the Oak Ridge National Laboratory Mercury Consortium, developed a DOI tool (mercury-ops2.ornl.gov/DOI) that allows its researchers to create identifiers for their research data collections. The tool lets data and information providers generate globally unique, persistent, and resolvable identifiers for any kind of digital object.
The Next Frontier
As the next frontier for librarians, data offers a whole host of challenges and opportunities. Well-managed data not only provides trustworthy information for analysis, but it also delivers documentation that offers context about how it was collected and what it contains. Librarians understand the value of context as well as the need to document the meanings of both individual fields and entire datasets.
As these examples illustrate, federal librarians are taking on a variety of roles in acquiring and managing data and making it accessible. While they may not be the sole providers of products and services associated with data, they are partnering with researchers, participating in teams, and demonstrating their value. The reality is that data is now an integral part of supporting the work of government.