OA/Open Data Designs and Digital Repository Strategies
by Tom Adamich
This article explores scholarly communication issues for 21st-century academic and research libraries. I will identify several issues that academic libraries face when developing their scholarly communication programs to include strategies based on linked open data and semantic web architecture.
By Way of Background
In the 21st century’s second decade, scholarly research curation and repository program development have grown exponentially. Academic libraries, especially those supporting their institution’s research agendas, have recognized the importance of managing scholarly communication resources associated with the multiple steps in the scholarly research process: inquiry; data acquisition; data analysis; report development, publication, and dissemination; and postpublication activities.
Equally important is the recognition that scholarly communication resources must be accessible and discoverable in a 21st-century OA/open data (OA/OD) environment. In this context, there is an understanding that descriptive metadata indicating the resource, researcher, and research process information must be optimized using OA/OD designs and strategies (often referred to as linked open data and semantic web architecture).
Development of this framework has required, and will continue to demand, frequent dialogue among academic library professionals and researchers. That dialogue needs to take place before, during, and after the research process so that effective strategies and designs can be evaluated and executed.
Sai Deng’s Primer
One of the best current resources to address these concepts is a presentation by Sai Deng (University of Central Florida Libraries) at the 2018 ALA Midwinter Meeting. Deng enumerates a plethora of issues at play:
- Digital repositories, the web, and knowledge organization systems (KOS)—also known as controlled thesauri/subject headings and name identifiers
- Environmental scanning of authority control in digital repositories
- Name disambiguation and identification
- Engaging user interfaces (UIs) using author information
- Subjects and keywords; debate on the use of controlled vocabularies
- Web-based metadata and text analysis
- SEO and the evolution of digital repositories—OA and proprietary
- Authority control; identity management and discovery
In addition to highlighting the best practices in scholarly communication designs and strategies that are metadata-based and OA/OD-friendly, Deng provides copious notes and references to research reported in more than 40 resources. Citations include scholarly articles, presentations, and lists from metadata providers (such as OCLC), OA-based UIs (including Omeka and Islandora), and proprietary-based UIs (such as CONTENTdm).
Deng poses and answers many questions relating to how academic library professionals can better understand 21st-century OA/OD-based web architecture. This includes the following:
- Semantic web designs (RDFa, triples, and JSON)
- Controlled vocabulary and name identifier concepts (best practices, semantic-enabled vocabulary sets)
- UI options (legacy and emerging)
The concepts are presented in an easy-to-follow narrative; there’s ample opportunity for discussion, reflection, and further study. Particular emphasis is placed on the steps of the research process and the role of the researcher, especially the effective identification of the researcher and her or his body of research.
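To make the triple and JSON designs listed above a little more concrete, here is a minimal sketch in Python. It is not taken from Deng's presentation. The repository URL and dataset title are invented for illustration, the property names are borrowed from the schema.org vocabulary, and the `to_triples` helper is a hypothetical simplification rather than a full JSON-LD processor. The ORCID iD shown belongs to Josiah Carberry, the fictitious researcher ORCID uses in its own examples.

```python
import json

# Illustrative record only: the repository URL and title are invented.
record = {
    "@context": "https://schema.org/",
    "@id": "https://repository.example.edu/items/42",
    "@type": "Dataset",
    "name": "Survey responses, 2018",
    "creator": {
        "@id": "https://orcid.org/0000-0002-1825-0097",
        "@type": "Person",
        "name": "Josiah Carberry",
    },
}


def to_triples(node):
    """Flatten a nested JSON-LD-style dict into (subject, predicate, object) triples.

    Simplification: the @context is not expanded, so predicates stay as short names.
    """
    subject = node["@id"]
    triples = []
    for key, value in node.items():
        if key in ("@context", "@id"):
            continue
        predicate = "rdf:type" if key == "@type" else key
        if isinstance(value, dict):
            # A nested object becomes a link to its identifier, plus its own triples.
            triples.append((subject, predicate, value["@id"]))
            triples.extend(to_triples(value))
        else:
            triples.append((subject, predicate, value))
    return triples


triples = to_triples(record)
for triple in triples:
    print(triple)
print(json.dumps(record, indent=2))
```

The point of the sketch is that the same description can be read two ways: as a nested JSON document that a web application can consume directly, and as a set of subject-predicate-object statements in which the author is a linked identifier rather than a text string.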
The ORCID Approach
One of the best systems for identifying and tracking a researcher’s scholarly output is ORCID. ORCID is an alphanumeric code that uniquely identifies scientific and academic authors—including their names and affiliations—and thus can be used to track their body of work. ORCID codes are unique, persistent, and maintained in an OA/OD web environment by a nonprofit organization dedicated to open information-sharing among research scholars worldwide (ORCID 2018).
ORCID codes allow researchers to document research process steps and outputs for specific research projects, including grant details and manuscript status, from draft stage to final publication. There’s also the ability to provide links to a researcher’s body of work and profile information (affiliation provenance, connections to joint research projects, etc.).
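For repository staff who load ORCID codes in bulk, a simple format check can catch typing errors before a record is saved. The sketch below rests on ORCID's published identifier structure rather than on anything stated in this article: 16 characters in four hyphenated blocks, with the final character a check digit calculated using the ISO 7064 MOD 11-2 algorithm. The function names are my own, and the check confirms only that a code is well formed, not that it is registered to a real researcher.

```python
import re

# Four blocks of four characters; the last character may be a digit or "X".
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid_id: str) -> bool:
    """Return True if the string is well formed and its check character matches."""
    if not ORCID_PATTERN.fullmatch(orcid_id):
        return False
    compact = orcid_id.replace("-", "")
    return orcid_check_character(compact[:15]) == compact[15]


print(is_valid_orcid("0000-0002-1825-0097"))
print(is_valid_orcid("0000-0002-1825-0098"))
```

The first code is the sample identifier ORCID uses in its documentation; the second changes only the last digit, so the checksum no longer matches.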
DataCite: Managing Research Data and Projects
Another interesting community-driven OA/OD ID project is DataCite. Like ORCID, DataCite is a nonprofit organization that provides and manages persistent identifiers. In this case, the identifiers are DOIs, which are best known from Crossref's use of them for journal articles but which DataCite applies to research data. According to DataCite’s mission statement, “We support the creation and allocation of DOIs and accompanying metadata. We provide services that support the enhanced search and discovery of research content. And we promote data citation and advocacy through our community-building efforts and responsive communication and outreach materials.”
In the DataCite 2018 Wrap-Up and 2019 Preview blog post, Robin Dasler (product manager at DataCite) reports that several key technology enhancements and DataCite community improvements are slated to be introduced this year. One of the most interesting is the launch of the DOI Fabrica API (a combination of the previous REST API and Fabrica API architecture). Its advantage is that it brings more DataCite DOI management functions under one umbrella and, because it is fully JSON-compliant, eliminates the need to convert metadata structures to XML (DataCite 2018).
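To illustrate what working in JSON rather than XML might look like for a repository manager, here is a minimal sketch that assembles DOI metadata as a JSON document. The field names are modeled on my understanding of DataCite's metadata properties and JSON:API-style payloads and should be checked against the current API documentation before use. The DOI, title, and publisher are hypothetical, the `build_doi_payload` helper is my own, and nothing is sent to DataCite.

```python
import json


def build_doi_payload(doi, title, creator_name, publisher, year):
    """Assemble DOI metadata as a plain dictionary that serializes to JSON.

    Assumption: the key names mirror DataCite-style metadata properties.
    """
    return {
        "data": {
            "type": "dois",
            "attributes": {
                "doi": doi,
                "titles": [{"title": title}],
                "creators": [{"name": creator_name}],
                "publisher": publisher,
                "publicationYear": year,
                "types": {"resourceTypeGeneral": "Dataset"},
            },
        }
    }


# Hypothetical values for illustration only.
payload = build_doi_payload(
    doi="10.1234/example-dataset",
    title="Survey responses, 2018",
    creator_name="Carberry, Josiah",
    publisher="Example University Libraries",
    year=2018,
)
print(json.dumps(payload, indent=2))
```

Because the metadata is an ordinary dictionary, it can be built, inspected, and edited with the same tools a library already uses for other JSON data, with no separate XML conversion step.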
Another DataCite technology enhancement, which will align with the Fabrica API upgrade, is the migration from Solr to Elasticsearch for DOI search. The migration is expected to shorten the time between DataCite DOI creation and DOI indexing. Additionally, in 2019, the Elasticsearch platform will have access to the Fabrica API architecture, which will expose more DataCite DOI search filters and so help retrieve more accurate search results.
Like ORCID, the DataCite user community is actively engaged. In 2019, there will be efforts to expand the ability of DataCite contributors to create and manage persistent identifier (PID) graphs, which are designed to connect research entities. DataCite user community members are being asked to contribute to the further development of PID graphs and recommend changes and enhancements.
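The idea behind a PID graph can be sketched in a few lines of Python. This is my own simplified illustration, not DataCite's implementation: every identifier and relation name below is hypothetical (apart from the two sample ORCID codes taken from ORCID's documentation), and the graph is a plain in-memory dictionary rather than a service. Each persistent identifier is a node, each relationship is an edge, and a breadth-first search collects everything connected to a given researcher.

```python
from collections import deque

# Hypothetical (source, relation, target) connections between identifiers.
edges = [
    ("orcid:0000-0002-1825-0097", "authored", "doi:10.1234/article-1"),
    ("doi:10.1234/article-1", "cites", "doi:10.1234/dataset-7"),
    ("doi:10.1234/dataset-7", "fundedBy", "grant:EX-2019-001"),
    ("orcid:0000-0002-1694-233X", "authored", "doi:10.1234/article-2"),
]


def build_graph(edge_list):
    """Build an undirected adjacency map so connections can be followed both ways."""
    graph = {}
    for source, _relation, target in edge_list:
        graph.setdefault(source, set()).add(target)
        graph.setdefault(target, set()).add(source)
    return graph


def connected_pids(graph, start):
    """Return every identifier reachable from `start`, excluding `start` itself."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in graph.get(current, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    seen.discard(start)
    return seen


graph = build_graph(edges)
related = connected_pids(graph, "orcid:0000-0002-1825-0097")
print(sorted(related))
```

Starting from one researcher, the search reaches the article, the dataset the article cites, and the grant behind the dataset, while the second researcher's unrelated article stays out of the result.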
As Wrigley mentioned in her discussion of ORCID community opportunities, and as the robust activities of the DataCite community show, the ability to share ideas and strategies that promote consensus across the academic library scholarly communication and related research communities is key to the continued growth and success of OA/OD initiatives, particularly those that are metadata related. From a technology perspective, libraries will also need to understand the present and future capabilities of OA/OD digital repository solutions as they determine what the systems can accomplish, in both the short term and the long term, and what data architecture specifications are needed for terminology and identifiers to function properly within those systems.
There is also a question of the dialogue and workflows that are required between academic library professionals and researchers as they continue to explore and question what options would best allow OA/OD research environments to grow and flourish. Will web practices dictate how to manage OA/OD research environments in the future? Is semi-automatic data manipulation (as accomplished by tools such as MarcEdit and Notepad++) still needed to ensure data integrity and encourage periodic review of data accuracy? Where will academic research-based OA/OD designs and strategies be in 5 years? Or 10?
The academic library community has made great strides in promoting and enabling OA/OD repositories to develop. The same community will need to keep listening, talking, and innovating in order to create the interconnected scholarly communication landscape of the future.