Magazines > Computers in Libraries > May 2019

Vol. 39 No. 4 — May 2019
FEATURE

Opening Access to Academic Research via an Institutional Repository
by Todd Digby and Robert Phillips


The institutional repository (IR) has traditionally been defined as a place for academic libraries to store and showcase faculty and student research and other institutional information. IRs have existed since the late 1990s and have traditionally not veered from this goal. The University of Florida (UF) has had an IR in operation since approximately 2006, charged with acquiring, managing, and preserving digital resources so that they remain accessible to the university’s constituents over the long term. Certain limitations have been placed on access (due to legal, donor, and/or other reasons), but generally, UF’s libraries endeavor to make their digital resources accessible to all users.

Within the past decade, the demand that academic research be published under an open access (OA) model has increased substantially. This growing demand is due, in part, to the fact that many federal and national research funding agencies have conditioned the award of grant money on a requirement that recipients publish their findings under the OA model. Both noncommercial and commercial publishers have expanded their capability to support OA for the articles they publish and distribute via their respective digital platforms. Although researchers are starting to publish in journals that support OA, this process is happening outside of the IR environment. As a result, published papers are not being collected as part of the library-managed system.

This article will focus on a project by UF to work with a major commercial publisher to integrate, through the use of APIs, both OA-published articles and the final author manuscripts (written by UF authors) into our IR system.

Roots of the Initiative

[Figures 1-5]

This project was instigated by our library dean, after discussions that spanned several years—with the university’s provost, VP for research, and faculty members, including the Faculty Research Council—concerning the need to document and provide improved access to faculty research. 

The talks extended so long because the task was daunting: UF faculty members and researchers publish approximately 8,000 scholarly journal articles a year and have never been encouraged or required to deposit their article manuscripts in an IR. However, the imperative to document the university’s research output had intensified, due to the mandates and compliance requirements for OA deposit of the results of federally funded research.

It was at this tipping point that our dean began discussions with Elsevier to determine whether we could identify UF-authored articles published in Elsevier journals, integrate them, and make them both searchable and accessible within our existing IR. Since starting this project, we have been asked numerous times, “Why work with Elsevier?” The answer was clear: Our analysis showed that Elsevier published the highest volume of papers of any publisher, followed by Springer Nature and then Wiley.

We were in an excellent position to carry out this content integration mandate, because we already had the proper infrastructure in place. Unlike some other institutions, where a variety of platforms are employed to manage various digital content types housed in separate systems, UF’s IR is housed on the same system that hosts all our other digital library collections. The system we use is SobekCM, which was initially developed at UF and is open source. Given the customized nature of this platform, we have employed staff developers who are dedicated to working on the system. They have the skills to focus on implementing the APIs that were provided by Elsevier without the need to hire an external vendor or developers. We were ready to go.

Piloting the Project

This pilot project consisted of two phases that included implementing various components of the Elsevier API infrastructure. For this effort, we made use of three Elsevier APIs (the Content Identification API, the User Verification API, and the Article Retrieval API). Figure 1 shows which API was employed at which point in the workflow. 

The pilot’s first phase had the primary goal of making UF’s IR (IR@UF) more comprehensive in its coverage of content published by UF authors in Elsevier journals. By doing so, we aimed to ensure that IR@UF users would gain access to the best available versions (i.e., the published versions) of faculty research papers.

During this phase, we were able to locate more than 30,000 articles by UF authors, from 1949 forward, and provide links within IR@UF back to Elsevier’s ScienceDirect. These links covered both OA and subscription-required articles contained within ScienceDirect. We automated the process by using the Elsevier APIs to locate the UF-authored articles and retrieve their metadata and full text for indexing in IR@UF. As a result, these published versions became searchable along with the other content found within the IR.

Please note: Although via this process we have obtained both the metadata and full-text indexes from Elsevier, we do not house permanent copies of the published articles. By taking this approach, we expanded the concept of the IR from being solely a repository—where all the content is housed in one system—to the broader concept of a repository and referatory (or redirect) system in which some of the content is housed on external systems.
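The repository-and-referatory idea can be illustrated with a small sketch. It is not SobekCM code and makes no claim about how IR@UF stores its index; the class names, identifiers, and URLs are hypothetical. The point it shows is that the full text is used only to build search terms, while each entry records whether the item is hosted locally or merely referred to on an external system.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """One searchable record; the article itself may live elsewhere."""
    identifier: str       # hypothetical local record ID
    title: str
    hosted_locally: bool  # True for deposited items, False for referred items
    location: str         # local path or external URL (illustrative values)


class ReferatoryIndex:
    """Toy inverted index covering both local and externally hosted items."""

    def __init__(self):
        self._entries = {}
        self._postings = defaultdict(set)  # search term -> record identifiers

    def add(self, entry, full_text):
        """Index the title and full text; the full text itself is not stored."""
        self._entries[entry.identifier] = entry
        for raw in (entry.title + " " + full_text).lower().split():
            term = raw.strip(".,;:()")
            if term:
                self._postings[term].add(entry.identifier)

    def search(self, query):
        """Return entries that contain every term in the query."""
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self._postings.get(t, set()) for t in terms))
        return sorted((self._entries[i] for i in hits), key=lambda e: e.identifier)
```

A search over this index returns local and external items side by side, and the `location` field tells the display layer where to send the user.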

As a user searches IR@UF and the search results are displayed, the User Verification API determines if he or she is located at an institution with a ScienceDirect subscription. Then the system displays the appropriate access method and version of the article. If an OA version is available, the system will present this to the user, even if he or she is not located at an institution that has a subscription to ScienceDirect. Figure 2 shows how the search results screen is presented to a user searching from UF or another institution that has a subscription to ScienceDirect for that article.

Since our objective was not just to identify articles that were by UF authors, but to highlight those articles that were OA and available to all users, our search results pages clearly label OA materials (as can be seen in Figure 3). This phase of the project was completed and has been in operation since mid-2017. It tested both the publisher APIs and how well we could integrate them into our IR system.

Beyond Published Versions

With the first phase operational, the second phase focused on expanding access to UF faculty research for a broader audience: those who do not have a ScienceDirect subscription and therefore cannot access an article that was not published as OA.

At this point, we worked on providing access to post-embargo accepted manuscripts. An accepted manuscript is the latest prepublication version of an article; it exists in manuscript form and is not in the final publication format. In this search scenario, a user who has a ScienceDirect subscription (accessing our IR from a subscribing IP address) will be presented with a link to the final published article. A user accessing this same article from a non-subscribing IP address will be presented with the accepted manuscript version of the article. Initially, around 3,000 accepted manuscripts were identified for this type of access.
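The version-selection rules described across the two phases can be summarized in a short sketch. It is an illustration only: the type and function names are hypothetical, the subscriber flag stands in for the answer returned by the User Verification API, and the final fallback (a non-subscriber with no post-embargo manuscript available) is an assumption, since the article does not describe that case.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Access(Enum):
    PUBLISHED_OPEN_ACCESS = "published version (open access)"
    PUBLISHED_SUBSCRIPTION = "published version (subscription)"
    ACCEPTED_MANUSCRIPT = "accepted manuscript in embedded viewer"
    PUBLISHER_PAGE_ONLY = "link to publisher page"  # assumed fallback


@dataclass
class Article:
    is_open_access: bool
    # Date the accepted manuscript leaves embargo; None if no manuscript is on file.
    embargo_ends: Optional[date] = None


def choose_access(article: Article, user_is_subscriber: bool, today: date) -> Access:
    """Pick which version to present, following the order described in the text."""
    if article.is_open_access:
        # OA articles are shown to everyone, subscriber or not.
        return Access.PUBLISHED_OPEN_ACCESS
    if user_is_subscriber:
        # Subscribing IP addresses get the final published article.
        return Access.PUBLISHED_SUBSCRIPTION
    if article.embargo_ends is not None and today >= article.embargo_ends:
        # Non-subscribers get the accepted manuscript once the embargo has passed.
        return Access.ACCEPTED_MANUSCRIPT
    return Access.PUBLISHER_PAGE_ONLY
```

Keeping the decision in one pure function makes the rules easy to test without calling any external service.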

Using the same APIs as in the first phase, we were able to identify the content for both the accepted manuscript and the final published version and then determine which version to present to which user. Figure 4 shows how the search results display for a user who may only access the accepted manuscript via IR@UF.

A key objective was to present accepted manuscript versions from within our IR so that the user would not have to navigate to an external system to retrieve an article. Responding to this need, we configured our system to present these accepted manuscript versions using an embedded streaming viewer developed by Elsevier. Our own development staff was tasked with producing the wraparound structure for embedding the Elsevier viewer using an iframe. Figure 5 shows the embedded streaming article viewer.
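As a rough illustration of such a wraparound structure, the sketch below builds an HTML fragment that embeds a viewer in an iframe and links out to the publisher. The function name, CSS class, and URLs are hypothetical; the actual address and parameters of the Elsevier viewer are not given in this article and are not reproduced here.

```python
from html import escape


def viewer_wrapper(title, viewer_url, publisher_url, height_px=800):
    """Return an HTML fragment that embeds a manuscript viewer in an iframe.

    All values are escaped so that titles or URLs containing markup
    characters cannot break out of the surrounding attributes.
    """
    safe_title = escape(title, quote=True)
    return (
        '<section class="manuscript-viewer">\n'
        f'  <h2>{safe_title}</h2>\n'
        f'  <iframe src="{escape(viewer_url, quote=True)}" title="{safe_title}"'
        f' width="100%" height="{int(height_px)}" loading="lazy"></iframe>\n'
        f'  <p><a href="{escape(publisher_url, quote=True)}">View on the publisher site</a></p>\n'
        '</section>'
    )
```

The iframe keeps the user on the repository page, while the trailing link offers the route to the publisher's page for the article.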

As Figure 5 shows, users can read the article in the embedded viewer and have the option to download a PDF of it. The viewer page also gives the user the option to navigate to the publisher’s webpage for the article.

At the time of this writing, we have successfully implemented this process and have started to operationalize the release of more accepted manuscript versions into our system as they come off their respective embargo periods. We are also currently testing out a new version of the accepted manuscript streaming viewer that Elsevier has developed. It includes better functionality and some additional user-friendly features.

Where We Stand

Throughout this process, we have conducted usability testing to better guide our decisions, including for button placement and system navigation. Usability testing has not only identified a number of issues with the project described here, but has also pointed to some deficiencies within our overall system that may need to be addressed.

Along the way, the project ran into a number of challenges, including the difficulty of identifying UF authors using existing metadata. This reinforced our belief that the system would benefit from using reliable identifiers, such as Ringgold (for institutions) and ORCID (for researchers), to more accurately link UF authors with their respective works. Working with a large publisher presented challenges of its own, as we had difficulty arriving at a common understanding of the distinctive approaches and unique roles for content provision by publishers and academic libraries. Finally, there was the challenge of adapting the unique IR@UF platform (SobekCM) to work with Elsevier APIs and the streaming manuscript viewer.
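To illustrate why identifiers help, the sketch below matches an article author against a local roster, preferring a validated ORCID iD and falling back to the name only when it is unambiguous. The roster structure and function names are hypothetical and do not describe the IR@UF implementation. The checksum follows the ISO 7064 MOD 11-2 scheme that ORCID documents for its iDs, and the sample iD in the usage note is ORCID's well-known test record for the fictitious researcher Josiah Carberry.

```python
import re


def orcid_check_digit(base15):
    """Compute the final character of an ORCID iD from its first 15 digits."""
    total = 0
    for ch in base15:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def normalize_orcid(raw):
    """Return the iD as 0000-0000-0000-000X, or None if malformed or invalid."""
    compact = re.sub(r"[^0-9Xx]", "", raw.rsplit("/", 1)[-1]).upper()
    if not re.fullmatch(r"\d{15}[\dX]", compact):
        return None
    if orcid_check_digit(compact[:15]) != compact[15]:
        return None
    return "-".join(compact[i:i + 4] for i in range(0, 16, 4))


def match_author(author, roster):
    """Return (person_id, method) for an author dict with 'name' and optional 'orcid'.

    roster is a list of dicts with 'id', 'name', and optional 'orcid' keys
    (a hypothetical structure used only for this example).
    """
    orcid = normalize_orcid(author.get("orcid") or "")
    if orcid:
        for person in roster:
            if person.get("orcid") == orcid:
                return person["id"], "orcid"
    same_name = [p for p in roster if p["name"].casefold() == author["name"].casefold()]
    if len(same_name) == 1:
        return same_name[0]["id"], "name"
    return None, "unresolved"  # no match, or several people share the name
```

With two roster entries named "J. Smith", an author record carrying `https://orcid.org/0000-0002-1825-0097` resolves to the one person who holds that iD, while the same record without an identifier stays unresolved.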

An important consideration when integrating with external databases or resources using APIs—or, for that matter, any other integration method—is the need to understand that these integrations will need constant care. Any changes that may be made to either your local system or the remote system could impact your integration and result in the loss of functionality. These integrations are not a one-time process; they will demand continued maintenance that will last as long as an institution wants the integration to last. Additionally, if the external resource provider changes its focus or stops supporting the integration, you may have no choice but to remove the integration and lose the additional functionality.
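One modest way to give an integration that ongoing care is a scheduled check that the remote responses still contain the fields the local system depends on. The sketch below is an assumption-laden example, not part of the project described here: the function name and the field paths in the usage note are hypothetical and do not reflect the actual Elsevier response format.

```python
def missing_fields(payload, required_paths):
    """Return the dotted paths from required_paths that are absent in payload.

    payload is a parsed JSON object (nested dicts); a path such as
    "article.links.published" is followed one key at a time.
    """
    missing = []
    for path in required_paths:
        node = payload
        for key in path.split("."):
            if isinstance(node, dict) and key in node:
                node = node[key]
            else:
                missing.append(path)
                break
    return missing
```

Run against a saved sample response after every upgrade on either side, a check like `missing_fields(response, ["article.title", "article.links.published"])` returns an empty list while the contract holds and names the broken paths when it does not.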

Overall, we feel that this project has been a success and has presented us with a broader vision of the possibilities for IR@UF. It is our hope that it will help the library maximize the research impact of articles by UF authors by providing additional visibility and broader access. 


Todd Digby (digby@ufl.edu) is the chair of library technology services at the University of Florida’s George A. Smathers Libraries. He leads a service-oriented department that researches, develops, optimizes, and supports advanced library information systems and technology for the university. Prior to joining the University of Florida in 2016, he held administrative and faculty positions at the Minnesota State Colleges and Universities system, the University of Wisconsin–River Falls, and the University of South Dakota.

Robert Phillips (podengo@ufl.edu) is the technical lead on using APIs with the University of Florida to populate the institutional repository, helping to ensure compliance and provide access to materials. He has a B.A. in experimental psychology from New College of Florida, a Ph.D. in cognitive psychology science from the University of Maryland, and a J.D. from the University of Florida. A self-taught programmer, he holds numerous patents for a variety of computer innovations.