Thinking About Reference Linking

Vol. 10 No. 4 — April 2002

Feature by Jill E. Grogg, Instruction Services Librarian, Mississippi State University


The very nature of scholarly research has fundamentally changed with the increased availability of reference or citation linking. With the ubiquity of the Web, it seems natural to users and information professionals alike that they be able to link painlessly among full-text articles, abstracting and indexing (A&I) bibliographic information, and article reference lists. In a perfect world, such linking would occur seamlessly. However, in reality, technical obstacles, as well as organizational concerns, blur the vision of perfect, seamless reference linking. Information professionals need to be aware of the technical and organizational interests and obstacles associated with reference linking in order to better serve their users.

Caplan defines reference or citation linking as "the ability to go directly from a citation to the work cited, or to additional information about the cited work,"¹ whether the source and accompanying destination are journal articles, Web sites, conference proceedings, entries in A&I databases, or even a link sent via e-mail from one colleague to another. In the scholarly community, however, reference linking is thought of first and foremost as linking among journal articles and bibliographic entries, and linking initiatives in electronic scholarly publishing have accordingly attacked the obstacles associated with that kind of linking first.

In the past, linking generally was built around the collections of a specific publisher (primary or secondary) or aggregator, whether internal or external. In other words, links either traveled among articles and records housed together or among articles and records housed on any number of servers, but the focus was the legitimate collection of a database provider. For example, publishers might provide internal links among their own electronic journals, or aggregators would provide internal links among full text provided by their services. In terms of reference linking, however, these sorts of closed systems are only partially successful. As Caplan reminds us, "It is very unlikely that any article published by Elsevier will cite only other articles in other Elsevier journals."² It is also unlikely that any one full-text repository will in itself be large enough for any one body of literature. Necessarily, then, a hybrid model of both internal and external linking emerges. While more useful, inclusive, and most importantly, open, hybrid models of internal and external linking introduce a new array of technical and organizational issues.³

Electronic Journal Publishing Defined
Before examining the technical and organizational issues involved in hybrid reference linking, however, we should look at the two general categories of scholarly electronic journal publishing: direct delivery from publishers and full-text aggregation through third parties.

With direct delivery, primary publishers offer electronic journal subscriptions via Web sites or Web services. For example, STM primary publishers such as Elsevier Science, Springer-Verlag, Academic Press, American Institute of Physics, and the American Chemical Society all offer access to their journals electronically. These primary publishers have services such as ScienceDirect (Elsevier Science), LINK (Springer-Verlag), IDEAL (Academic Press), Online Journal Publishing Service (American Institute of Physics), and ACS Publications (American Chemical Society). With some of these primary publishers, such as Springer-Verlag, a portion of the articles in various journals appear online before appearing in print.

The second general category of electronic journal publishing is full-text aggregation through third parties. In this schema, publishers hand materials over to aggregators, which then house the full text. Major electronic journal aggregators include EBSCO, OCLC FirstSearch via Electronic Collections Online (ECO), ProQuest, Gale Group via InfoTrac, and H.W. Wilson. However, unlike direct delivery, aggregators may not include all of any given journal issue and publishers may impose embargoes.

While this sort of categorization is useful, it is sometimes misleading. Increasingly, primary publishers have become aggregators. Elsevier and Springer-Verlag, for example, offer access to publications from sources neither company owns. To further complicate matters, aggregators may choose to house their full text on internal servers or link to full text housed elsewhere.

Moreover, primary publishers and aggregators now both offer direct delivery or e-journal gateways for libraries and information centers. Services such as EBSCO Online, OCLC's WebExpress, and SwetsnetNavigator attempt to serve as electronic journal portals for information centers or libraries. Other companies, such as the Gale Group and ingenta, have initiated partnerships that offer similar e-journal gateway services. These types of services, however, may not always accurately reflect a specific institution's holdings. For example, linking agreements and pricing structures determine which e-journals from different publishers are available and viewable through these subscription gateways.

With the ever-increasing amount of full text available in different formats and from different providers (e.g., publishers or aggregators, who can be one and the same), information professionals need a way to evaluate and navigate the technical and organizational obstacles associated with reference linking for scholarly electronic materials.

Linking Models: Internal, External, Or Hybrid?
First, one must understand the type of link provided. Links can be internal, contained within one service, or external, connecting documents or records provided by two or more services⁴. Internal linking occurs in aggregator services such as EBSCOhost's Academic Search Elite or Academic Search Premier, OCLC FirstSearch, Gale Group's InfoTrac, or ProQuest's Research Library. Primary publishers also employ internal linking in their own direct subscription services. For instance, reference links are available among journals published by Elsevier's ScienceDirect.

A prime example of the external linking model is linkages among secondary and primary publishers or links among abstracting and indexing services, primary publishers, and aggregators. Articles are housed on a different server than the bibliographic records. An A&I service functions as a navigational tool that then points users to the full text, via services such as SilverPlatter or Cambridge Scientific Abstracts (CSA). For example, CSA offers links from search results in its Internet Database Service (IDS) bibliographic databases to full-text documents offered by Project MUSE from Johns Hopkins University Press, PsycARTICLES from the American Psychological Association, and Ingenta. The IDS service includes more than 50 databases.

Lines of demarcation between internal and external linking continue to blur, however. Use of both internal and external linking, or hybrid reference linking, has become more and more common, as evidenced by the emergence of the CrossRef initiative, a joint effort of major primary publishers under the auspices of the Publishers International Linking Association (PILA). As Caplan noted, it is highly unlikely that any Elsevier-published journal article only references other Elsevier-published articles. To provide the service that readers and librarians want, primary publishers must link within their own services and out to other services. Aggregators and A&I services, too, have begun linking in and out of their own services, which often means crafting new linking agreements with primary publishers. While hybrid linking is becoming the norm, it does make for a most complicated practice.

Hybrid Linking in Action
In studying the hybrid linking model, it must be determined just where the link takes the user. Is the link internal to a service or external? Does a link take the user to the article or journal level or simply to the front page of a publisher's site? CrossRef [http://www.crossref.org], the international publisher initiative launched in 2000, allows for a dissection of the hybrid linking phenomenon among the primary publishers.

CrossRef describes itself as a "digital switchboard," linking the content of primary publishers (more than 91 at the end of 2001) and what CrossRef terms "affiliates" and "library affiliates." These links are effected through Digital Object Identifiers (DOIs), unique identifiers tagged to article metadata. Unlike URLs, which can be inconsistent and point only to a manifestation of an article or other piece of electronic content, DOIs are persistent and identify the object itself. DOIs link to URLs through a resolver system, such as the one run by the International DOI Foundation. CrossRef, then, functions as what Caplan calls a "reference database," into which CrossRef publisher members deposit DOIs and associated citation metadata. CrossRef houses no full text; it is only a cog in a wheel that allows for the association of persistent identifiers (DOIs) with locations (URLs) and article citation metadata.
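The difference between a persistent identifier and a changeable location can be sketched in a few lines of Python. This is a toy, in-memory model under stated assumptions, not the International DOI Foundation's resolver: the class name ToyResolver, the DOI 10.9999/example.12.345, and both URLs are hypothetical.

```python
class ToyResolver:
    """Toy stand-in for a DOI resolver: maps a persistent identifier to its current URL."""

    def __init__(self):
        self._locations = {}  # DOI -> current URL

    def register(self, doi, url):
        # A publisher deposits (or later updates) the location for a DOI.
        self._locations[doi] = url

    def resolve(self, doi):
        # In this model one DOI resolves to exactly one URL at a time.
        return self._locations[doi]


resolver = ToyResolver()
doi = "10.9999/example.12.345"  # hypothetical DOI
resolver.register(doi, "http://publisher.example/old/article345")
first = resolver.resolve(doi)

# The article moves. The publisher updates the URL, but the DOI that
# appears in reference lists does not change.
resolver.register(doi, "http://publisher.example/new/article345")
second = resolver.resolve(doi)
```

The point of the indirection is that only the resolver's table has to change when content moves; every citation that carries the DOI keeps working.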

One should note that while DOIs are the standardized identifier used by CrossRef, other standardized identifiers exist as well. The choice of identifier may depend on the organization or company using them and the level of access being supported. For example, an aggregator such as the Gale Group offers links to local holdings through ISSNs for its InfoTrac Web periodical products; the Gale Group is currently working on this functionality in its Resource Centers product as well.

CrossRef, however, has chosen DOIs. When a user clicks on a link in a reference list of a journal published by a participating publisher, he or she goes to the publisher's Web site, where access is determined by subscription. The reference list of an article in the Elsevier service ScienceDirect will have links to other Elsevier journals, as well as links to other publishers' journals, such as Blackwell Science, Springer-Verlag, and Wiley InterScience. Again, however, access is determined by subscription. While a user may have affiliated access via another route (an aggregator, library print holdings, etc.), currently, the CrossRef link only takes the user to the publisher-supplied full text.

Ovid [http://admin.ovid.com/openlinks] provides another example of the hybrid linking model, this time for aggregators. Ovid's products employ both internal and external linking through full-text aggregation, bibliographic databases, and its OpenLinks software. Links in these services may travel among documents housed at Ovid or housed elsewhere, on non-Ovid Web-based systems. With the OpenLinks software, connections can be set up between records in Ovid databases and remote e-journal full text to which an institution subscribes. OpenLinks also fully supports CrossRef. According to Ovid, by supporting CrossRef, it has access to the CrossRef database. Access to the CrossRef database allows Ovid to pair its bibliographic article metadata to the information in the CrossRef database, thus creating a link from the article's unique DOI to the publisher-assigned URLs. Access for the user is wholly based on subscription, and according to Ovid, the institution can define OpenLinks so that links only appear for subscribed journals. Ovid also has agreements with other aggregators, such as Project MUSE, and primary publishers, such as Springer's Online LINK, to provide access through Ovid's OpenLinks.
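The pairing of bibliographic metadata with a reference database can be illustrated with a small lookup. This is a minimal sketch, assuming that a match on ISSN, volume, and first page is enough to identify an article; neither Ovid's nor CrossRef's actual matching rules are described in this article, and the function names, the ISSN, and the DOI below are all hypothetical.

```python
def citation_key(issn, volume, first_page):
    """Normalize the matching fields so that formatting differences do not block a match."""
    return (
        issn.replace("-", "").upper(),
        str(volume).strip(),
        str(first_page).strip(),
    )


# Hypothetical deposits: citation metadata -> DOI, as a publisher might supply them.
reference_db = {
    citation_key("1234-5679", 12, 345): "10.9999/example.12.345",
}


def find_doi(record):
    """Return the DOI for a bibliographic record, or None if no deposit matches."""
    key = citation_key(record["issn"], record["volume"], record["first_page"])
    return reference_db.get(key)


# A record as it might appear in a bibliographic database (hypothetical values).
ai_record = {"issn": "12345679", "volume": "12", "first_page": "345"}
matched = find_doi(ai_record)

# A record with no corresponding deposit simply gets no link.
unmatched = find_doi({"issn": "1234-5679", "volume": "12", "first_page": "999"})
```

Once a DOI is found, the link itself is left to the DOI resolver, so the bibliographic service never has to store publisher URLs.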

Ovid is not the only company that has seen the advantage of supporting CrossRef. Secondary publishers, A&I services, and others can become affiliates in CrossRef and have access to the CrossRef reference database with its DOIs and journal article metadata. By becoming CrossRef affiliates, secondary publishers, A&I services, and others can bypass the tedious process of signing bilateral linking agreements with separate publishers. CrossRef affiliates include CSA, EBSCO Publishing, and SwetsBlackwell.

However, for now, using CrossRef alone means a system of one-to-one relationships between DOIs and URLs, pointing users to the manifestation of a journal article available at the journal publisher's Web site. CrossRef alone does not take into account user affiliations and whether users might have access to more than one route to a journal article, e.g., through an aggregator.

The "Appropriate Copy" Problem Rears Its Ugly Head
Once we understand the type of link being provided (internal or external), we must still look at several technical and organizational obstacles to understand the intrinsic complexity of reference linking. The major technical obstacle for information professionals is providing access to the "appropriate copy" for their constituencies. While the promises and advertisements of information providers (publishers, aggregators, A&I services) tout seamless interconnectivity, information professionals know better.

Behind the user's simple expectation of clicking on a link in a reference list and being instantly transported to its corresponding full text lie some very complex processes. The complexity lies in the multiple availability of any one article. For example, the full text of one article could be available through several means: the publisher's Web site; aggregator services such as Ovid, ProQuest, Gale Group, or EBSCO; subscription agent gateways such as EBSCO Online or SwetsnetNavigator; locally or consortially hosted copies of publishers' journal databases; and document delivery services such as Infotrieve or ingenta. And these are only some of the options for electronic versions. Of course, an information center might have print subscriptions or wish to direct users to ILL services. Moreover, an institution might have access to more than one of these article manifestations and wish to guide each user to the best option.

Open-system reference linking initiatives such as CrossRef and software developments such as Ovid's OpenLinks, as well as persistent and unique identifier technology such as the DOI, do much to move us toward a seamless interconnectivity between bibliographic data and full text housed within different organizations. Furthermore, efforts on the part of publishers (primary and secondary) and aggregators to link to local holdings or local ILL services also promote localization and personalization of resources.

We need one more step, however, to provide truly painless reference linking. We need linking architecture that never leaves a user at a dead end, wondering why he or she is denied access, a linking architecture that points users to their institutional-affiliated holdings, subscriptions, and chosen services. Such a linking architecture or system must provide context-sensitive reference linking.

Linking Initiatives and Technologies to the Rescue
Context-sensitive reference linking takes the user's information environment, their context or situation, into account when linking between references in journal articles or other online content to full-text collections. In other words, the linking systems include a localization feature that addresses the user's specific affiliation or possible subscriptions. Two 2001 articles by Oren Beit-Arie et al. and Priscilla Caplan, respectively, explain the need for linking systems to be open, generalized, and robust in order to be context-sensitive and to solve the "appropriate copy" problem⁵. For an effective "appropriate copy" solution, a linking system also needs standardization and localization.

Chemical Abstracts Service's (CAS) ChemPort Connection and ChemPort Reference Linking services offer two examples of the type of services that attempt to address the "appropriate copy" issue, both in secondary database linking and in reference linking. The difference between secondary database linking and reference linking is a matter of nomenclature; often, secondary database linking is simply included in the notion of reference linking. These two services from CAS specifically address both.

The ChemPort Connection, CAS's initial linking effort launched in December 1997, allows searchers of CAS secondary databases (e.g., STN, SciFinder) to link from CAS records through the ChemPort Connection to the full text available from the primary publisher, patent office, or CAS's Document Detective Service. According to Harry Boyle, Manager, Web Alliances at CAS, some variation of reference linking has been available in CAS since the introduction of the ChemPort Connection, with links from CAS databases to the full text at publishers' Web sites; CAS has agreements with about 135 publishers, as well as with patent offices and EBSCO, as a subscription agent. Through these agreements and behind-the-scenes linking technology, ChemPort Connection allows those with subscriptions or affiliated access to link directly to the cited article or document. The ChemPort Connection also offers a link to local library holdings — and CAS works with a broad spectrum of libraries — that can extend from advanced, localized integration with library systems to a link that simply takes users to the library's home page. The advanced localization is a powerful feature of the ChemPort Connection, allowing local systems administrators to directly set up linking to their library through the CAS Site Administration Tool.

In December 2000, CAS announced linking from cited references in full-text articles to CAS records, thus allowing for links both to and from CAS records; this is the ChemPort Reference Linking service. Currently, Boyle noted, CAS is working with a relatively small number of publishers for the ChemPort Reference Linking service, including ACS Publications, Academic Press, American Institute of Physics, the Institute of Physics Publishing, the International Union of Crystallography, Springer-Verlag, and publishers with full text loaded at CatchWord. On December 4, 2001, CAS unveiled ChemPort's new "Enhanced Reference Linking Service." With these recent enhancements to ChemPort, researchers now have the option to view, for a charge, chemical substances discussed in the cited article or a list of documents citing the current document.

The OpenURL framework and SFX, context-sensitive reference linking software, combine to offer another option for successful linking of heterogeneous materials from different providers. Most important, the OpenURL framework is non-proprietary, an open source protocol and a proposed standard under review by the National Information Standards Organization [http://www.niso.org/committees/comittee_ax]. Herbert Van de Sompel and others developed the concept of OpenURLs, and the complete, original theoretical papers on the OpenURL were published in April 1999⁶.

According to Harry Boyle, while not called "OpenURL," CAS's linking initiatives designed to deal with the "appropriate copy" issue were a precursor to the OpenURL. Specifically, CAS collaborated with Ohio State University and OhioLINK to localize the ChemPort Connection for the OhioLINK consortium. Overall, the "appropriate copy" issue has been a recognized problem for years, and the CAS/OSU/OhioLINK collaboration, those involved with the development of the OpenURL framework, as well as other groups, have been working to find a practical solution. (More information about the development of OpenURLs and the subsequent development of SFX is available at http://www.sfxit.com/.) According to NISO [http://www.niso.org], the OpenURL standard "should incorporate these two syntax options:

syntax for packaging metadata and identifiers describing information objects
syntax for pointing to a user-specific resolver that can accept this packaged data, combine it with user information, and resolve the data into actual links."

The standard should not focus on any one identifier, such as DOI, but rather take into account other identifier standards such as SICI, ISSN, and others.
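The two syntax elements NISO describes, packaged metadata and a pointer to a user-specific resolver, can be sketched as a URL with a query string. This is an illustrative sketch only: the key names (genre, issn, volume, spage, date, id) are modeled on early OpenURL drafts and should be treated as assumptions rather than the normative syntax, and the resolver address, ISSN, and DOI are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_openurl(resolver_base, citation):
    """Attach citation metadata to a resolver address as a query string."""
    # Drop empty fields so the link carries only what is actually known.
    params = {key: value for key, value in citation.items() if value}
    return resolver_base + "?" + urlencode(params)


citation = {
    "genre": "article",
    "issn": "1234-5679",                 # hypothetical ISSN
    "volume": "12",
    "issue": "",                         # unknown, so it is left out of the link
    "spage": "345",
    "date": "2001",
    "id": "doi:10.9999/example.12.345",  # an identifier can travel alongside metadata
}

# The first half of the link names the user's own resolver (hypothetical address);
# the second half describes the object being sought.
link = build_openurl("http://resolver.library.example/menu", citation)

# The receiving resolver can unpack the same description from the link.
unpacked = parse_qs(urlsplit(link).query)
```

Because the description may carry an ISSN, a DOI, or both, the same link format serves resolvers that rely on different identifier schemes.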

SFX, on the other hand, is a dynamic linking software now marketed by Ex Libris [http://www.exlibris-usa.com]. In other words, OpenURL is part of the underlying framework that allows an SFX server to work. While marketed by Ex Libris, SFX is vendor-independent and facilitates an open-linking environment. It is remarkable in its ability to localize the dynamic creation of links among A&I databases, library catalogs, citations databases, citations in research papers, e-print archives, and Web resources. SFX is, in essence, a third party in the effort to connect the user with his or her "appropriate copy." An institution purchases it and it "remains under their control and management."⁷ This allowance for local administrative control is the most powerful feature of SFX. SFX, however, is only one of several possible local resolution systems. According to Caplan, systems from Endeavor Information Systems and OCLC's Open Name Service are in the works⁸.

SFX generally works on the concept of sources and targets. Sources could be records in one database, and targets could be records in another. For example, a user might access records in one database. A database record retrieved by the search would carry an SFX link (named SFX or something else). On clicking the link, the user would see a list of options specific to their affiliation: library catalog, full-text databases, e-journals, and more. Both the sources and targets must be OpenURL-compliant, as the requests passed between the two depend upon it.
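The idea that the same citation yields different options for different institutions can be shown with a toy resolver. This is a sketch under stated assumptions, not SFX's actual configuration model or API: the profile structure, the function link_menu, the service labels, and the ISSNs are all hypothetical.

```python
def link_menu(citation, profile):
    """Return the service labels one institution offers for one citation."""
    menu = []
    for target in profile["targets"]:
        # Offer a target only if the institution has access to this journal there.
        if citation["issn"] in target["issns"]:
            menu.append(target["label"])
    # Services the institution always offers, so the user never hits a dead end.
    menu.extend(profile["always"])
    return menu


# Two hypothetical institutions with different subscriptions and policies.
profile_a = {
    "targets": [
        {"label": "Full text via aggregator", "issns": {"1234-5679"}},
        {"label": "Full text at publisher site", "issns": {"8765-4321"}},
    ],
    "always": ["Check library catalog", "Request via interlibrary loan"],
}
profile_b = {"targets": [], "always": ["Check library catalog"]}

# The same citation, arriving from the same source database.
citation = {"issn": "1234-5679", "volume": "12", "spage": "345"}
menu_a = link_menu(citation, profile_a)
menu_b = link_menu(citation, profile_b)
```

The source database does not need to know anything about either institution; it only passes the citation along, and the locally administered profile decides what the user sees.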

Customer Demand for OpenURL Compliance
More and more publishers, aggregators, and vendors have either become OpenURL-compliant or have begun implementing such compliance. Again, as the SFX software (and hence the OpenURL framework) employs both sources and targets, a variety of organizations and companies provide products that are either sources, targets, or both. OpenURL-enabled resources include the pre-print archive at the Los Alamos National Laboratory [http://www.arXiv.org]; ProQuest from Bell & Howell Information and Learning; Cambridge Scientific Abstracts; EBSCOhost from EBSCO Publishing; InfoTrac from the Gale Group; WilsonWeb from H.W. Wilson; Web of Science from ISI; FirstSearch from OCLC; Ovid Bibliographic Databases and SilverPlatter ERL/WebSPIRS from Ovid; and SwetsnetNavigator from SwetsBlackwell. SFX targets are many more in number. There are bibliographic and A&I databases, document delivery services, journal publishers and individual journals, full-text aggregators, library catalogs, and general-interest Web sites. (For a full list of both sources and targets, see http://www.sfxit.com/.)

According to Gary Pollack, program director for Product Platforms at the Gale Group, Gale committed to becoming an OpenURL-enabled resource because customers started asking for the SFX product by name. Initially, there may have been some confusion among customers between OpenURL compliance and SFX, but regardless, the demand for the OpenURL/SFX solution was clear. Pollack noted that OpenURL is the emerging standard, and SFX is the best implementation of the emerging OpenURL standard. Gale is both a source and a target, but the real work for the information provider is becoming a source. Being an OpenURL-enabled resource involves sending outbound HTTP requests; therefore, information providers must re-gear their products to become OpenURL-enabled resources, which requires a considerable amount of work. CAS also fully supports the OpenURL protocol, making the SFX software compatible with the ChemPort services.

Out of Many, One Linking System
All these many components of linking were recently put to the test. A complex linking system using the OpenURL framework, the DOI resolution system, and a local resolution system (including SFX) was tested in the spring and summer of 2001. Participants and observers of the prototype project included the following groups and organizations:

library participants (Research Library of the Los Alamos National Laboratory, University of Illinois Grainger Engineering Library, and the Ohio State University Libraries)
International DOI Foundation
Corporation for National Research Initiatives (technology provider for the DOI)
CrossRef
Ex Libris
OhioLINK
Digital Library Federation
NISO
Elsevier Science
American Institute of Physics

A September 2001 D-Lib article, "Linking to the Appropriate Copy," fully explains the prototype project⁹. Basically, however, all the following components were successfully used in the same linking system: DOIs as persistent identifiers; the OpenURL framework as a standardized transportation of metadata and/or identifiers; CrossRef as a reference database of DOIs and citation metadata; and SFX (and other systems) as options for a local resolution system. The most astounding aspect of the prototype project was getting such divergent groups to work together to create an effective linking system. It does engender hope for a truly heterogeneous research environment.
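How these components might fit together can be sketched as a single routing decision. This is an illustrative reading of the component list above, not the prototype's actual design, which the D-Lib report describes. How the system learns that a user has a local resolver is left out and modeled here as a plain parameter, and the function name, DOI, and URLs are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical default locations, as a DOI resolver would hold them.
PUBLISHER_URLS = {
    "10.9999/example.12.345": "http://publisher.example/article/345",
}


def route(doi, local_resolver=None):
    """Decide where a DOI link should send the user."""
    if local_resolver:
        # The user belongs to an institution with its own resolver: pass the
        # identifier along in an OpenURL-style link and let local rules decide.
        return local_resolver + "?" + urlencode({"id": "doi:" + doi})
    # No local context is known: fall back to the publisher's copy.
    return PUBLISHER_URLS[doi]


doi = "10.9999/example.12.345"
default_link = route(doi)
local_link = route(doi, "http://resolver.library.example/menu")
carried = parse_qs(urlsplit(local_link).query)
```

In this sketch the local resolver receives only the DOI; it could then look up the citation metadata in a reference database before building its menu of options.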

Growing Pains
All the organizational and technical concerns and obstacles associated with reference linking return us to the oldest goal of librarians: getting the right resource to the right user at the right time. The "appropriate copy" problem is not new, only updated to accommodate a digital world. Publishers, aggregators, and other information professionals have joined in the effort to create a truly seamless reference-linking environment. The recent collaboration to integrate OpenURL and CrossRef gives us hope that similar collaborations of such differing groups may be on the horizon.

Other sorts of library- and librarian-initiated efforts also give us hope. In an effort to deal with the "appropriate copy" issue and to add the crucial localization aspect to e-journal collections, products such as jake and SerialsSolutions were developed. Additionally, "advanced" thinkers such as those at CAS as well as those involved with the OpenURL development continue to tackle the technical issues associated with reference linking for libraries.

Remember that the Web-based electronic availability of full texts is relatively new. We only need look at the numbers. The number of publications listed in Fulltext Sources Online has grown from approximately 4,400 in 1993, to 13,094 in July 2000, and 15,388 in January 2001¹⁰. These numbers continue to climb as electronic publishing gains more and more momentum. Essentially, reference linking is in its adolescence and experiencing some growing pains. But reference linking also has the energy of an adolescent, allowing information professionals to offer unprecedented access for our constituencies.

FOOTNOTES

1. Priscilla Caplan, "Reference Linking for Journal Articles: Promise, Progress, and Perils," Portal: Libraries and the Academy, vol. 1, no. 3, pp. 352-356.

2. Caplan, "Reference Linking."

3. Carol Tenopir, "Links and Bibliographic Databases," Library Journal, vol. 126, no. 4, March 1, 2001, pp. 34-36.

4. Jill E. Grogg and Carol Tenopir, "Linking to Full Text in Scholarly Journals: Here a Link, There a Link, Everywhere a Link," Searcher, vol. 8, no. 10, November/December 2000, pp. 36-45.

5. Oren Beit-Arie et al., "Linking to the Appropriate Copy: Report of a DOI-Based Prototype," D-Lib Magazine, vol. 7, no. 9, September 2001; Priscilla Caplan, "A Lesson in Linking," Library Journal NetConnect, Supplement to Library Journal and School Library Journal, Fall 2001, pp. 16-18.

6. These papers are available at http://www.dlib.org/dlib/april99/van_de_sompel/04van_de_sompel-pt.1.html#ref1 and http://www.dlib.org/dlib/april99/van_de_sompel/04van_de_sompel-pt2.html.

7. Jenny Walker, "Key Issues: SFX — The Context Sensitive Linking System for Libraries," Serials, vol. 14, no. 1, March 2001, pp. 71-72.

8. Caplan, "A Lesson in Linking."

9. Oren Beit-Arie et al., "Linking to the Appropriate Copy: Report of a DOI-Based Prototype," D-Lib Magazine, vol. 7, no. 9, September 2001.

10. Carol Tenopir, "Should We Cancel Print?," Library Journal, September 1, 1999, pp. 138-142.

Jill E. Grogg's e-mail address is jgrogg@library.msstate.edu.
