Resource Access for the 21st Century: A New Standard’s Rocky Introduction

A TROUBLED ROLL-OUT: LIBRARIANS QUESTION THE NEW STANDARD

Introducing RA21 in a widely read Scholarly Kitchen blog post, NISO executive director Todd Carpenter and his co-authors, Elsevier’s Chris Shillum (RA21 co-chair) and Spherical Cow Consulting’s Heather Flanagan, note: “RA21 aims to solve these problems once and for all, by promoting a modern, standards-based access management system” (scholarlykitchen.sspnet.org/2018/02/07/myth-busting-five-commonly-held-misconceptions-ra21). In my years as a consultant, one thing has always been clear to me: technology is neutral, neither “bad” nor “good,” and it is only as useful as the available tools and the clarity of their design allow. Technology also changes constantly, so few solutions can solve problems “once and for all.” At the same time, as the web has grown in recent years, the potential for misuse and its cost have grown massively.

Lisa Hinchliffe, professor/coordinator for information literacy services and instruction at the University of Illinois–Urbana-Champaign Library, was an early critic of RA21, stating in a January 2018 Scholarly Kitchen post (scholarlykitchen.sspnet.org/2018/01/16/what-will-you-do-when-they-come-for-your-proxy-server-ra21), “I think we should be cautious about conflating a ‘today’ problem with a specific ‘today’ solution. The problem commonly agreed is stumbling blocks in accessing content from outside of one’s campus IP range. One solution is SAML-based approaches. Another would be OpenID. Another would be for publishers/platforms to just point back to the proxy servers at institutions that enable IP based authentication! Who was involved in picking the solution that RA21 pursued? Again, let’s remember that the strategy was chosen by STM—not the full stakeholder community. That’s where the trust was lost.” Since then, Hinchliffe has joined some of the RA21 groups studying specific strategies and options; however, her concern has been echoed by many in the field who still feel that RA21 is unneeded or an intrusion on privacy.

Stephen Downes adds his own commentary on Hinchliffe’s post, noting, “the premise is that IP-based access to paywalled scholarly publications is coming to an end and will be replaced by (something like) RA21, which is an identity federation. This makes sense because the current system for accessing publications—even ones your institution has paid for—is irredeemably broken. But the cost is pervasive tracking and data collection. And the intent is probably to end anonymous access as a counter to services like Sci-Hub. It will also create a greater burden on smaller publishers, which works perfectly for the major players” (www.downes.ca/cgi-bin/page.cgi?post=67675).

In a July 2018 blog posting, Cleveland Clinic librarian Michelle Kraft describes major problems in implementing RA21 in medical institutions because of the nature of information access and use. “I think there is a donut hole for medical information. There are doctors, nurses, researchers who are affiliated with an institution (but not officially part of the institution) or they are private practice who have privileges but are not employed by the hospital. These people often fall in the donut hole for access to medical information” (kraftylibrarian.com/2018/07). Kraft goes on to note that many medical organizations do not have the infrastructure to support access. In an earlier (April 2018) post, Kraft notes, “There are A LOT of hospital libraries who can barely afford their journals let alone OpenAthens or another product to management online access. … Libraries with walk up access via their computers will have to figure out how to time out people. The doctor is not going to logoff of a journal when they leave” (kraftylibrarian.com/2018/04).

Hal Bright, electronic resources librarian at A.T. Still University, makes this comment about Kraft’s posts: “Who is going to use two factor authentication for library resources when they complain about a user name and password already. RA21, for me, has privacy FERPA [Family Educational Rights and Privacy Act of 1974] and HIPAA [Health Insurance Portability and Accountability Act of 1996] concerns written all over it if the user can be tracked by anyone other than the person with access to our proxy logs” (kraftylibrarian.com/medlibs-needs-ra21-on-their-radar/comment-page-1/#comment-101687).

PRIVACY

RA21 supporters stress that the standard will improve the user experience by enabling the user to work from any device or location and will provide a more consistent user interface. The standard would provide “greater privacy, security and personalization.” In the pilot programs, institutions have been working to “create a set of best practice recommendations for identity discovery” and were testing compliance with the EU’s General Data Protection Regulation (GDPR), which took effect in May 2018. However, they have clearly not convinced the majority of information professionals, who remain deeply concerned (slideplayer.com/slide/13664073).

The American Library Association’s Code of Ethics (tinyurl.com/y6v89zx3) is explicit in its support for the freedom to read:

We significantly influence or control the selection, organization, preservation, and dissemination of information. In a political system grounded in an informed citizenry, we are members of a profession explicitly committed to intellectual freedom and the freedom of access to information. We have a special obligation to ensure the free flow of information and ideas to present and future generations. …
We protect each library user’s right to privacy and confidentiality with respect to information sought or received and resources consulted, borrowed, acquired or transmitted. We respect intellectual property rights and advocate balance between the interests of information users and rights holders.

Singapore Management University librarian and blogger Aaron Tay (musingsaboutlibrarianship.blogspot.com) is also concerned about the future of privacy with RA21, emailing me, “It’s hard to take at face value that publishers aren’t interested in tracking users and it’s all about seamless access.”

RA21 isn’t the only concern raised recently when it comes to privacy. Eric Hellman, in a recent blog posting (go-to-hellman.blogspot.com/2017/03), asks:

Ever hear of Grapeshot, Eloqua, Moat, Hubspot, Krux, or Sizmek? Probably not. Maybe you’ve heard of Doubleclick, AppNexus, Adsense or Addthis? Certainly, you’ve heard of Google, which owns Doubleclick and Adsense. If you read scientific journal articles on publisher websites, these companies that you’ve never heard of will track and log your reading habits and try to figure out how to get you to click on ads, not just at the publisher websites but also at websites like Breitbart.com and the Huffington Post.

Hellman goes on to describe findings from his 2017 study:

Science, which could be loaded securely 2 years ago, has reverted to insecure connections. The two Annual Reviews journals I looked at, which were among the few that did not expose users to advertising network tracking, now have trackers for AddThis and Doubleclick. The New England Journal of Medicine, which deployed the most intense reader tracking of the 20, is now even more intense, with 19 trackers on a web page that had ‘only’ 14 trackers two years ago. A page from Elsevier’s Cell went from 9 to 16 trackers.

Most libraries still rely on models of privacy that predate the internet. When a library “bought” a journal in print, the journal resided onsite, and there was no practical way for anyone to track who was using what. Today, with everything online, we are entering a very different reality, and academic and public libraries need to review their privacy standards accordingly.

IN SUPPORT OF THE PROXY SERVER

According to University of Minnesota director of web development Cody Hanson, the proxy server is a firewall for identity. In an email, he remarks, “Ultimately there must be some way to represent affiliation and that specific users have the right to access some information and that authorization requires identity. The proxy server allows libraries to vouch for each access while guaranteeing the anonymity of users beyond that institution.” He adds, “My identity is further protected by the very aggregation of individuals by passing along only a proxy server indicating that the user has met requirements to use that resource. Further, by aggregating users as a proxy server identification, individual identity is further protected. By controlling the servers at the institutional level, the logs of individual usage, and the identity and activities of those users, is also protected.”
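Hanson’s description of the proxy as a firewall for identity can be illustrated with a toy model. This is a minimal Python sketch, not how any particular proxy product works: the `ProxyServer` class, its `forward` method, and the address 192.0.2.10 (from a range reserved for documentation) are all hypothetical. What it shows is the division of knowledge: the library keeps the log of who requested what, while every outbound request carries only the proxy’s address.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical proxy address from the 192.0.2.0/24 documentation range.
PROXY_IP = "192.0.2.10"


@dataclass
class ProxyServer:
    """Toy library proxy: users authenticate locally; publishers see only the proxy."""

    ip: str
    authorized_users: set[str]
    # Local log of (timestamp, user, url); it stays with the library, not the publisher.
    log: list[tuple[str, str, str]] = field(default_factory=list)

    def forward(self, user_id: str, url: str) -> dict:
        if user_id not in self.authorized_users:
            raise PermissionError(f"{user_id} is not an authorized library user")
        # The library, not the publisher, records who requested what.
        self.log.append((datetime.now(timezone.utc).isoformat(), user_id, url))
        # The outbound request carries only the proxy's address, never the user's identity.
        return {"source_ip": self.ip, "url": url}


proxy = ProxyServer(PROXY_IP, {"alice", "bob"})
first = proxy.forward("alice", "https://publisher.example/article/1")
second = proxy.forward("bob", "https://publisher.example/article/2")
print(first["source_ip"] == second["source_ip"])  # True
```

Because every request leaves with the same source address, the publisher learns only that an authorized member of the institution asked for the article; tying a request back to a person requires the library’s own log.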

Security of information is protected at the local level by Shibboleth, a single sign-on system that can quickly identify HIPAA, FERPA, and other data compromises, Hanson explains. Proxy servers give us a single point of observation at which to monitor compromised accounts and other issues, making it much easier to detect problems or violations such as excessive downloads. The proxy server is also very useful in verifying abuse claims: proxy logs provide the forensic data needed to distinguish malfeasance from merely overeager users. They also enable us to analyze usage by college and other metrics across all types of resources, providing usage data to justify our budgets and inform our decision making.
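The kind of monitoring described above can be sketched the same way. The functions below are hypothetical and assume a simplified log of (user, URL) pairs; real proxy logs have their own formats, and any download threshold is a local policy choice, not a standard value. One function flags unusually heavy users as candidates for a compromised-account check; the other aggregates usage by college.

```python
from collections import Counter

# Hypothetical, simplified log format: one (user_id, url) tuple per proxied request.
LogEntry = tuple[str, str]


def flag_heavy_users(entries: list[LogEntry], threshold: int) -> list[str]:
    """Users with more than `threshold` requests: candidates for a compromised-account check."""
    counts = Counter(user for user, _url in entries)
    return sorted(user for user, n in counts.items() if n > threshold)


def usage_by_group(entries: list[LogEntry], group_of: dict[str, str]) -> Counter:
    """Request counts per local group (e.g., college); unknown users count as 'unassigned'."""
    return Counter(group_of.get(user, "unassigned") for user, _url in entries)


entries = [("alice", "/a/1"), ("alice", "/a/2"), ("alice", "/a/3"), ("bob", "/b/1")]
print(flag_heavy_users(entries, threshold=2))  # ['alice']
print(usage_by_group(entries, {"alice": "Medicine", "bob": "Law"}))
```

Both views depend on the library holding per-user logs, which is exactly the capability Hanson worries libraries would lose if access bypassed the proxy.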

As for RA21, Hanson finds it concerning. “I fear that it will put user privacy at risk, limit libraries’ ability to gather and audit independent usage data, and, ironically, make it more difficult for libraries to identify compromised accounts. While I think that NISO was initially taken by surprise that RA21 was met with such a critical reception in libraries, I really appreciate the efforts they have undertaken to gather input this year.

“One of the uncomfortable aspects of RA21 is that libraries are concerned about publishers’ ability to snoop on user activity because it likely precludes the library’s ability to do some of that very same snooping,” Hanson continues. “We in libraries like to think of ourselves as benevolent caretakers of user data, but it remains true that the best way to protect data is to not have it. Truly protecting user privacy may mean libraries giving up their ability to gather the kind of detailed usage statistics that they’ve come to rely on for purchasing decisions and for initiatives like UMN’s own Library Data and Student Success initiative.”

PERSPECTIVES FROM RA21 LEADERS

RA21 co-chairs Shillum and American Chemical Society’s Ralph Youngen offer important information on RA21 and the process being used to hone, explore, and implement the final standard. Youngen explains to users that “RA21 is trying to greatly simplify the process at the point of access right at the publisher site. RA21 is seeking to make that experience as simple as a single click regardless of where you are or what kind of device you are using” (alcts.ala.org/news/2018/21st-century-resource-access).

According to Youngen, “RA21 is building on the SAML protocol, which is an open standard for exchanging authentication and authorization between parties. Shibboleth is one implementation of the SAML standard. Furthermore, SAML is a privacy-preserving protocol, because it allows a user’s home institution to be in control of the information that is shared with third parties (such as publishers). RA21,” he maintains, “doesn’t change the fundamental nature of the way that SAML works, the way that the data is exchanged between the content provider and the subscriber. In no way are we changing that level of the ecosystem at all. Assertions that there is a lot more going on behind the scenes in terms of private information are not factual. That is truly not the case with RA21.”

Shillum understands the complexity that this standard represents: “We are very aware of the concerns about privacy and access in the library community. I understand the worries and the concern about different types of libraries’ needs and the sending of personal information over the web, but often these concerns are not founded on the facts. I understand the concern and issues librarians are bringing up, however, these concerns are absolutely unfounded. Over the past few years, we have been working very hard to try to dispel some of these concerns and falsehoods.

“We can walk you through these two areas of concern one-by-one,” Shillum continues. “In the case of access through your campus IDP (Identity Provider) to some scholarly information resource, what we are proposing would happen is that you would land on one of the content sites, and you would tell us that you are from the University of Minnesota. If it were an Elsevier product, we would use InCommon metadata and Shibboleth software to send you to your campus IDP to sign in.” The next step, he notes, is the key one: “The information sent to us by the campus IDP, what we call SAML assertions, tells us that this is a bona fide user by sending an attribute, called Library Common Terms. This attribute says that this is a person that is able to access resources that have been licensed by the institution. And that’s it, period.”

Shillum has assured me that the attribute doesn’t tell the provider who the user is or provide any personal information. “The IDP may also provide another attribute which is unique to that individual user but opaque, [but] it does not provide any personal information. This attribute is called eduPerson Targeted ID, and allows the user to sign up for an individual account on the provider’s site and have that linked to their campus sign-in credentials.” Shillum says this attribute is optional and at the discretion of individual users wanting some advanced service from the provider, such as alerts. “This basic mechanism for applying Shibboleth standard to library resources was agreed ten years ago among the identity management community, publishers and libraries. RA21 is proposing to make use of this, rather than change it. The exchange of the library common terms attribute, this anonymous indicator,” Shillum explains, “is functionally equivalent to an IP address, and tells us only that the user is an authorized member of that library community, not who they are—and that’s it.”
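The attribute exchange Shillum describes can be sketched from the content provider’s side. This is a simplified illustration, not Elsevier’s or Shibboleth’s code: it assumes the SAML assertion has already been received and its signature validated, and it reduces the released attributes to a Python dictionary. It also assumes that Library Common Terms arrives as the value urn:mace:dir:entitlement:common-lib-terms of the eduPersonEntitlement attribute, the value conventionally used for this purpose, and that eduPersonTargetedID, when released, is an opaque string. The `authorize` function is hypothetical.

```python
# Entitlement value conventionally used to signal "library common terms" (an assumption).
COMMON_LIB_TERMS = "urn:mace:dir:entitlement:common-lib-terms"


def authorize(attributes: dict[str, list[str]]) -> dict:
    """Make an access decision from attributes in an already-validated SAML assertion.

    `attributes` maps attribute names to lists of values (a simplification).
    No name, email address, or other personal data is consulted.
    """
    entitled = COMMON_LIB_TERMS in attributes.get("eduPersonEntitlement", [])
    # Optional, opaque per-provider identifier; present only if the IdP releases it.
    targeted_ids = attributes.get("eduPersonTargetedID", [])
    return {
        "access": entitled,
        # Used only to link an optional personal account (e.g., alerts), never to grant access.
        "account_key": targeted_ids[0] if entitled and targeted_ids else None,
    }


anonymous = {"eduPersonEntitlement": [COMMON_LIB_TERMS]}
print(authorize(anonymous))  # {'access': True, 'account_key': None}
```

In this sketch access turns only on the entitlement value, much as an IP check turns only on the address; the opaque identifier matters only for users who opt into a personal account.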

CREATING A STANDARD THAT WILL MEET COMPETING NEEDS

Publisher content licensing requires some form of authorization for the people using and accessing these materials, and this process requires that some form of identification accompany each request to ensure that the terms of use are met. Utilizing SAML and Internet2 Shibboleth authentication technologies, RA21 is intended to simplify the pathways to restricted-access content. The details are highly technical, well beyond what most information professionals deal with; however, our voices are critical in the development of RA21 and future standards.

In recent years, largely due to the growth of the web, the potential and cost of misuse have expanded massively. For information professionals, the issue of privacy of information and the need to protect certain categories of information are core to our work. The development of RA21 is still in the early stages. Thus, now is the time to seek an understanding of the higher-level ideals and needs of various libraries and of user privacy issues before pushing forward with a standard that has such strong corporate support and such a complex implementation, and that lacks a strong base of privacy standards to address individual privacy concerns.

In webinars, presentations, and the previously mentioned blog posting, Carpenter, Shillum, and Flanagan note that adopting RA21 as a standard today will “solve these problems once and for all.” However, in an age of never-ending technological change and development, even if the privacy issues and other logistics are resolved, I believe that RA21 will not resolve these issues for all time. Years ago, that’s what was said about CDs, CD-ROMs, MPEG, and so many other now-outdated technologies and standards.

In reality, RA21 is not yet a set standard. The technologies involved represent software that offers new functionality over past technologies. A variety of pilots (ra21.org/index.php/pilot-programs/how-to-participate) are currently underway—with the stated goal to “identify best practices for adopting federated identity in order to streamline the user experience for access to subscribed content outside institutional IP domains”—and others are being sought for the future that will further explore issues and options for access, meeting the needs of users, content-providers, and institutions. The Security & Privacy Working Group includes publishers, standards officials, and librarians—including Hinchliffe. In its July 2018 report (ra21.org/wp-content/uploads/2018/07/RA21-Security-Privacy-Final-Report.pdf), the privacy group found that “all security threats identified were deemed low priority and risks can be mitigated by applying standard security and data protection practices.”

The RA21 Academic Pilot Technical Evaluation group’s analysis notes that its “pilots were successful in testing technical approaches to identity provider persistence, and we learned a great deal from both. Now that we have determined that taking RA21 forward will require the establishment and operation of at least some centralized infrastructure, we want to focus on just one option” (ra21.org/wp-content/uploads/2018/07/RA21-Academic-Pilot-Evaluation.pdf). This open approach to studying the implementation and implications of this proposed standard is refreshing. In my conversations with them, Shillum and Youngen have strongly encouraged libraries and librarians to participate in these preliminary efforts to study and compare various ideas and approaches, to ensure that the best practices and recommendations meet the needs of all of the key players.

RA21 AND PRIVACY

Shillum sees GDPR as “a major milestone in user privacy. If you are looking for organizations that can be trusted, look to Europe because of this new stringent privacy standard.” He adds, “Scholarly publishers live in a global community so the GDPR is becoming a global standard for us. Corporations can be fined up to 4% of their revenue for failure to comply.” GDPR doesn’t prevent companies from gathering personal data, but it requires that they do so in an open and transparent way. Shillum shares the process for Elsevier: “We have Mendeley, which asks users to register and provides them with free reference and profile management tools. Whenever we ask for personal data, we do this in a very open and transparent way. If someone asks to have that data removed, according to GDPR, we must do that.” To him, this is a further guarantee of trust. “If I had to trust a scholarly publisher, the US government or one of the internet giants, I’d be much more willing to trust the publishers.”

Speaking at ALA’s 2018 Annual Conference, Peter Brantley, University of California–San Diego director of online strategy, presented information on RA21 as a way to improve the user experience in a privacy-protecting way, using the pilot technology developments to demonstrate how the “resulting efforts have been built around the library community values of protecting privacy.”

Youngen agrees, noting, “What is so perplexing is the focus on privacy issues with RA21. Google, [which] created CASA as a solution to off-campus access issues, is in the business to take our private information and try to monetize this, yet there has been very little discussion of the implications of their approach, maybe because they have not been open and transparent about what they are doing behind the scenes. When publishers collaborate to try to solve the same problem, we hear the uproar (perhaps because we have been much more open and transparent), yet Google gets a pass.”

Paul Pedley, privacy expert and author of Essential Law for Information Professionals (Facet, 2012), has compared the profession’s attention to copyright with its attention to patron privacy, finding that, in library professional literature, “copyright was covered 4.5 times more often than privacy” (archive.cilip.org.uk/blog/privacy-library-user).

Of the 70-plus professionals I interviewed, most had no real experience with, or information about, RA21. Even so, many expressed serious concern based on the limited information that has been presented. Given our current political situation and the ongoing, serious threats to privacy and security, this concern is reasonable. However, as a profession, we need to participate.

Nancy K. Herther is a research consultant and writer who recently retired from a 30-year career in academic libraries.