BUILDING DIGITAL LIBRARIES
HathiTrust’s Ascendance as a Web-Level Digital Library
by Terence K. Huwe
Director of Library and Information Resources
Institute for Research on Labor and Employment
University of California–Berkeley
[On Sept. 12, The Authors Guild, along with several authors' groups outside the U.S., brought suit against HathiTrust and the universities of Michigan, California, Wisconsin, and Indiana, as well as Cornell University, for copyright infringement. The suit concerns their possession and use of 7 million digital book copies scanned for them by Google, as well as their recently announced policies for sharing and distributing "orphan works" from those collections. A similar action against Google itself has dragged on for years in U.S. federal court. —Ed.]
Google's ambitious digitization initiative has been followed with both fascination and concern ever since it was launched. Much of the controversy has centered on issues of ownership, copyright, and the risks of monopolization. But now there is an interesting second-wave initiative that has sprung from the mass digitization movement. It has grown so quickly and has been such a success that Google may need to make room on its pedestal and share the limelight. That initiative is HathiTrust, which goes much further than the commercial goals of digitization and is quite likely rewriting the future for digital libraries.
When it comes to mass digitization, what we need most are digital libraries that integrate the new artifacts with existing collections and make them easy to use. Academic research libraries—the folks who had the books in the first place—have been doing just that: forging alliances and collaborative ventures to make the most of the mass digitization movement. HathiTrust is one of the most important of these broad collaborative efforts. It is a genuine web-level digital library with a well-articulated vision, and it has excellent embedded tools that support reading and research. As such it is among the most important achievements in recent times to emerge from the academic world. Indeed, since a digital library (as we view it) is a collection of artifacts, functions, and services, HathiTrust may be viewed as a poster child for academic innovation, illuminating the power and potential of full-scale digital library operations at the network level. Now in its third year of operation, HathiTrust is continuing to evolve in ways that directly support teaching, research, and collaborative administration.
Read all about it in Heather Christenson's recent article, "HathiTrust: A Research Library at Web Scale," which covers the full story of HathiTrust from its inception through its present-day operations (see Library Resources & Technical Services, vol. 55, no. 2, pp. 93–102). For those of you who don't want the whole story, here are a few historical highlights. The University of Michigan libraries decided to repurpose their MBooks initiative into something bigger and even better and launched the idea of the trust. Its first partners were the member libraries of the Committee on Institutional Cooperation (CIC). The University of California also joined forces early on, along with the University of Virginia, Columbia University, The New York Public Library, and Yale University. HathiTrust now has more than 50 partner libraries, making it a force to be reckoned with. Governance is shared among the institutions, strengthening each of their contributions by drawing from the wellsprings of professional skill that exist at each campus. The crown jewels of the collection are the books that have been mass digitized by Google, the Open Content Alliance, and the Internet Archive. In spring 2011, HathiTrust announced that it was entrusted with more than 8.6 million digital volumes, the fruit of ingest streams from mass digitization projects that continue to deliver content.
There’s a whole lot that one could say about HathiTrust, given its scope and overall mission. I’d like to use this space in CIL to assess what I feel are the most significant successes that HathiTrust has enjoyed and what those successes tell us about the future of digital libraries.
Collaboration in the Spotlight
Library culture has a long history of genuine collaboration, which is best seen in our shared cataloging practices and coordinated collection development agreements. It is easy for us to overlook the fact that these practices are shockingly creative and innovative examples of smart work in the network era; as such they are being discovered and rediscovered by the business world, much to our benefit. As the internet makes it both possible and fashionable to collaborate, all of a sudden our long-term commitment is being viewed with greater interest by top administrators and business leaders. HathiTrust lifts our collaborative culture to the top network level, where it can be discovered once again. We should not underestimate the power of this high-profile step and the many ways it showcases the work of so many institutions.
We have reached two important milestones as our collaborative culture debuts at the network level. First, broad collaboration of this sort puts the imprimatur of every academic partner in plain sight, and this demonstrates to doubtful policymakers and even to citizens that libraries are a very good “buy” for the money. This incontrovertible evidence is a proof of concept and should become a key factor in promotion, fundraising, and outreach for all of the library partners. Second, when HathiTrust became a serious venture in 2008, the world did not yet fully know the power of microblogs, crowdsourcing, and other mass social interactions that define modern-day networking. HathiTrust’s innovative approach illustrates that a venerable smart mob—that would be us—channeled a tweet from Melvil Dewey from more than a hundred years ago and took it to heart. Our collaborative culture is now fully ready for prime time.
The Public Trust—At the Network Level
The debate about the future of copyright that accompanied Google’s mass digitization initiative has been transformational. Copyright issues entered into popular media as well as law school curricula. The balance between public and private ownership of knowledge resources is being debated, again and again. Throughout this decidedly rocky debate, there was one group of stakeholders that always had a consistent message: the information profession. We have advocated for the reader and for the concept of fair use … some would say quite tirelessly. Yet efforts to do away with fair use have not ended, and efforts to consolidate control over digital media continue unabated, as Apple’s play to retain 30% ebook revenue demonstrated during 2011. But as commercial players continue to duke it out, we have seen the maturation of a library-based policy voice that can now be heard both in the media and policy spheres.
HathiTrust is a pragmatic experiment in how to balance access with the needs of authors and publishers. It defines itself as a trust in the legal sense of the term. Digital materials within the collection are literally held in trust for the readers of the future. Materials that are still under copyright have a negotiated role that enables university library users to have access under their existing contractual parameters, which were carved out over time and therefore have a robust history of precedent. The great American library builders of the 19th and 20th centuries—the Carnegies, the universities, and the cities—sought to create a social institution for the greater good of society. Now that same sentiment is being carried forward in a digital sphere, without sacrificing our core values. HathiTrust’s high-profile mission makes it much more difficult to trivialize the role of digital libraries in education and society; for that reason it receives my vote for top public relations all-star.
Service: Done the Right Way
Although preservation of knowledge is a core goal of the trust, the potential of this vast repository to link with teaching and research activity is tantalizing. In May 2011, the California Digital Library announced that five of the University of California’s campuses have activated links to HathiTrust’s advanced services. The open-source Shibboleth program now allows library users at these campuses to obtain full downloads of digital artifacts, including the full text of books. With more than 8 million volumes, there are bound to be many ways that the library and the faculty can use HathiTrust within online learning environments, as well as through greatly enhanced e-reserves. Access to the full collection also creates text mining opportunities for researchers—a functionality that is heavily used in commercial data warehousing. These services are born digital and would appear to have minimal human mediation. However, they are the result of the combined brainpower of programmers, collection specialists, and faculty across all of the partner institutions. In a sense, HathiTrust is becoming the ultimate test bed: a fully functional digital library, operating at the network level, serving full-text content under existing licenses in the name of advancing learning. That’s a home run.
Where Do We Go From Here?
HathiTrust’s primary goals—preservation, common information architecture, effective search and retrieval, shared collection development, and the sustenance of the public good—lay a foundation for a fresh approach to research and discovery about digital libraries themselves. As user behavior is noted and studied, it would seem likely that “high touch” functions, such as online reference and the ability to comment or recommend, are logical next steps for the trust. Likewise, HathiTrust’s architecture dovetails with the growing push to curate the full lifespan of academic content creation, preserving prepublication data for the future. For example, at the University of California, digital curation of all of the materials that are gathered during the research process is now officially supported through the UC Curation Center. The Center partners with HathiTrust to make sure that these valuable artifacts of learning will also be preserved for the public good.
It is important to restate that all of this activity is library-driven and library-sponsored—and with a digital collection of this size and impact, the world will be watching.