THE OPINION PIECE
Digital Changes Everything: The Intersection of Libraries and Archives
by Jan Zastrow
[This is a guest column by an industry leader about tech topics of importance to
libraries. Authors write here only at the invitation of the editors. In this edition,
we hear about digital libraries from the perspective of an archivist. —Ed.]
Over the course of my professional career—2.5 decades—I’ve seen a modicum of interest by librarians in my specialty field of archives, mostly related to preservation and environmental conditions in my tropical home state of Hawaii. In fact, I started out as a special collections librarian, and it’s been at least a dozen years that I’ve been trying to interest folks in what we now call digital archiving.
But it wasn’t until the prevalence of digital publishing and electronic records that real interest percolated among librarians. The avalanche of born-digital records, digitization projects, digital curation in special collections, and new fields such as digital humanities has finally brought about an intersection of these various disciplines. Let’s look at how they’re similar and where they differ.
Convergence—How Libraries and Archives Are Alike
No doubt about it, information professionals—librarians, knowledge managers, records managers, archivists, IT personnel, and museum and data curation staff—are excited and challenged by the digital information landscape. The holy grail of ubiquitous, seamless access to information in any format is a common goal unifying digital collection communities. It’s no surprise that one of the latest strategic goals of the National Archives and Records Administration is to provide prompt, easy, and secure access to its holdings anywhere, anytime. An admirable but very ambitious goal, and it won’t happen overnight.
Consortial collaboration on best practices for displaying, cataloging, contextualizing, and managing virtual objects promises rich rewards. The possibility of bringing together collections from across the globe by way of online exhibits—some of which could never conjoin in the physical world—means new combinations of text, images, and ideas to inspire research as well as the ability to conduct different forms of data mining and analysis. And, in the case of physical loss due to natural or man-made disasters, the possibility of locating digital copies previously scanned, photographed, or input by users could reclaim otherwise lost heritage collections (in fact, this occurred after the collapse of the Cologne, Germany, city archives in March 2009). Once again, the recurring principle of LOCKSS—Lots Of Copies Keep Stuff Safe—stands the test of time.
Despite these exhilarating potentialities, we all know colleagues who jealously guard their information turf and cling to outdated paradigms inappropriate to and increasingly irrelevant in digital environments. Traditional professional assumptions render it difficult to cope with new record-creating realities in a digital world.
This is not to say the concerns are completely unfounded. In today’s downsized budgetary environment, changing and converging skill sets bring unprecedented challenges to professionals already doing the work of two, three, or more missing staff members.
Digital librarians are taking high-resolution images for long-term preservation; digital archivists are creating metadata for individual digital objects; digital practitioners (subject specialists or humanities scholars) are serving as project managers for in-house digitization projects; and all of them are sifting through gigabytes of born-digital records. Even while budgets and staffs are shrinking, these are additive responsibilities: Librarians and archivists are now required to manage both the digital surrogates and the original physical materials. Administrators are understandably confused as to why these professionals need separate job descriptions, credentials, and professional development … not to mention justifying multiple positions to funding sources.
Expectations on the Rise
Managing administrators’ expectations requires some savvy too. Digital is sexy; it sounds good to benefactors, and it looks impressive in the annual report. They want it digitized yesterday, never mind if a collection is already neatly organized or in shambles. Donors, too, expect that we’ll digitize everything as soon as it comes through the door.
Often the misconception that management and preservation of e-records will be cheaper and easier leaves archivists sputtering to explain the opposite: that digital formats are in fact more fragile and susceptible to loss due to changes in hardware, software, human error or neglect, and the all-on-or-off toggle effect unheard of in the analog environment. (Warning: Archivists will always argue to keep the original even after it’s been digitized!)
Managing researchers’ expectations that everything is available digitally can be challenging as well. Although rarely discussed in the professional literature anymore, easy access to full-text online sources has created a kind of McLibrary. Most undergraduates required to write a paper citing three sources automatically turn to the internet to look for articles and, only if it is mandatory, to a vetted library database. The older or arcane works in journals too small or specialized to provide full-text access are as good as invisible. Certainly, more experienced scholars may look beyond digital, but even in archives we’re starting to see researchers who don’t have the time, travel funds, or inclination to seek out materials they can’t get online. It’s too late to turn back the clock (the digital future is already here), but why not train the next generation to analyze all types of information resources, whether hard copy, born-digital, digitized, 3D, image-based, immersive, or oral?
In both libraries and archives, new technical skills are required to create and curate these digital collections. Whether studying programming languages to customize an open source platform, learning new metadata schemas, or figuring out how to manipulate and present XML data, ongoing technical training is our current reality. Data science should certainly be part of future information management curricula, but gaining these 21st-century skills on the job costs staff time, budgetary resources, and lost productivity at the very least, and may even call for dedicated full-time positions if we’re lucky.
The use of crowdsourcing, popular with libraries for several years now, is becoming more common in archives too, particularly for identifying images and transcribing oral histories and handwritten documents. And while archivists don’t catalog individual items as a matter of course, item-level description does greatly improve access and searchability. One of the biggest challenges for digital professionals right now is finding the balance between providing enough description to make an item findable but not so much that a job gets bogged down in time-consuming quantities of metadata. For mass digitization projects, series-level or folder-level description may be the best we can hope for.
By the same token, archives and libraries are struggling for best practices to preserve Web 2.0 and social media as evidence of social, civil, business, interpersonal, and political interactions. In academia, it is often the library’s special collections department that takes the lead on maintaining preservation copies of university websites, online course catalogs, and even web-based local heritage collections and community newspapers.
In the archives world, appraisal (the process of assessing what has permanent historical value) is taken very seriously, and it used to mean going through boxes and folders one at a time. But the sheer quantity of born-digital and digitized records makes the conscious process of appraisal an unscalable, almost superhuman chore. Some PIM (personal information management) and IT professionals don’t believe in appraisal at all: “Keep everything, storage is cheap!” I anticipate some industrial-strength search engines will be able to sift through petabytes of data in the near future, but, as information professionals, we’re still trained to prefer mediated, managed, and contextualized collections.
Even deciding what becomes digital has long-term ramifications. Everything digitized has to be described and assigned metadata, maintained on a server, provided with backups, and made accessible and searchable (often on multiple platforms). The initial investment is so great that the plan is usually to maintain it indefinitely. Yet soon cultural heritage institutions may have to start triaging even their digitized collections due to the intensive cost of their upkeep and maintenance.
But some things can’t be weeded, at least not anytime soon. National Science Foundation (NSF) grant projects, for instance, require applicants to submit a data management plan on how the resulting datasets will be archived and made publicly accessible. That means institutional commitment to annual data checks, migration to new hardware and/or software every 3 to 5 years, and the constant care and feeding of legacy digital objects while more new electronic resources are being created.
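For readers curious about what an annual data check can look like in practice, here is a minimal sketch in Python. It assumes the check is a checksum (fixity) audit: record a SHA-256 digest for every file once, then recompute the digests later and compare. The function names (`sha256_of`, `build_manifest`, `audit`) are hypothetical and chosen for illustration; a production repository would more likely rely on established preservation tooling than on a hand-rolled script.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files now on disk against a previously stored manifest."""
    current = build_manifest(root)
    return {
        # Files recorded earlier that can no longer be found.
        "missing": sorted(set(manifest) - set(current)),
        # Files whose content no longer matches the recorded checksum.
        "changed": sorted(
            name for name in manifest
            if name in current and current[name] != manifest[name]
        ),
        # Files added since the manifest was built.
        "new": sorted(set(current) - set(manifest)),
    }
```

In this sketch, the manifest would be saved alongside the collection at ingest, and `audit` would be run on a schedule. Anything listed under "missing" or "changed" is a candidate for restoration from another copy.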
Another shared concern is the preservation issue associated with impermanent storage media and ephemeral file formats. As storage gets cheaper, the guaranteed life span of storage media gets shorter. These frequent and cataclysmic changeovers create an urgency to identify and preserve permanent electronic records almost “at birth” in order to ensure their longevity. No more waiting around in dusty archives for someone to retire and donate her boxes; today’s archivists—and digital librarians—need to be on the team and part of the project, if for no other reason than to manage and preserve that soon-to-be-historical data.
Divergence—How Archives and Libraries Are Still Different
Despite the many similarities, there are some distinct differences that still set these disciplines apart. Archival materials are by definition primary sources, the original documentary evidence of human juridical, administrative, and social activity, which are unique and irreplaceable. While libraries also hold one-of-a-kind dissertations and manuscript collections, they still concentrate on published secondary sources—monographs, journals, newspapers, government documents, proceedings, and the like. The digitization of such items requires careful selection, planning, and consideration, but preservation into the conceivable future can often be shared consortially among other institutions that hold the same or similar materials. On the other hand, permanent retention of primary, irreplaceable sources requires commitment of institutional resources into the unforeseeable future, particularly if the original format is destroyed after digitization to save space.
Archives are defined and organized by provenance, i.e., by the source and creator of the record—whether a government agency, individual, business, or organization—rather than by subject area typical in libraries. They must be arranged and described (i.e., processed) to be made ready for research access (preferably before digitization). Gaining intellectual control over such diverse collections is paramount and includes resolving thorny legal issues of copyright, privacy, and other use restrictions.
These complex collections contain multiple formats: government records, personal papers, correspondence, photographs, scrapbooks, audiovisual materials, memorabilia, and nowadays a hard drive full of spreadsheets, documents, datasets, and email. Archivists treat the many components like pieces of a puzzle: They research the connections between them, then arrange them into series and describe them in a user guide called a finding aid. Because archives patrons can’t browse the stacks and help themselves as they would in libraries, consulting with the archivist leads to much better results than depending on the minimal description in a finding aid, inventory, or collection overview.
Most granting agencies now require the use of EAD (Encoded Archival Description) to provide cross-searchability in online finding aids—similar to library MARC-encoded catalog records, OCLC, and other bibliographic tools that attempt to provide universal access. This places pressure on small and understaffed archives (which is to say most) but does provide more consistent finding aids that are easier for researchers and staff to use. Tools such as ArchiveGrid can help users locate collections they might not have found otherwise, but it would be foolhardy to assume there is any kind of WorldCat of archival collections, certainly none wholly digitized and accessible from the solitude of someone’s office or home computer. The idiosyncratic and contextualized world of archives necessitates communication with the archivist. Because archival collections are not composed of discrete, individually cataloged items, disintermediation does a disservice to our scholars. I would hate to see the digital future made poorer by our shortsighted enchantment with all things technology.
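To make the idea of an encoded finding aid a little more concrete, here is a small, hedged sketch in Python. It reads a heavily simplified, invented EAD-style fragment and lists the collection title and its series. The element names (`archdesc`, `did`, `unittitle`, `unitdate`, `dsc`, `c01`) come from EAD 2002, but the sample record and the `summarize` function are assumptions made for illustration. A real finding aid also carries a header, usually an XML namespace, and far more description.

```python
import xml.etree.ElementTree as ET

# Invented, simplified fragment: no header and no namespace, unlike real EAD files.
SAMPLE_EAD = """\
<ead>
  <archdesc level="collection">
    <did>
      <unittitle>Example Family Papers</unittitle>
      <unitdate>1920-1975</unitdate>
    </did>
    <dsc>
      <c01 level="series">
        <did>
          <unittitle>Correspondence</unittitle>
          <unitdate>1920-1950</unitdate>
        </did>
      </c01>
      <c01 level="series">
        <did>
          <unittitle>Photographs</unittitle>
          <unitdate>1935-1975</unitdate>
        </did>
      </c01>
    </dsc>
  </archdesc>
</ead>
"""


def summarize(ead_xml: str) -> dict:
    """Pull the collection title, dates, and series list out of an EAD-style record."""
    root = ET.fromstring(ead_xml)
    archdesc = root.find("archdesc")
    series = [
        (c.findtext("did/unittitle"), c.findtext("did/unitdate"))
        for c in archdesc.findall("dsc/c01")
        if c.get("level") == "series"
    ]
    return {
        "collection": archdesc.findtext("did/unittitle"),
        "dates": archdesc.findtext("did/unitdate"),
        "series": series,
    }


if __name__ == "__main__":
    summary = summarize(SAMPLE_EAD)
    print(summary["collection"], summary["dates"])
    for title, dates in summary["series"]:
        print(" -", title, dates)
```

Note what the encoding captures and what it does not: the hierarchy of collection and series is searchable by machine, but the context that connects them still lives with the archivist.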
The Future: Convergence and Churn, er, Collaboration
Way back in the last millennium, I was predicting convergence of communication services and devices. Yawn. What was once a cutting-edge idea, the wave of the future, is now an everyday, every-minute occurrence on our smartphones, iPods, iPads, and the like. Here I’m talking convergence of a different sort: archives, records management, special collections, preservation, digital curation, and more. And in a dozen years or less, this will be a no-brainer too. Digital materials require more proactive management and preservation than paper records. This provides us the opportunity to go from passive custodian to active intervention. Let’s jump on the bandwagon sooner rather than later! With our collective expertise, we can lead the way in managing the astounding array of digital resources we currently face and those we cannot yet imagine. By respecting the differences and special expertise of each of our disciplines and finding points of collaboration and cooperation, we can develop new and innovative ways to better serve our researchers whether they live down the street or on the other side of the world.
Breeding, Marshall, “Digital Archiving in the Age of Cloud Computing,” Computers in Libraries, March 2013, pp. 22–26.
Cary, Amy Cooper, ed., Book Reviews, The American Archivist, Vol. 75(2). Chicago: Society of American Archivists, fall/winter 2012, pp. 567–587.
Conversation with Rand Jimerson, past president of the Society of American Archivists, New Orleans, Aug. 18, 2013.
Cook, Terry, “The Archive(s) Is a Foreign Country: Historians, Archivists, and the Changing Archival Landscape.” The American Archivist, Vol. 74(2). Chicago: Society of American Archivists, fall/winter 2011, pp. 600–632.
Focus group conducted with Dumbarton Oaks librarians and archivists, Washington, D.C., July 12, 2013.
Zhang, Jane and Dayne Mauney, “When Archival Description Meets Digital Object Metadata: A Typological Study of Digital Archival Representation.” The American Archivist, Vol. 76(1). Chicago: Society of American Archivists, spring/summer 2013, pp. 174–195.