Computers in Libraries
Vol. 20, No. 2 • February 2000
Digital Preservation: 
Everything New Is Old Again
by Andrew K. Pace

"... nearly every library in existence is doing something with digital collections ..."
First things first: I would be remiss if I did not begin by introducing myself and also thanking the people responsible for providing such a wonderful forum in which to express my opinions about libraries, automation, and the future of the information profession. I am a systems librarian at North Carolina State University Libraries in Raleigh. I came to the world of libraries through a love of books and knowledge, and to systems completely by accident. To mix a few metaphors, I got my feet wet at the Catholic University of America Libraries, where I got my Master’s, earned my stripes with Innovative Interfaces in California, and found my niche here in North Carolina.

Special thanks to Eric Morgan, who first told me of the impending vacancy in the pages of CIL, and who leaves some big shoes to fill. Fortunately, I can thank and consult him as often as I like since he and I work together. And thanks to the editors at CIL, not only for extending this opportunity to me, but also for readily accepting what presented itself as a difficult theme for this column—that library issues and trends are circular in nature, and that we find ourselves in situations where past (and current) decisions play an important role in our future development.

Now, to the Matter at Hand
This month’s theme of Archiving and Preservation excited me as a new columnist—I concentrated in the book arts in library school, which, naturally, led me to systems; however, the breadth of coverage on the subject amazed me as I refreshed my research. I want to use this opportunity to address some seemingly obvious assumptions, present and evaluate the most common forms of digital preservation, and, of course, raise more questions than I answer.

I will start with the assumption that nearly every library in existence is doing something with digital collections, but that relatively few have thought about how to preserve—both access method and artifact—the collections they are building. Secondly, I will not challenge the generally accepted notion that the life spans of digital storage media, application software, and required hardware are growing shorter with each passing Roman numeral of the Intel chip. All those state-of-the-art machines, software packages, and compression techniques seem old before the boxes and shrink-wrap even hit the landfill. And just as quickly, we are faced with re-evaluating the technology that was supposed to solve all of our problems—now there’s a circle we could have all done without. Finally, I think we can all agree on the ultimate importance of ensuring the longevity of special collections in general and digital collections in particular.

Where We Stand Now
If digital libraries can be said to have a tradition, we might call it the tradition of the “accessible repository.” Building digital collections and designing effective access to them have been such a focus that preserving the longevity of digital objects and digital interfaces has too often become an afterthought. Meanwhile, our capacity to store keeps increasing, while the longevity of the media—and its hardware—decreases. This is reminiscent of 19th-century book production, where a more literate populace placed a demand on publishers that was far greater than their concern for the legacy of acidic deterioration that they unwittingly passed on to libraries. Luckily for us, there are more librarians and technologists concerned with “bit integrity” now than there were wood and paper scientists (with a solution) then. When libraries take digital projects into digital production, preservation should enter the equation early and remain a strong consideration always.

Preservation Strategies, or ‘Raspberry Preserves’
It seems hard to analyze the existing digital preservation options without giving them all a big wet raspberry. Perhaps this is why no single strategy has presented itself as the clear front-runner. So we have to do something, but what? In most cases, we have already selected the collections that we wanted in digital form; we’ve done usability testing and made sure the service would scale appropriately with its use. Without even picking a new strategy, we already need to make the fiscal commitment (both staff and ongoing infrastructure). How can we be sure that the preservation strategy we choose will work, and will we need to start over again when we’re done and another option has presented itself? I am not going to attempt to answer all of these questions, but I will present the five major digital preservation strategies. I will also add a parallel strategy that some fear is too often overlooked, and I will try to find some common ground on which we can all stand.

Refreshing. The first—and probably most common—alternative is refreshing. Think of the verb, not the adjective, lest we think this first option is new and insightful. Refreshing involves transferring digital materials to a new medium, for instance, changing from 5 1/4-inch floppies to CD-ROM, or from CD-ROM to DVD.

Migration. The second common activity is migrating to a new format. If you were at my university right now, you might find yourself somewhat adept at saving that WordPerfect 7 document as a new Word 97 file.

Both of the examples I’ve given so far are somewhat simplistic. Let’s imagine a more creative one. What would you do if one of your Mac OS 9 FoxPro users wanted to review the historical data that your predecessor created in dBASE III on an IBM 286 and “archived” on a 5 1/4-inch double-sided floppy? The common answer: “How badly do you really need that data?” This case would require a great deal of refreshing and migration to ensure that the request could be met again in the future. This is very labor intensive, but hardly unique to our profession. Recall the ongoing efforts at preservation microfilming—resources being migrated to a new format, which is itself continuously refreshed as the technology improves. We stand to learn a lot from the successes and failures of those efforts.

Refreshing and migration raise another important preservation issue. Libraries must consider whether to treat digital materials as artifacts or simply as intellectual content. If I save my column in ASCII text (better make that Unicode!) on a good floppy, and faithfully refresh that medium as time goes by, I have preserved the intellectual content. This model fits my personal preservation and archival mission. Now, the equally faithful CIL editors would undoubtedly have a different mission. Their archival copy might preserve layout, graphics, varied type fonts, and (although I can’t possibly imagine why) edits. Ironically, when CIL and I sit down in our separate worlds to consider access to this content, we might create identical catalog records with identical metadata descriptors—but more on that problem later.
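
A refresh that silently corrupts the file preserves neither the artifact nor the content, so it is worth checking the bits after each transfer. What follows is only a minimal sketch in Python, assuming the old and new media are both mounted as ordinary directories. The function names sha256_of and refresh are illustrative inventions, not part of any library system.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source: Path, target_dir: Path) -> Path:
    """Copy a file to a new medium (hypothetical helper) and verify the copy."""
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    shutil.copy2(source, target)  # copies the content and file timestamps
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"checksum mismatch after refreshing {source.name}")
    return target
```

Calling refresh(Path("floppy/column.txt"), Path("cdrom_staging")) would copy the file and raise an error if the two checksums disagree. Note that this addresses only the intellectual content of a single file; it says nothing about layout, fonts, or the software needed to read the file later.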

Technology Preservation. This option can only be described as untenable. I still keep a Commodore 64 somewhere in my parents’ basement in the faint hope that one day I may just need to resurrect that paper that I wrote with WordStar back in 1985. You might argue that this has always been the plan for other types of collections, like the Library of Congress phonographs, or any of the thousands of nationwide collections containing Kodak slides. Preserving record players and slide projectors, however, is on a completely different scale than digital technology: The hardware and software for digital media change so rapidly that it would be impossible to keep an up-to-date technology museum.

Not only is it impossible to keep all of that hardware around, but there are several nuances to technology preservation that one might not think of at first. Can any of us truly remember what our first Web page really looked like? Maybe it was designed for Lynx or Netscape 1.0. In 1995, I created Web pages with low bandwidth as a given; Webmasters broke up pages into sometimes ridiculously small pieces; images were kept at a bare minimum. In the (relatively) high-speed 21st century, that context is completely lost. It’s hard to imagine someone saying, “Let me try this on a 386 at 2400 bps using Netscape 2.0, so that I can get a feel for how this Web page was meant to work,” but it would prove equally difficult to ever re-create that experience. The Web page, as an archival artifact, ceases to exist almost as soon as it is created.

Digital Archeology. I borrow this wonderful phrase from Oya Rieger at Cornell University (look for her chapter in Moving Theory into Practice due out this spring).1 It seems only slightly more justifiable than technology preservation. I think of digital archeology as the solution that IT gurus and purse-string holders like best; the former are banking on this solution, and the latter like that it doesn’t come out of this year’s budget. “Just get it in a digital format, and we promise that you will always have access to it.” It sounds a lot like what some IT experts were telling us when libraries began exploring digital technology. Remember the days when converting to digital collections was the answer to preservation? You might think of this option as the logical extension to refreshing, migration, and technology preservation; digital archeology is the gamble upon which all of these are based.

Emulation. Even if this option is not the best available, it seems, at present, the most intriguing. Emulation involves retaining information about how a digital collection was created and accessed so that future access can be accurately and faithfully reproduced. Jeff Rothenberg—probably the leading proponent of this option—writes, “For digital documents, retaining an original may not mean retaining the original medium ... but it should mean retaining the functionality, look, and feel of the original document.” 2 Using my earlier example, imagine a piece of middleware that might emulate Netscape 2’s rendering of HTML 1.0 over a 2400-baud modem. This option seems most suitable for documents that are “born digital,” but might still prove problematic for the representation of the original that we are digitizing. This brings me to the “forgotten” option of preservation.

Preservation Through Redundancy. This method was “introduced” in the era of digitization by Paul Conway of Yale University. Let’s not forget that the original documents that we digitize often deserve some form of preservation in addition to digitization. This is where the traditional preservation units must reach middle ground with the digital libraries concerned with access. When does a new format begin to stand for the original? This is the question that preservation microfilm experts are still debating and the one that digital librarians and digital preservationists should be asking themselves. Conway writes much more knowledgeably about the comparisons of digital preservation with traditional preservation, so I will let his work stand for mine in that regard.3 Conway and others argue that the core elements of preservation should be carried forward through the digital age. The circle continues.

Common Ground to Stand On
It’s tough to get through a conversation about digital activity in libraries these days without mentioning the 1990s buzzword “metadata.” I was going to try, but perhaps I can make up for it by digressing from subject descriptors, search engines, and Dublin Core. The common ground on which all digital preservation practices will find themselves is good administrative metadata. Think of traditional metadata as that which describes the intellectual content, the plain text version of the document. Administrative metadata is what is needed to carry out the strategies outlined above. Personally, I remain skeptical about the short-term practicality and long-term viability of intensive metadata description; I feel strongly, however, about creating the metadata content that describes what Rothenberg calls the “functionality, look, and feel” of digital materials. Administrative metadata might describe everything from the hardware and software with which the document was created to structural elements like file size, image resolution, provenance, and data quality.
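
To make the idea concrete, here is one way such a record might be sketched in Python. This is an illustration only: the class name, the field names, and every sample value are hypothetical, chosen to mirror the elements listed above rather than to follow any published metadata standard.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class AdministrativeMetadata:
    """Hypothetical record of how a digital object was made and stored."""
    identifier: str
    creating_hardware: str
    creating_software: str
    file_format: str
    file_size_bytes: int
    provenance: str
    image_resolution_dpi: Optional[int] = None  # None for non-image objects
    data_quality_note: str = ""

    def to_json(self) -> str:
        """Serialize the record as plain text so it can travel with the object."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Sample values are invented for illustration.
record = AdministrativeMetadata(
    identifier="scan-0001",
    creating_hardware="flatbed scanner (model not recorded)",
    creating_software="imaging software (version not recorded)",
    file_format="TIFF",
    file_size_bytes=2_457_600,
    provenance="digitized from the original print in special collections",
    image_resolution_dpi=600,
    data_quality_note="slight skew on the left margin",
)

print(record.to_json())
```

Keeping the record in a plain-text form such as JSON is a deliberate choice in this sketch: the description of the object should not itself depend on software that may be gone in ten years.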

You Should Know Your Vendors Like Yourself
Let’s also not forget that most of our libraries buy and lease more digital collections than they create themselves. Library vendors provide another common bond, and those who want to maintain the commercial viability of their data recognize preservation needs, too. Libraries should make sure that current and potential vendors’ strategies match their own. Will your vendors provide archived digital copies upon termination of license agreements, and at what cost? Is “continued online access” a viable substitute for locally archived material? These are difficult questions when what we really want is perpetual access to what we have paid for, in a refreshed and updated manner—forever.

Have we come full circle on digital technology and the digital materials we are trying to preserve? The circle’s beauty is that it has no beginning or end, just points along its curve, ahead and behind. I like to think of this circle as a sort of job security. Sometimes the questions ring a bell, sometimes the answers do. The circle might go on forever, but the questions and answers keep getting better.

Andrew K. Pace is assistant head, systems at North Carolina State University Libraries in Raleigh. He acts as the primary liaison between the Systems Department and the Department for Digital Library Initiatives.

