THE SYSTEMS LIBRARIAN
Ensuring Our Digital Future
by Marshall Breeding
Director for Innovative Technologies and Research
Vanderbilt University Libraries
I’m thrilled to see digitization efforts underway all around the world. Almost all major libraries now have some sort of program to digitize photographs, manuscripts, or other unique content.
Many of the libraries that I’ve visited in my travels have shared with me some of the interesting content that they have digitized and placed into their digital collections. I certainly see a pattern where libraries and archives in all parts of the world are building increasing capacity to digitize important materials of scholarly or cultural interest, both to provide wider access to them and to help preserve them. National libraries naturally tend to have the largest and most sophisticated digitizing operations and the best technical infrastructure to support these efforts. A growing number of academic, research, and public libraries have likewise developed top-notch digitizing programs.
Overall, I observe that the unique and special collections in the custody of libraries receive excellent attention and will last long into the future through both physical conservation and ever-improving digital preservation processes and infrastructure. Libraries increasingly have access to trusted digital repositories that implement the best practices available to ensure that digital materials will survive into the distant future, migrating digital content forward through continuous cycles of technology. While the current state of the art in digital preservation falls short of an ideal system that guarantees permanent survival, much has been done to address the vulnerabilities inherent in digital content.
The Open Archival Information System (OAIS) reference model provides a framework of best practices for a long-term digital preservation archive (www.iso.org/iso/catalogue_detail.htm?csnumber=24683).
Vulnerabilities of Digital Content
Physical materials present their own challenges, of course. Much has been lost due to fires, floods, natural disasters, and general neglect. While ancient books, manuscripts, and other physical artifacts present many challenges for long-term preservation, they have nonetheless been proven to endure for very long periods of time. Documents have survived for centuries. Images from the earliest days of photography have been preserved.
I worry that some underestimate the problems involved in preserving digital information. It’s an enormous challenge. You can place a book on the shelf and, given reasonable environmental conditions, it will last intact for hundreds of years. If you place a file on any current storage medium, its content may be corrupted or become inaccessible in mere decades.
It’s obvious but worth stating that all digital files should be backed up in a way that they can be restored should anything happen to the original. Hardware failures, software problems, malware attacks, and especially human error can annihilate files at any unpredictable moment. Good backup procedures will ensure the ability to recover from any computer mishap so that current processes that rely on these files can continue unimpeded. But disaster recovery helps only in the current time and near-term future. Dealing with long-term survival of digital information is a much more complex issue.
We have to recognize the impermanence of all media used to store digital files. Unfortunately, none of them will hold content forever. One problem involves obsolescence. While file formats such as TIFF and JPEG will likely remain viable for the next decade or so, the various devices and media that store digital files have short life spans. It would be difficult, for example, to find computers today capable of reading the 5.25" floppy diskettes widely used 20 years ago. In more recent history, we’ve seen DVDs overtake CDs, and DVDs in turn may be eclipsed by Blu-ray Discs. I suspect that it will be just as challenging to find devices capable of reading DVDs 20 years from now as it is now to read those 5.25" floppies.
The other problem involves “bit rot”: Unattended digital content will eventually deteriorate. And with digital files, it only takes a small amount of corruption to make an entire file unusable. It’s unfortunate but true that data on all forms of storage media deteriorate over long periods of time. Both magnetic and optical devices are subject to losing bits of information over time. The use of higher-quality media and storage in a well-controlled environment will extend the life, but not indefinitely.
Digital content will survive from one generation to the next only if it is periodically migrated forward onto new media. Files should be tested for integrity as frequently as possible, copied onto new media every 3–5 years, and migrated into new formats as necessary.
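To make integrity testing concrete, here is a minimal sketch in Python, assuming the files sit in an ordinary directory tree. It records a SHA-256 checksum for every file in a manifest, then later reports any file that has gone missing or whose checksum no longer matches. The function names are hypothetical and not part of any particular preservation system; it would also be prudent to keep the manifest on separate media from the files it describes.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root, keyed by relative path."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare files on disk against a previously stored manifest.

    Returns relative paths grouped as 'missing' (listed but no longer present)
    or 'changed' (checksum differs, e.g. because of bit rot).
    """
    report: dict[str, list[str]] = {"missing": [], "changed": []}
    for rel, expected in manifest.items():
        path = root / rel
        if not path.is_file():
            report["missing"].append(rel)
        elif sha256_of(path) != expected:
            report["changed"].append(rel)
    return report
```

Running build_manifest once when files are added and verify_manifest on a schedule, and again after every copy onto new media, would catch silent corruption while a good copy still exists elsewhere.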
Having many copies in separate locations also helps. Any sort of physical or digital disaster in one location should not affect replicas housed in a distant area. This principle underlies Lots of Copies Keeps Stuff Safe (LOCKSS), an important preservation project for electronic content launched at Stanford University (see http://lockss.stanford.edu).
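The value of keeping lots of copies shows up even in a simplified audit: with three or more replicas, a damaged copy can be identified because it disagrees with the others. The sketch below is a hypothetical illustration in Python, assuming the bytes of each replica can be read; it treats the version held by a strict majority of replicas as intact. It is a plain majority vote, not the actual LOCKSS polling protocol.

```python
import hashlib
from collections import Counter


def audit_replicas(replicas: dict[str, bytes]) -> tuple[bytes, list[str]]:
    """Compare replicas of one file held at different locations.

    Returns the presumed-intact content (the version held by a strict
    majority of replicas) and the sorted names of replicas that differ
    from it and should be repaired from a good copy.
    """
    if not replicas:
        raise ValueError("need at least one replica")
    digests = {name: hashlib.sha256(data).hexdigest() for name, data in replicas.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(replicas):
        # No strict majority: refuse to guess which copy is correct.
        raise ValueError("no majority agreement among replicas")
    good = next(replicas[name] for name, d in digests.items() if d == winner)
    damaged = sorted(name for name, d in digests.items() if d != winner)
    return good, damaged
```

With only two copies, a disagreement reveals that something is wrong but not which copy to trust, which is one argument for keeping three or more.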
I hope that I’ve established the point that digital content is fragile and requires positive action to ensure long-term survival. The digital realm also affords some important advantages regarding long-term survivability. The ability to easily produce multiple copies can be enormously helpful. With ever-decreasing storage and communications costs, it’s not prohibitively expensive to create distributed repositories designed to preserve digital content for posterity.
In the absence of major digital mishaps, future archivists will face the challenge of identifying the valuable or interesting images in an enormous ocean of digital information. Unfortunately, a more likely scenario involves huge gaps in collections of cultural or scholarly interest due to losses associated with the vulnerabilities of digital media. The failures inherent in digital technology can as easily involve the loss of entire collections as they can a few individual items.
Today’s Personal Collections Form Future Cultural Heritage
In addition to collections professionally managed in libraries and archives, personal digital collections abound. Today, personal correspondence and the fruits of the labor of scholars and writers exist mostly in digital formats, including word processing documents, email, webpages, databases, and spreadsheets. Almost all photographs taken in recent years originate from digital cameras.
In previous times, a library’s new acquisitions of the papers of scholars or public figures came in the form of boxes of physical materials. More often today, such acquisitions include vast amounts of electronic content that usually proves much more difficult to organize, describe, and preserve than physical materials.
One of my concerns about the current environment, in which digital content prevails, is ensuring that everyone’s content survives for the benefit of future generations. I view personal collections of photographs and manuscripts as potential cultural heritage for future times. As one generation passes to the next, with any luck, the most interesting and important documents and photographs pass from personal collections to libraries, archives, or other cultural institutions. I worry that the cultural heritage of the future might be defined as much by those who followed better backup practices as by those with the most interesting and important content.
Today, I’m mostly thinking about images. My own collection totals more than 26,000 digital photographs, some scanned from older prints and the rest taken with digital cameras. I’ve invested much time into building this collection, so I take extraordinary measures to be sure that no matter what goes wrong, I’ll still have at least one full copy of the collection. In addition to the active copy on my main home computer, I have a backup copy on an external USB-attached drive that’s refreshed every night, I have copies of the photos on DVDs, and I keep a full copy on another USB drive away from home that’s refreshed every time I add a new batch of photos.
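A nightly refresh like the one described above can be approximated with a short script. This is a minimal sketch, assuming the collection and the backup drive are both mounted as ordinary directories; the function name and the size-plus-modification-time test for "changed" are assumptions of this example, not features of any particular backup product. Deletions are deliberately not propagated, so a photo removed by mistake at home still survives on the backup.

```python
import shutil
from pathlib import Path


def mirror_new_and_changed(source: Path, backup: Path) -> list[str]:
    """Copy files from source to backup when they are absent or appear changed.

    A file counts as changed when its size or whole-second modification time
    differs from the backup copy. Files deleted from source are left in the
    backup on purpose. Returns the relative paths that were copied.
    """
    copied: list[str] = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = backup / rel
        if dst.is_file():
            s, d = src.stat(), dst.stat()
            if s.st_size == d.st_size and int(s.st_mtime) == int(d.st_mtime):
                continue  # looks unchanged since the last run
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also carries over the modification time
        copied.append(rel.as_posix())
    return copied
```

Comparing sizes and times is fast but can miss corruption that leaves both unchanged; pairing this with a periodic checksum verification would close that gap at the cost of reading every file.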
I worry that the average household has lots of digital photos and other electronic materials with few if any of these kinds of safeguards in place. How many individuals have only a single copy of their digital assets that could be lost in a single incident? It takes a lot of time, attention, and discipline to keep multiple layers of backup.
Many users rely on services such as Flickr and social networks such as Facebook to manage and store their photos. I see these services as another layer of protection but not one to depend on entirely. When considering how to make use of such cloud-based services, the key issue is whether it’s practical, or even possible, to store one’s full collection and especially whether the photos can be retrieved reasonably conveniently and at the full resolution of the original photographs. Some services automatically downsize the resolution of photos as they are uploaded, which is fine for display on the web but isn’t good news for getting back the original resolution and quality if needed. Will these sites and the companies behind them be around forever? Probably not, and I would not want the survival of a generation’s digital heritage to hinge on their business success.
Public Digital Repository
I think that the scenarios I’ve mentioned around personal digital collections may provide an opportunity for libraries. Sure, libraries have plenty to do already, but it’s important to at least think about new areas of activity that can increase engagement between libraries and their customers and strengthen the relevance of libraries to the communities they serve.
I see the need for social and technical infrastructure to facilitate the survival of the content of ordinary citizens. Such infrastructure might include a strong educational component to heighten awareness of the vulnerability of some of the commonly used media for storing files and advice on the steps that individuals can take to ensure that their digital assets can be passed on to the next generation.
A key role of the library might involve helping its clientele develop good backup practices and encouraging them to think about strategies for the long-term preservation of their digital property. Many libraries already offer classes and workshops on many aspects of technology, often including digital photography. Helping patrons organize and preserve their photo collections leverages the core expertise of our profession.
I think that such efforts would go only so far and might not have a great impact on the transfer of digital content from our generation to the next. The problem requires a more proactive and broad-based approach. I envision that libraries might find some way to provide a preservation service for library users on a local, regional, or national basis.
In my ideal world, I would like to see some form of a public digital repository to increase the probabilities that digital content created today will be available for generations to come. Such a repository would allow individuals to submit their collections of photographs for permanent archiving.
A public digital repository would function as a dark archive. Groups of photographs could be deposited and withdrawn as needed, but it would not necessarily provide the capability for the public to search or view individual images. Imagine more of a safety deposit box than a library-run version of Flickr.
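The safety-deposit-box model can be sketched in a few lines of Python. The class below is purely hypothetical, an in-memory stand-in for what would really be replicated, fixity-checked storage. It shows only the intended interface: a depositor receives an opaque receipt and can withdraw a batch by presenting it, while the archive deliberately offers no way to search or browse images. Whether a withdrawal should also remove the archive's copy is left open in the column; this sketch assumes the archive keeps its copy.

```python
import uuid


class DarkArchive:
    """Hypothetical deposit-and-withdraw archive with no public access."""

    def __init__(self) -> None:
        # receipt ID -> (depositor, {file name: file bytes})
        self._batches: dict[str, tuple[str, dict[str, bytes]]] = {}

    def deposit(self, depositor: str, files: dict[str, bytes]) -> str:
        """Store a batch of files and return an opaque receipt ID."""
        receipt = uuid.uuid4().hex
        self._batches[receipt] = (depositor, dict(files))
        return receipt

    def withdraw(self, depositor: str, receipt: str) -> dict[str, bytes]:
        """Return a copy of a batch, but only to the depositor who made it.

        The archive keeps its own copy (an assumption of this sketch).
        """
        if receipt not in self._batches:
            raise KeyError("unknown receipt")
        owner, files = self._batches[receipt]
        if owner != depositor:
            raise PermissionError("receipt belongs to a different depositor")
        return dict(files)

    # Intentionally no search(), list(), or view() method: the archive is "dark".
```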
A public digital repository would benefit patrons in that it would be a free or low-cost service that they could trust to hold their material for posterity. Should their own copies become lost or damaged, they could be recovered through this service. I do not see this service as supplanting the need for reliable backup procedures or displacing the practice of sharing photos on social networking sites. It’s just an added layer of preservation—one designed to benefit both the individual and the community.
Such a repository would benefit libraries in their role to preserve cultural heritage and to assemble collections of interest. I would imagine that a deposit agreement might give the library some kind of nonexclusive license to select interesting images to be added to its own digital collections. I imagine that batch submissions of photos would include an option to supply some metadata about each image, though requiring detailed descriptions might discourage submissions. I also would hope that such a service could be operated so that individuals could submit their photo collections at little or no cost, since a high cost would be a large barrier to the repository’s goal of attracting significant portions of privately held images.
This concept of a public digital repository may have many flaws and would need a great deal of refinement to become a practical reality. Still, I’m attracted to the idea of libraries playing a role in safeguarding the digital content of the individuals in their communities. As technology costs decrease and storage capacities increase, providing the infrastructure to store the millions of images or other digital objects involved in such an endeavor may not be entirely prohibitive. Given the fragility of digital content, I think that future generations of librarians and archivists might be pleased to be able to provide a conduit for digital items that might otherwise have been lost.