The California Light and Sound Collection: Preserving Our Media Heritage
by Richard P. Hulser
While most of the focus on digital preservation and access has been on digitizing printed materials, there is an initiative underway in California to capture and make accessible audiovisual content in such a way that even libraries, museums, and archives with limited resources can participate.
“We must save our audiovisual heritage before it is too late; analog recordings are threatened by fragile physical condition, format obsolescence and the lack of playback equipment.”
—Barclay Ogden, California Preservation Program
The California Light and Sound collection is the outgrowth of the California Preservation Program’s California Audiovisual Preservation Project (CAVPP). The collection already contains many locally significant oral histories and amateur films that intimately document everyday life in 20th-century America.
The ultimate goal for the project is to provide public access to media content through the Internet Archive (archive.org) for teaching, research, and study. The task is being accomplished by partnering with libraries, museums, and archives throughout the state to build the digital collection.
Selection and Digitization
CAVPP plays the lead role in helping participating partner organizations conserve and preserve their audiovisual collections according to best practices for the archiving and preservation of moving image and sound formats. It also established a low-cost and practical workflow for helping partner organizations efficiently digitize key media artifacts. CAVPP coordinates all digitization activities with the vendor doing the digitization work and helps the participating institution throughout the process.
Identifying objects for digitization—The institutional partner first assesses its audiovisual collection to determine what items to nominate for digitization. For institutions with large collections, CAVPP recommends using a tool called CALIPR (lib.berkeley.edu/preservation/calipr), designed by the California Preservation Program office expressly for institutions without experts on staff to assess the preservation needs of larger paper-based and audiovisual collections.
An institution doesn’t have to have a large number of recordings in its collection to participate. CAVPP wants to preserve locally important recordings (such as those found in the history rooms of public libraries or historical societies), so smaller organizations may preserve a handful of recordings at a time. Partners are able to nominate individual recordings or whole collections. In a round of nominations in 2014, partners were asked for up to 100 recordings. If all 100 cannot be funded in one submission round, the items can be nominated later as long as funding is available to continue the project. In my library’s most recent submission, we nominated and had accepted about 13 items (roughly a third of what we nominated in the first round).
Nominating process—After assessing its collection, the partner nominates recordings by creating records in CAVPP’s CONTENTdm (from OCLC) account, which CAVPP uses to manage all nominations. The nomination process requires that potential participants provide certain metadata (such as a main or supplied title, media type, name of the holding institution, date created, a copyright statement, and a statement as to why the recording is significant to California or local history).
The copyright statement can be particularly challenging, especially if there is little or no documentation as to the origin or ownership of the item. A recording may have been donated to an institution, but if the documentation doesn’t clearly indicate a transfer of distribution rights to the institution or that the recording is in the public domain, it may not be possible to post the recording to the Internet Archive. In that case, it wouldn’t be eligible for digitization under the terms of the grant.
Once CAVPP has reviewed and approved nominated items, the original recordings are sent to CAVPP for processing. CAVPP adds administrative metadata to the record and sends recordings to the vendor for digitization.
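For partners preparing nominations in a spreadsheet before entering them, a small completeness check can catch missing metadata early. The following is a minimal sketch, assuming only the example fields named above. The field names and the `missing_fields` helper are hypothetical and do not reflect CONTENTdm's actual schema or the complete CAVPP requirements.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class Nomination:
    """One nominated recording; field names are illustrative only."""
    title: str                    # main or supplied title
    media_type: str
    holding_institution: str
    date_created: str
    copyright_statement: str
    significance_statement: str   # why it matters to California or local history


def missing_fields(record: Nomination) -> List[str]:
    """Return the names of fields that are empty or whitespace-only."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


# Hypothetical record with the copyright statement still unresolved.
example = Nomination(
    title="Oral history interview",
    media_type="audiocassette",
    holding_institution="Example Public Library",
    date_created="1974",
    copyright_statement="",
    significance_statement="Documents local farming history",
)
print(missing_fields(example))
```

A record that lists `copyright_statement` as missing is the case most likely to need research into donor paperwork before it can be nominated.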
Digitization process—Based on current practices in the media preservation field and with input from participating partner archives, a list of technical specs was created as the default output format for various file types. Contributing partners can also request special additional output formats at a nominal cost.
The vendor performs several steps in the digitization process, including first photographing the medium and its container, and then inspecting, prepping, and transferring the recording according to the CAVPP specs. Treatment is done to the recording only if necessary, and CAVPP always checks with a partner before proceeding with any restoration. Technical metadata about the transfer is compiled and recorded.
Outsourcing the digitization work was found to be a cost-effective approach for tackling a large volume of material and a wide range of formats. To optimize quality control, CAVPP prefers working with labs that can handle all audiovisual formats. This not only saves shipping costs but also ensures that the appropriate standards and procedures are applied to all recordings. To this end, CAVPP has mostly worked with the vendor MediaPreserve, located outside of Pittsburgh. However, it is currently trying out other vendors as well. CAVPP has also worked with in-state vendors, depending on their specialty.
Quality assurance process—Following digitization, CAVPP performs a quality assurance check on the digital files. It checks the technical specifications of each file state, both the preservation master and the compressed access version. Sound and image quality are checked at the beginning, middle, and end of a recording (approximately 10% of a 30-minute file). Metadata is verified, and the content is checked to ensure it matches the title. In some cases, reviewers at this point suggest alternative titles to more accurately reflect the content if the original title was estimated or listed as “unknown.” Once the files are checked and uploaded to archive.org, CAVPP sends an email to the partner notifying it that the files are online and ready for review.
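The spot-check arithmetic can be sketched as follows. This is an illustration only: it assumes the three checks are equal in length and together cover a fixed fraction of the running time, which the description above implies but does not state. The `sample_windows` function is hypothetical and is not a CAVPP tool.

```python
from typing import List, Tuple


def sample_windows(duration_min: float, fraction: float = 0.10) -> List[Tuple[float, float]]:
    """Return (start, end) offsets in minutes for beginning, middle, and end checks.

    Assumes three equal windows that together cover `fraction` of the recording.
    """
    window = duration_min * fraction / 3
    mid_start = (duration_min - window) / 2
    return [
        (0.0, window),                          # beginning
        (mid_start, mid_start + window),        # middle
        (duration_min - window, duration_min),  # end
    ]


for start, end in sample_windows(30):
    print(f"check {start:.1f} to {end:.1f} min")
```

For a 30-minute file, each window is one minute long, so the three checks add up to three minutes, or 10% of the recording.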
The last stage of the process is for the partner to check the quality of the digitized recording posted on the Internet Archive. Partners are asked to check files within 30 days after the recordings are online. This helps CAVPP assess the sound and image quality of the transfers in order to report back potential issues to the digitization vendor in a timely manner.
CAVPP provides detailed information and examples to help partner institutions perform the necessary quality-check steps. This part of the process takes a bit of time for the partner and includes watching and/or listening to the entire digitized recording to ensure that what was meant to be digitized was digitized and that the result is acceptable, barring any inherent problems with the original item. A student intern or volunteer can be invaluable for a first or second pass at examining the digital content.
After files and metadata are approved by CAVPP and the partner institution, the vendor returns the original materials to the partner’s archive along with a hard disk drive containing archive versions of the digitized items. If the partner wants to keep the hard disk drive, that cost will be added to the invoice. Otherwise, the partner is expected to download the files and ship the hard drive back to the vendor.
Preservation and Access
Once digitization is complete, the focus naturally shifts to preservation and access. In addition to the web storage and public access provided at archive.org, CAVPP maintains an offline depository of master files and encourages participating libraries to do the same.
The size of digital audiovisual files and their storage costs are formidable. CAVPP estimates that preservation video masters average 102GB, and access video files average 1GB, assuming a running time of 60 minutes. CAVPP currently needs 103TB to store 1,000 moving image and sound recordings in both file states. While online storage prices are in flux as drive density continues to increase, CAVPP indicates that current prices for online storage are in the range of $1,000 per TB a year. Because that cost recurs every year rather than being paid once, storing the California Light and Sound collection masters online is unsustainable. Therefore, CAVPP implemented an affordable, expandable storage solution using a hybrid model of offline storage for preservation files on LTO (Linear Tape-Open, an open-format tape storage technology). Between 2010 and 2014, CAVPP stored 3,000 files offline, representing more than 216TB of data on LTO, at a cost of $16,140 (or about $5 a recording).
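The figures above can be checked with a few lines of arithmetic. This is a rough sketch using only the numbers quoted in this article, and it assumes decimal units (1TB = 1,000GB). The function and constant names are illustrative.

```python
MASTER_GB = 102      # average preservation master, 60-minute video
ACCESS_GB = 1        # average access file, 60-minute video
GB_PER_TB = 1000     # assumption: decimal terabytes


def collection_tb(recordings: int) -> float:
    """Storage needed for both file states of each recording, in TB."""
    return recordings * (MASTER_GB + ACCESS_GB) / GB_PER_TB


def annual_online_cost(tb: float, usd_per_tb_year: float = 1000) -> float:
    """Recurring yearly cost of keeping `tb` terabytes online."""
    return tb * usd_per_tb_year


print(collection_tb(1000))              # TB for 1,000 recordings
print(annual_online_cost(216))          # yearly cost to keep 216TB online
print(round(16140 / 3000, 2))           # reported LTO cost per recording
```

At the quoted rate, keeping 216TB online would cost about $216,000 every year, compared with the $16,140 reported for storing the same data on LTO.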
In accordance with the digital archival principle of redundancy, each partner is encouraged to store at least one copy of all file states per digital object. Currently, 73% of CAVPP’s active partners store copies of their files on hard disk drives (HDDs), RAID, or their own servers.
To get an idea of storage costs for a partner, the average storage needed for 12 recordings is about 1.23TB, assuming all are moving images with maximum running time. With a 2TB hard drive (costing about $250), the storage costs for 12 recordings would be about $155 (for HDD storage media and shipping) or approximately $13 per recording.
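A partner can run the same estimate for its own batch. The sketch below assumes every recording is a 60-minute video at the average sizes CAVPP reports (103GB for both file states) and decimal terabytes. The helper names are hypothetical.

```python
import math

GB_PER_RECORDING = 103   # 102GB master + 1GB access file (CAVPP averages)
GB_PER_TB = 1000         # assumption: decimal terabytes


def batch_storage_tb(recordings: int) -> float:
    """Estimated storage for a batch, in TB."""
    return recordings * GB_PER_RECORDING / GB_PER_TB


def cost_per_recording(total_usd: float, recordings: int) -> float:
    """Spread a total storage-and-shipping cost across a batch."""
    return total_usd / recordings


def recordings_per_drive(drive_tb: float) -> int:
    """Whole recordings that fit on a drive of nominal size `drive_tb`."""
    return math.floor(drive_tb * GB_PER_TB / GB_PER_RECORDING)


print(round(batch_storage_tb(12), 3))         # TB for 12 recordings
print(round(cost_per_recording(155, 12), 2))  # USD per recording
print(recordings_per_drive(2))                # recordings on a 2TB drive
```

Twelve recordings come to roughly 1.24TB, in line with the estimate above, and $155 spread over 12 recordings is about $13 each. Usable drive capacity is typically somewhat below the nominal size, so the per-drive count is an upper bound.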
Morals of the Story
CAVPP’s goal may have been to save and provide access to California’s significant, at-risk historical sound and moving image recordings, but in addition to accomplishing that, it has achieved more. Surveys of its institutional partners have confirmed that the CAVPP collaborative model is effective in helping them address their preservation needs and provide access to audiovisual materials even with limited or no funding. The project demonstrated how to streamline the preservation workflow by establishing standards and stimulating reviews of current standards and practices, and it has inspired institutions to address the needs of other recordings outside the project’s regional scope.
And through the broader visibility of the Internet Archive, partners are using their digital archive materials as a marketing tool to promote their collection beyond the brick-and-mortar of their institutions. The files they have produced for this effort can also serve as a proof of concept for other digitization projects and for recruitment of potential donors to support other digitization efforts. CAVPP’s initiative has demonstrated an effective and affordable way for organizations to collaborate in digitizing content that broadens an understanding of local history that would otherwise be lost.