“Persistence in the digital world does not happen by luck but through intentional action and explicit investment. The odds that bits will survive in a shoebox in an attic are pretty small.”
–Ann J. Wolpert, director, MIT Libraries, “Archives and History,” 2009
In 2016, The New York Times wanted to celebrate the 20th anniversary of the launch of its website, which debuted Jan. 22, 1996. One problem: The news organization didn’t have a page capture of the site from the day it launched or from any day afterward. Editors ended up going to the Internet Archive’s Wayback Machine (archive.org) to find the first capture of the website: a screenshot of a portion of the homepage from Nov. 12, 1996. The Times’ story is not unusual. Most of the content from the first decade of the web is gone. In every format, the earliest examples of news have a notorious track record of being lost. It has ever been thus.
In our new book Future-Proofing the News: Preserving the First Draft of History (Rowman & Littlefield, 2017), we outline the technological, economic, legal, and cultural histories of the efforts to archive news: colonial newspapers, early news graphics and photos, newsreels, radio, television, and digital news. In every case, the news archives that do exist are mostly accidents of history. Individuals took it upon themselves to collect and store old news content in attics, barns, basements, or bunkers. Sometimes those individuals also collected the equipment needed to access that archival material (16” lacquer disc turntables, analog audio tape players, TV film projectors, etc.). The institutional commitment and funding needed from news organizations, libraries, archives, or museums to establish and maintain news archives have waxed and waned during the past 300 years of U.S. news production.
Rust to Dust
For those archives that do exist, the fragility of wood-pulp newsprint means that newspaper archives are crumbling to dust. Nitrate film used for early photographs and newsreels spontaneously ignites or deteriorates to powder. Microfilm of newspapers and television news film on acetate decompose into lumps of goo, victims of “vinegar syndrome.” Early audio and video tapes demagnetize and become unreadable.
And, of course, most early digital news content has disappeared entirely because no one thought to capture it, or, when someone did, the technology to do so was not available. When archives of news content have been preserved, the issue of backward compatibility always looms. That is, even if today’s technology can convert an outmoded medium of news delivery and make that content accessible, there seldom is the will or the wallet to diligently transfer the material to each new format that comes along every few years.
Bits in Pieces
Digital archiving is now the focus of many disciplines, institutions, and types of expertise. But the bottom line is that ones and zeroes degrade (bit rot). No form of truly permanent storage for digital data has yet been engineered.
In doing research for our book, we learned about a number of efforts to develop a digital storage solution that might address some of these challenges. Scientists have used nanostructured glass to record and retrieve five-dimensional digital data on a glass disc the size of a quarter. Strands of biological material can also store ones and zeroes. Data can be encoded onto DNA and successfully retrieved, and such content stored on DNA can last for 500 years or more. Whether such content would also be fully searchable remains to be seen.
Even assuming the technological issues facing permanent digital data storage are addressed, the nature of digital news content poses other problems. For years, purveyors of news websites have prided themselves on their interactive content, rich in audio, video, datasets, and links to material far outside the news sites’ own confines. But in order to “render” an archival version of a website, it is necessary to capture both the content itself and all of the underlying computer code, links, and related peripheral material that made that website function.
Digital news applications that allow users to explore deep databases of content (Which doctors in my community have taken gifts from pharmaceutical companies? Which schools provide the best educational outcomes for the price?) are notoriously difficult to capture and reproduce. News organizations have also farmed out social media sites they curate, user comment sections, and other digital data features to third-party vendors. A newsroom’s digital assets are typically spread across multiple software and hardware platforms, many of which are outside the control of the news organization.
News archives also face a challenge that confronts all archivists: copyright. So much news content produced after 1923 is subject to convoluted and arcane copyright restrictions that major institutions have simply stepped away from trying to make it accessible at all. Individual collectors, who have been the unsung heroes of news preservation, are typically reluctant to donate their prized collections to memory institutions because they know access will be severely restricted due to copyright concerns.
A case in point: Most of the content in the Library of Congress National Recording Registry (classic radio clips) is inaccessible digitally. Why? According to the explanation on the LC website, “Due to copyright concerns, the Library of Congress is unable to post even sample audio of most National Recording Registry selections.” Efforts to revise Section 108 of the Copyright Act of 1976, which sets the rules for memory institutions’ reproduction and distribution of copyrighted material, have regularly been stymied, even though everyone recognizes that the rules are woefully out of date for the digital age.