

Preserving Our Digital History

Link-Up Digital

As an Internet user, you know it would be difficult to overestimate the impact of information technology on our lives. In his book InfoCulture, museum curator Steven Lubar says our information machines, “and the social structures that they are part of, have come to define our culture, at least as much as ethnicity, race, or geography. How we feel about the world around us, about one another, even about ourselves has been changed by these machines and the way we’ve chosen to use them.”

Of course, two of the primary technologies that have had a profound impact on how we experience the world are radio and television, yet no definitive archives of those media exist. Yes, there are small collections of programs and a few archives of historical footage, but much of the early days of those media has been lost forever.

The Internet Archive is an organization and a Web site working to ensure the same thing does not happen to our digital media.

The Wayback Machine
Located in the Presidio of San Francisco, the Internet Archive was founded as a nonprofit organization in 1996. Its mission is “to build an ‘Internet library,’ with the purpose of offering permanent access for researchers, historians, and scholars to historical collections that exist in digital format.” 

“Libraries exist to preserve society’s cultural artifacts and to provide access to them,” says a note on the archive’s site. “If libraries are to continue to foster education and scholarship in this era of digital technology, it’s essential for them to extend those functions into the digital world.”

The note adds that “without cultural artifacts, civilization has no memory and no mechanism to learn from its successes and failures.” Collaborating with such institutions as the Library of Congress and the Smithsonian, the archive is striving to preserve those artifacts for future generations.

According to Brewster Kahle, director of the archive, the average life span of a Web page is about 100 days. But the archive’s Web site includes a Wayback Machine that lets you view pages dating from 1996. Do you want to know what the front page of Yahoo! looked like on February 9, 1997? Just enter the site’s URL in the Wayback Machine’s search box on the archive’s home page. Want to see how another site looked on December 12, 1998? Just enter its URL. An advanced search interface is available, too, but the site doesn’t offer an indexed text search of the documents in the collection. The editors are working on it, however, and full-text searching may be available soon.
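
For readers who prefer to skip the search box, the same lookup can be expressed as a dated address. The sketch below is only an illustration: it assumes the `https://web.archive.org/web/<YYYYMMDD>/<original URL>` address pattern used by the present-day Wayback Machine, which the article itself does not describe, and the function name `wayback_url` and the Yahoo! address are placeholders chosen for the example.

```python
from datetime import date


def wayback_url(original_url: str, snapshot_date: date) -> str:
    """Build a Wayback Machine address for a page as of a given date.

    Assumption: the archive serves snapshots under
    https://web.archive.org/web/<YYYYMMDD>/<original URL> and redirects
    to the capture closest to the requested date.
    """
    timestamp = snapshot_date.strftime("%Y%m%d")  # e.g. 19970209
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


if __name__ == "__main__":
    # The Yahoo! front page on February 9, 1997 (illustrative address).
    print(wayback_url("http://www.yahoo.com/", date(1997, 2, 9)))
```

Pasting the printed address into a browser should land on the nearest available capture, if one exists for that site and date.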

The archive hasn't recorded every page from the past. Some sites weren’t included because the archive’s automated crawlers weren’t aware of them. Some weren’t included because the sites were password-protected or otherwise inaccessible. Some were removed because their Web site administrators asked that they be taken out. You also should note that the Wayback Machine does not add pages until at least 6 months after they are collected, and in some cases updates can take 12 months.

Still, the Wayback Machine already has archived more than 10 billion Web pages, which makes it one of the world’s largest publicly accessible databases. It contains 100 terabytes of data, and it’s growing at a monthly rate of about 12 terabytes. To put that figure in perspective, consider that the Library of Congress contains about 20 terabytes of data. The Internet Archive’s FAQ points out, “If you tried to place the entire contents of the archive onto floppy disks (we don't recommend this!) and laid them end to end, it would stretch from New York, past Los Angeles, and halfway to Hawaii.”
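
The figures quoted above invite a quick back-of-the-envelope check. The sketch below works through them under stated assumptions: a terabyte is taken as 10^12 bytes, a floppy is a 3.5-inch "1.44 MB" disk holding 1,474,560 bytes and measuring about 90 mm across, and growth is treated as a steady 12 terabytes a month. None of those details come from the article or the FAQ, so the results are rough estimates only.

```python
TERABYTE = 10**12  # assumption: decimal terabytes

ARCHIVE_BYTES = 100 * TERABYTE             # size quoted in the article
MONTHLY_GROWTH_BYTES = 12 * TERABYTE       # growth quoted in the article
LIBRARY_OF_CONGRESS_BYTES = 20 * TERABYTE  # comparison quoted in the article

FLOPPY_BYTES = 1_474_560  # assumption: 3.5-inch high-density floppy
FLOPPY_WIDTH_M = 0.09     # assumption: disks laid side by side, 90 mm each


def size_ratio() -> float:
    """How many Libraries of Congress fit in the archive."""
    return ARCHIVE_BYTES / LIBRARY_OF_CONGRESS_BYTES


def months_to_double() -> float:
    """Months for the archive to double at a constant monthly growth."""
    return ARCHIVE_BYTES / MONTHLY_GROWTH_BYTES


def floppy_line_km() -> float:
    """Length in kilometers of the archive laid out on floppy disks."""
    floppies = ARCHIVE_BYTES / FLOPPY_BYTES
    return floppies * FLOPPY_WIDTH_M / 1000


if __name__ == "__main__":
    print(f"Archive vs. Library of Congress: {size_ratio():.0f}x")
    print(f"Months to double: {months_to_double():.1f}")
    print(f"Floppy line: {floppy_line_km():,.0f} km")
```

Under these assumptions the archive is five times the size of the Library of Congress figure, would double in a little over eight months, and would fill a line of floppies roughly 6,100 km long, which is comfortably longer than a coast-to-coast crossing of the United States and so consistent with the FAQ's image.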

Special collections
Besides the Wayback Machine, the archive offers several special collections. For example, the Moving Image Collections include the Prelinger Archives, which contain more than 900 digitized industrial, educational, and government films dating from 1903. You can find, for instance, amateur films of the construction of the Golden Gate Bridge and of the New York World's Fair of 1939.

The Moving Image Collections also include dozens of archived episodes of “The Computer Chronicles” and “Net Café.” There’s a sampling of "Orphan films" from the Orphan Film symposium at the University of South Carolina, and you can access the World at War Collection, which was created through an Internet Archive contest that challenged people to create short films demonstrating why access to history matters.

An audio archive is a network of mailing lists and FTP servers that provide access to high-quality digital recordings of live music performances. All the concerts available through the servers are performances by musicians and bands that allow noncommercial recording and distribution of their live concerts.

The archive’s Text Collections page provides access to such electronic text projects as The International Children’s Digital Library, Project Gutenberg, Arpanet, and the Million Book Project.

The Internet Archive also is collaborating with Macromedia to make thousands of software titles available for remote execution.

A September 11 archive includes thousands of Web pages from news organizations, government and military agencies, and charitable organizations.

An Election 2000 collection contains 800 gigabytes of relevant data gathered from August 1, 2000, to January 21, 2001. You can see, for example, how a particular site looked on Election Day, Tuesday, November 7, 2000.

Practical purposes
Now you might be thinking, “Well, I’m glad somebody is archiving the Internet, and it’s nice that I could see what Yahoo! looked like in February of 1997, but why would I want to?”

There are a number of practical purposes for the archived pages. In an article in ONLINE magazine (“The Wayback Machine: The Web's Archive,” March/April 2002), Greg R. Notess, reference librarian at Montana State University, points out several possible uses: “Patent searchers can verify prior art. Business experts can look up failed companies' business plans. Employers can investigate job applicants' student Web pages. Sources lost because of complex URL shifting can be found by their old URL on the Wayback Machine.”

Notess also points out that “the ability to view a range of versions of a particular page, and to browse the archived site itself, offers a range of uses. A new Web designer can look at previous incarnations of a site, even if the organization itself never archived the various versions. A new business can look at their competitors' early designs and avoid the same mistakes. And the researcher who is trying to track down the online resources from the bibliography of a 4-year-old paper can find them in the archive, even if they have otherwise vanished from the current Web.”

(Notess’s article also contains excellent information on how to search the archive. The article is available on the Web.)

The Internet Archive also could be used to explore the role information technology is playing in our lives. Bernardo Huberman of the Xerox Palo Alto Research Center has pointed out that “researchers could use the Archive’s Web snapshots in combination with usage statistics to compare how people in different countries use the Web over long periods of time.... Political scientists and sociologists could use the data to study how public opinion gets formed. For example, suppose a device for increasing privacy became available: Would it change usage patterns?”

Answering such questions will be increasingly important as the digital information revolution continues. New technologies will arrive, and each will have the potential to enhance or diminish society. We continually will need to assess the impact they have had on us and on our view of the world. Online libraries such as the Internet Archive will be able to help us with the task for generations to come. As the site says, “Internet libraries can change the content of the Internet from ephemera to enduring artifacts of our political and cultural lives.”

Thomas Pack is a freelance writer who lives near Louisville, Kentucky.
