As an Internet user, you know it would be difficult to overestimate the impact of information technology on our lives. In his book InfoCulture, museum curator Steven Lubar says our information machines, “and the social structures that they are part of, have come to define our culture, at least as much as ethnicity, race, or geography. How we feel about the world around us, about one another, even about ourselves has been changed by these machines and the way we’ve chosen to use them.”
Of course, two of the primary technologies that have had a profound impact on how we experience the world are radio and television, yet no definitive archives of those media exist. Yes, there are small collections of programs and a few archives of historical footage, but much of the early days of those media has been lost forever.
The Internet Archive is an organization and a Web site working to ensure the same thing does not happen to our digital media.
The Wayback Machine
“Libraries exist to preserve society’s cultural artifacts and to provide access to them,” says a note on the archive’s site [www.archive.org]. “If libraries are to continue to foster education and scholarship in this era of digital technology, it’s essential for them to extend those functions into the digital world.”
The note adds that “without cultural artifacts, civilization has no memory and no mechanism to learn from its successes and failures.” Collaborating with such institutions as the Library of Congress and the Smithsonian, the archive is striving to preserve those artifacts for future generations.
According to Brewster Kahle, director of the archive, the average life span of a Web page is about 100 days. But the archive’s Web site includes a Wayback Machine that lets you view pages dating from 1996. Do you want to know what the front page of Yahoo! looked like on February 9, 1997? Just enter www.yahoo.com in the Wayback Machine’s search box on the site’s home page. Want to see Amazon.com from December 12, 1998? Just enter its URL. An advanced search interface is available, too, but the site doesn’t offer an indexed text search of the documents in the collection. The editors are working on it, however, and full-text searching may be available soon.
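For readers who would rather script such a lookup than type into the search box, here is a minimal sketch that builds the address of a dated snapshot. It assumes the archive serves snapshots at addresses of the form web.archive.org/web/<date>/<original URL>, a pattern this article does not describe. The helper name wayback_url is hypothetical, and the archive is assumed to return the capture nearest the requested date rather than an exact match.

```python
from datetime import date

# Assumed address pattern for dated snapshots; not taken from the article.
WAYBACK_PREFIX = "http://web.archive.org/web"

def wayback_url(site: str, day: date) -> str:
    """Build a snapshot address for `site` as of `day` (hypothetical helper)."""
    # Add a scheme if the caller passed a bare host name such as www.yahoo.com.
    if not site.startswith(("http://", "https://")):
        site = "http://" + site
    # The date is written as YYYYMMDD; the archive is assumed to redirect
    # to the nearest capture it actually holds.
    return f"{WAYBACK_PREFIX}/{day.strftime('%Y%m%d')}/{site}"

# The two examples mentioned above.
print(wayback_url("www.yahoo.com", date(1997, 2, 9)))
print(wayback_url("www.amazon.com", date(1998, 12, 12)))
```

Pasting either printed address into a browser should have the same effect as entering the site's URL in the search box and picking the date by hand.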
The archive hasn't recorded every page from the past. Some sites weren’t included because the archive’s automated crawlers weren’t aware of them. Some weren’t included because they were password-protected or otherwise inaccessible. Some were removed at the request of their Web site administrators. You also should note that the Wayback Machine does not add pages until at least 6 months after they are collected, and in some cases, updates can take 12 months.
Still, the Wayback Machine already has archived more than 10 billion Web pages, which makes it one of the world’s largest publicly accessible databases. It contains 100 terabytes of data, and it’s growing at a monthly rate of about 12 terabytes. To put that figure in perspective, consider that the Library of Congress contains about 20 terabytes of data. The Internet Archive’s FAQ points out, “If you tried to place the entire contents of the archive onto floppy disks (we don't recommend this!) and laid them end to end, it would stretch from New York, past Los Angeles, and halfway to Hawaii.”
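The figures above lend themselves to a quick back-of-the-envelope check. The sketch below uses only the numbers quoted in this article and assumes, purely for illustration, that the collection keeps growing by a steady 12 terabytes each month. The function name projected_size_tb is hypothetical.

```python
# Figures quoted in the article, in terabytes.
ARCHIVE_TB = 100
MONTHLY_GROWTH_TB = 12
LIBRARY_OF_CONGRESS_TB = 20

def projected_size_tb(months: int) -> int:
    """Projected archive size, assuming steady (linear) monthly growth."""
    return ARCHIVE_TB + MONTHLY_GROWTH_TB * months

# How many times larger the archive is than the Library of Congress figure.
ratio = ARCHIVE_TB / LIBRARY_OF_CONGRESS_TB

print(f"The archive holds {ratio:.0f} times the Library of Congress figure")
print(f"Projected size after 12 months: {projected_size_tb(12)} TB")
```

Real growth is unlikely to stay perfectly steady, so the projection is a rough guide rather than a forecast.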
The Moving Image Collections also include dozens of archived episodes of “The Computer Chronicles” and “Net Café.” There’s a sampling of "Orphan films" from the Orphan Film symposium at the University of South Carolina, and you can access the World at War Collection, which was created through an Internet Archive contest that challenged people to create short films demonstrating why access to history matters.
The archive’s Text Collections page provides access to such electronic text projects as The International Children’s Digital Library, Project Gutenberg, Arpanet, and the Million Book Project.
The Internet Archive also is collaborating with Macromedia to make thousands of software titles available for remote execution.
An Election 2000 collection [http://web.archive.org/collections/e2k.html] contains 800 gigabytes of relevant data gathered from August 1, 2000, to January 21, 2001. You can see, for example, how www.georgebush.com looked on Election Day, Tuesday, November 7, 2000.
There are a number of practical purposes for the archived pages. In an article in ONLINE magazine (“The Wayback Machine: The Web's Archive,” March/April 2002), Greg R. Notess, reference librarian at Montana State University, points out several possible uses: “Patent searchers can verify prior art. Business experts can look up failed companies' business plans. Employers can investigate job applicants' student Web pages. Sources lost because of complex URL shifting can be found by their old URL on the Wayback Machine.”
Notess also points out that “the ability to view a range of versions of a particular page, and to browse the archived site itself, offers a range of uses. A new Web designer can look at previous incarnations of a site, even if the organization itself never archived the various versions. A new business can look at their competitors' early designs and avoid the same mistakes. And the researcher who is trying to track down the online resources from the bibliography of a 4-year-old paper can find them in the archive, even if they have otherwise vanished from the current Web.”
The Internet Archive also could be used to explore the role information technology is playing in our lives. Bernardo Huberman of the Xerox Palo Alto Research Center has pointed out that “researchers could use the Archive’s Web snapshots in combination with usage statistics to compare how people in different countries use the Web over long periods of time.... Political scientists and sociologists could use the data to study how public opinion gets formed. For example, suppose a device for increasing privacy became available: Would it change usage patterns?”
Answering such questions will be increasingly important as the digital information revolution continues. New technologies will arrive, and each will have the potential to enhance or diminish society. We continually will need to assess the impact they have had on us and on our view of the world. Online libraries such as the Internet Archive will be able to help us with the task for generations to come. As the site says, “Internet libraries can change the content of the Internet from ephemera to enduring artifacts of our political and cultural lives.”
Thomas Pack is a freelance writer who lives near Louisville, Kentucky.