by Barbara Quint
Editor, Searcher Magazine
As information professionals, we have two primary — not to say, primal — goals whose pursuit qualifies us as a profession. One is access: guaranteeing that our clients, whom some might define as encompassing the human race (a definition Second Lifers might consider unduly limiting), attain all the truth they can use. But an even more basic goal is guaranteeing that all information in existence stays in existence, the command to archive.
Over the millennia, an array of vendors has taken on much of that archiving responsibility. From traditional publishers to database aggregators to the mighty Google, centralized suppliers of content have gathered and winnowed the wheat from the chaff — through peer review in the case of scholarship, through editorial blue pencils in the case of the general press. Librarians would collect the publications and store them for the future. In the case of generally nonsalable content, such as rare documents and arcana, librarians might provide special archiving services, including digitization.
Ah, yes! Digitization. And there’s the rub. As digital content has superseded print, the economy of information has shifted. End users dominate the market for data everywhere. What institution could be more end-user-oriented than Google? And yet, what content could be more arcane than some of the ancient tomes entombed in Google Book Search? Publishers now supply digital copies of their content via licenses to libraries and their institutional patrons. So far, periodicals have been the focus of digital delivery modes, but ebooks are clearly on their way.
As for the mission to archive, in some sense, it has never been better. Publishers that might have had little or no interest in their own past publications before are now digitizing up a storm. Recently, HighWire Press announced a partnership with the Royal Society that would cover the Society’s eight publications, one of which would extend all the way back to 1665. Most quarrels librarians have with publisher-based archives have involved access issues, such as prohibitive prices and restrictive usage terms. And then, of course, there’s Google, the Giant Killer, and all that mass of Google Book Search and Google Scholar content.
However, something is slipping and sliding away: redundancy. In the past, publishers relied on a business environment where you made once and sold many. Libraries supplied the archive for past publications and, by the very
fact of multiple sales to multiple institutions, formed a defense against erosion or destruction of content. But no more. Today libraries license access to centralized collections maintained and held fast by publishers or their minions. There are no multiple copies in multiple places to guarantee an eternal archive. At present, Google Book Search is probably the best defense against the loss of a “last known copy.” However, not even Google Book Search can defend against the loss of proprietary, copyrighted information in licensed collections of periodicals.
“So what?” you may ask. Publishers will not toss away what makes them money. Everyone has learned the lesson of Turner Classic Movies — that selling off overhead-chomping movie libraries to some sucker with cash proved agonizingly shortsighted when cable television and DVD technologies opened up vast new markets. Content is King! And the King has a long tail!
Maybe, but maybe not. The open access movement is already challenging the relevance and durability of long-established scholarly publishers. Web 2.0 phenomena — from blogs to social networks — have exploded the web's use as a merger of publishing and communication. But where will that content be lodged in 2108? 2058? Or even 2018? Publishers do not even keep up with archiving their own content now — or at least what users may perceive as their content.
Go to any website for a newspaper today and you will see wire service feeds, staff blogs, connections to outside blogs, reader contributions, etc. Go back next week, or even tomorrow, and — poof! — all gone. Legal rules dictate this, contract rules dictate that. But most of all, there is no sense of responsibility for archiving anything not produced at and for the newspaper. Of course, this is nothing new. Whole sections of newspapers, e.g., classified ads, have never formed part of any archive. Syndicated content never went into the basic newspaper database. After the Tasini decision, any article not written by staff was excised. But even the Tasini decision did not challenge the legitimacy of microform copies or complete digital collections, which many libraries purchased until those digital collections became web-based and centralized.
What’s different now is that the habits of under-archiving have extended to publishers that should know better, like scholarly publishers. Already, it seems that most of these publishers have an unwritten but firm policy to remove any article from their collection that proves resoundingly false. Hoax articles have been snatched out of the digital collections of journals as top-drawer as Science and Nature. I can see why one would not want to continue to perpetrate a hoax, but, on the other hand, that hoax is part of the scientific record. But such occurrences are rare. What happens every day is the addition of links to outside articles, supplementary digital material, and — now — more Web 2.0 interactive content flowing into scholarly publishers’ websites.
But where’s the permanent record? Who guarantees that reality remains documented? Can librarians include a requirement for comprehensive archiving in their license agreements? When it comes to open web free content, should librarians start to do their own archiving — with or without benefit of copyright clergy? Should we start
an underground ILL network — “You snatch journal X’s extra stuff. I’ll snatch media site Y’s content. Tell Librarian Z that I’ll swap my Y for their F and G.”?
Redundancy, fellow conspirators, redundancy. If not us, who?