Pundits have described the Internet as the greatest boon to literacy since Johannes Gutenberg invented the printing press in the 15th century. Despite the Internet’s multimedia versatility, communication over the Internet remains largely words typed on a keyboard and read on a screen.
Some have even predicted that the Internet will do away with conventional printing just as paper replaced papyrus, clay, and lambskin. This may eventually happen, but the paper industry shows no signs of going away any time soon, and the paperless office remains a futuristic fantasy.
Still, the Internet has complemented many traditional print media. Books, however, have been a technological laggard. Lately, though, significant inroads have been made in the areas of printing books, buying books, reading books, and, perhaps most interestingly, doing research with books.
Major players are involved, including Google, Microsoft, and Yahoo!, as well as some of the top libraries in the world. Google, the Internet’s most popular search engine, is getting the most attention and creating the most controversy.
Google Book Search (http://books.google.com), formerly Google Print, lets you search through books for free, just as Google lets you search through the Web, with Google earning profits through advertising. In cooperation with university and public libraries as well as book publishers, Google is digitizing both out-of-copyright books and more recent books still subject to copyright protection.
On balance, giving people quick access to book knowledge is a good thing. The ultimate goal is the same as that envisioned by the builders of the great Library at Alexandria (completed by the Macedonian rulers of Egypt around 300 B.C.): to archive the world’s knowledge in written form.
Google has been as aggressive as these ancient archivists, employing thousands of workers around the world to scan books to create its own universal library. It has also been aggressive in how it interprets the fair use aspect of copyright law, including books in its repository unless the copyright holder tells it not to. Both moves have led to controversy.
The Authors Guild and the Association of American Publishers separately sued Google for copyright infringement, contending that Google Book Search will hurt authors.
But you can see only a very limited amount of any book still in copyright. Google contends that the current book component of its service is more a book marketing program than an online library. Depending on the permissions given by the copyright holder, a viewer can typically see either snippets of text or a small number of pages surrounding the search term. Google also gives copyright holders the option of removing a book from Google Book Search.
The way Google scans books has also been criticized. Google won’t disclose its techniques, but reports indicate that it uses, at least in part, a robotic technique without a human being checking the results. As a result, some pages are unreadable, some are scanned more than once, some are in the wrong place, and some are cut off.
Much of this scanning takes place abroad. It’s significantly less expensive to scan a book in China than in Des Moines, Iowa. But this causes the descriptive data associated with any book, including its title, author, date of publication, and category, to be wrong more often than it should be, making the archive less useful.
Google Book Search has been operational since late 2004, though Google still indicates it’s in the beta, or testing, stage.
Google isn’t the only game in town trying to create a universal library. Microsoft is engaged in a similar effort, called Live Search Books, associated with its Live Search service (www.live.com).
The Open Content Alliance (www.opencontentalliance.org), which is affiliated with Yahoo!, is also undertaking a similar effort, one that, like Microsoft’s, duplicates what Google is doing. But while Google and Microsoft make the content of their digitized books available only through their respective services, the Open Content Alliance makes its content available through any search engine, including Yahoo!.
Given the inherent quality control problems, this redundancy isn’t necessarily a bad thing.
Some of the criticism of Google Book Search stems from the fact that most scanned books are in English. The French, as expected, have led the way here. The National Library of France (Bibliothèque nationale de France) is also engaged in its own book scanning project, called Gallica (http://gallica.bnf.fr).
Until a way is devised to compensate authors, don’t expect the entire contents of current books to be freely available through any service. Such access, in fact, is the true Holy Grail of online research.