Google Book Search Has Far to Go
By Mick O'Leary
Over the past 2 years, trade journals, magazines, and newspapers have been publishing articles about Google Book Search (http://books.google.com). But even if you had read every one of them, you still wouldn’t know much about the project itself, because most of the discussion has focused on the copyright controversy with little about the database and how it works. So here are the details.
Book Search is difficult to research because the Google site has little documentation about the project: There’s no list of participating publishers, no guidelines for the book selection process, no status reports on the library scanning program, etc. This is not only annoying, it’s hypocritical for an organization with a mission “to organize the world’s information and make it universally accessible and useful.”
Book Search comprises three services: 1) a library union catalog search of WorldCat and others, 2) books scanned from library collections, and 3) in-stock books provided by publishers. It’s ironic that the first, the most innovative of the three, is overlooked, while the second, the most rudimentary and problem-ridden, gets all of the attention.
Google’s Best Work
The Library Search is a union catalog that simultaneously searches WorldCat and 15 other union catalogs representing 30 countries (Google doesn’t display an actual list). Results from each catalog are displayed separately; they are not merged into a single list. Within WorldCat, a list of the holding libraries is arranged by distance from your location. When you click on a library’s link, it opens the book record in that library’s catalog. For the other union catalogs, records are displayed in the default order of the catalog, usually alphabetically. These records, too, usually link to the individual library’s catalog record.
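The display behavior described above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not Google’s implementation: the `Record` type, its field names, and the `arrange_results` function are hypothetical, and the sketch assumes each WorldCat holding already carries a distance from the searcher.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Record:
    """One holding in a union catalog (hypothetical structure)."""
    library: str
    catalog_url: str                      # link to the library's own catalog record
    distance_km: Optional[float] = None   # assumed to be known only for WorldCat holdings


def arrange_results(
    results_by_catalog: Dict[str, List[Record]]
) -> List[Tuple[str, List[Record]]]:
    """Keep each catalog's results in a separate group (no merging).

    WorldCat holdings are sorted nearest-first; every other catalog
    keeps whatever default order it returned.
    """
    groups = []
    for catalog, records in results_by_catalog.items():
        if catalog == "WorldCat":
            records = sorted(
                records,
                key=lambda r: r.distance_km if r.distance_km is not None else float("inf"),
            )
        groups.append((catalog, list(records)))
    return groups
```

The point of the sketch is the design choice the column describes: the groups are never interleaved, so a searcher reads one catalog’s holdings at a time.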
This enormous worldwide finding tool (a wonderful achievement) is an important platform for WorldCat’s end-user interface. WorldCat is online elsewhere (http://www.worldcat.org), but its presence on the most popular search engine gives it much greater public visibility. WorldCat took a big step by making the database accessible to the general public recently; this was complemented by Google making it possible to search all of these other union catalogs at the same time as WorldCat. Including other union catalogs is a marvelously executed technical task and a major step toward accomplishing Google’s mission of organizing the world’s information.
The Library Scanning Debacle
The Google Book Search Library Partners program is an ambitious plan to scan books from seven major libraries: Harvard University, The New York Public Library, Oxford University, Stanford University, the University of California, the Universidad Complutense de Madrid, and the University of Michigan (Google does name these). Since the program was announced in October 2004, it has been enmeshed in the copyright controversy, which we don’t need to repeat here. The case is too close to call, and a decision won’t be made anytime soon. But two things have already happened: The entire project has been delayed, and Google’s unilateral approach has created great enmity in publishing and author communities.
As announced, the project was designed to scan the libraries’ holdings, including post-1922 books that are not in the public domain. However, Google’s account of exactly which books were to be scanned has been misleading from the start. Several of the libraries agreed to allow scanning of only their public domain holdings, but this important fact is difficult to find. And, while Google has extolled its speedy scanning technology, the copyright problem has slowed scanning of titles that are not in the public domain. As a result, Book Search has an enormous vacuum in the middle: A substantial amount of pre-1923 content and many thousands of in-print titles exist, but there’s very little in between. In other words, most of the 20th century’s publishing isn’t in Book Search. And it may never be, unless Google wins the copyright suits or (if it loses) reaches an accommodation with the publishers to provide their copyrighted books.
Book Search for In-Print Books
Book Search’s search service for in-print books also underperforms. First of all, it’s not innovative; it’s basically the same (in content and searching) as Amazon’s Search Inside! feature (Information Today, February 2004). Second: The original is better.
Google’s Partners Program includes hundreds of thousands of books (this is an estimate; Google doesn’t provide exact figures) from hundreds of publishers. The publisher partners range from the largest trade and technical publishers to numerous university, specialized, and small presses. It’s impossible to know just how much of a publisher’s line is provided to Book Search because—you guessed it—Google doesn’t say. Nevertheless, Book Search has a large and very important share of the world’s current publishing, including some non-English language books.
Amazon’s feature has several critical advantages over Book Search. The most important is currency: Amazon has the latest releases and also lists forthcoming titles, while Book Search, perhaps because of differing licenses with the publishers, is often several years behind. For example, Amazon’s feature has the latest books by Pat Buchanan, James Lee Burke, Ann Coulter, Jeffery Deaver, Tom Friedman, and Robert B. Parker; Book Search does not (and is usually two or three books behind with these popular authors). This seriously devalues Book Search as a tool for finding, buying, or researching books.
Book Search’s search features include Boolean operators, phrase searching, and field searches by author, title, publisher, date, and ISBN. The full text of all books is searched. Each record displays publication data and the number of times the search query appears in the text. Search results are displayed in relevance order, up to about 300 items (another advantage of Amazon’s search option is that it doesn’t automatically cut off the number of hits).
Book display in Book Search has a confusing mix of formats depending on copyright status and publisher license. There are four options: 1) Full View—a full-text display for public domain titles and those for which the publisher has granted access, 2) Snippet View—a small keyword in context (KWIC) display for books for which Book Search does not have copyright permission, 3) No Preview—for a small number of items showing only a citation, and 4) Limited Preview—which is authorized by the publisher and displays a three-page window containing the search query. Display window size is yet another major advantage of Amazon’s search feature, which displays a five-page window. Page displays cannot be printed or copied and pasted, but they can be captured by the Print Screen command on your keyboard.
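The four display options can be summarized as a small decision function. This is a sketch of the rules as described above, not Google’s actual logic: the `ViewMode` and `view_mode` names and the `publisher_permission` values are hypothetical, and because the conditions that trigger No Preview are not documented, the `snippets_available` flag is purely an assumption.

```python
from enum import Enum
from typing import Optional


class ViewMode(Enum):
    FULL_VIEW = "Full View"
    LIMITED_PREVIEW = "Limited Preview"
    SNIPPET_VIEW = "Snippet View"
    NO_PREVIEW = "No Preview"


def view_mode(
    public_domain: bool,
    publisher_permission: Optional[str],  # hypothetical values: "full", "limited", or None
    snippets_available: bool = True,      # assumption: why some items show only a citation
) -> ViewMode:
    """Pick a display format from copyright status and publisher license."""
    # Public domain titles, and titles whose publisher granted full access
    if public_domain or publisher_permission == "full":
        return ViewMode.FULL_VIEW
    # Publisher-authorized window of pages around the search query
    if publisher_permission == "limited":
        return ViewMode.LIMITED_PREVIEW
    # No copyright permission: keyword-in-context snippets only
    if snippets_available:
        return ViewMode.SNIPPET_VIEW
    # Citation only
    return ViewMode.NO_PREVIEW
```

For example, `view_mode(False, "limited")` yields `ViewMode.LIMITED_PREVIEW`, the case that covers most in-print books supplied by publishers.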
Limited Preview is the most common, and most important, display option, because it provides significant access to the full text of in-print books. It is both an effective retailing technique (both Google and Amazon report that full-text viewable books sell more than others) and a powerful search tool. Each record also has links to several online booksellers.
Why the Fuss?
So far, Book Search deserves neither Google’s self-promotion nor the adulation that many commentators have bestowed upon it. Its collection of full-display public domain books is of interest only to scholars and other specialized researchers. Project Gutenberg and other public and proprietary databases have more important collections of full-text legacy e-books.
The library scanning project is hindered by the lawsuits. If Book Search is allowed to complete the project, it will be useful primarily as a library finding tool. Because of the rapid pace at which knowledge advances, the out-of-print books at issue have already lost much of their value for research. And since they are no longer in print, their only commercial value is in the used book trade. If Google loses the lawsuits, it will be at the mercy of the publishers, who are not favorably disposed toward it and are not likely to be motivated to permit Book Search to incorporate books that have no commercial value to them. There probably will be some agreement with Google to continue scanning, but it may not be cheap.
Instead of rehashing the copyright issue, librarians should be carefully studying Book Search and Search Inside!, which promise to affect the future of library book collections profoundly. Each of these vast collections of academic and technical books is immensely valuable for any kind of research, and each is much larger than any individual library e-book collection. The three- and five-page windows are unsuitable for reading the entire text, but they are often quite sufficient when the research calls for just a few pages. If you want to continue with successive pages, search for a distinctive word on the last page to open the next window. Saving pages with Print Screen may not be pretty, but it works. Both Google’s and Amazon’s search features are powerful and compelling alternatives to both the print and e-book collections of every library.
Librarians should be exploring partnerships with Google and Amazon that will open their content more efficiently to library constituencies. The proprietary e-book databases now available to libraries are limited in content and very costly. Google and Amazon have created much larger research-oriented e-book collections. What is needed next are creative, mutually beneficial partnerships. If Google won’t do it, then maybe Amazon will.