On The Net
Searching Books Between the Covers
By Greg R. Notess | Reference Librarian, Montana State University
To find information inside a printed book, people traditionally rely on an
index or, for a few works, a concordance. With the advent of e-books, however,
people could search the entire text, assuming they bought the e-book. Although
a growing number of copyright-free books are now on the Web, those still under
copyright remained unsearchable, until now.
First, in October 2003, Amazon introduced Search Inside the Book with extracts
from some books and the full text from others. Then, Google started its Google
Print program with book extracts. Next, in April 2004, Amazon launched A9.com,
which combined Search Inside the Book with Web searching.
More recently, Google revamped its book portion of Google Print by expanding
from extracts to full text from some publishers. In addition, book results,
news, and product results now appear at the top rather than as part of the
regular listings. Google also announced plans to digitize books at the libraries
of Stanford, the University of Michigan, Harvard, the New York Public Library,
and Oxford. These digitized versions will also be available via Google Print.
In all these endeavors, copyright issues ensure that most users cannot view
the entire text of any works still under copyright. Amazon and Google limit
the number of pages you can view even if the entire text can be searched. The
number of books available from these programs is now in the hundreds of thousands,
rather than the millions Google promises once it finishes its library
project, but that still raises the question of when and how we should use these resources.
FULL-TEXT BOOK SEARCH OPTIONS
While numerous commercial databases offer full-text searching of various
e-book collections, this column focuses on the new, free search choices offered
by search engines. Amazon’s Search Inside the Book is available at the
main U.S. Amazon.com site but not yet at its U.K., Canada, or other international
sites. Amazon initially gave Search Inside the Book matches for any query,
but now some matches may only be seen after choosing the “Click here
to see additional results” message. In addition, the record for each
book that is searchable has a “search inside” icon with an arrow
above the book jacket image. To see the page image from the book, Amazon requires
a user name and password.
A9.com, the Amazon-owned search engine, has a variety of personalization
features and a collection of databases. The Books database is, of course, the
Amazon book catalog including the Search Inside the Book titles. Again, to
see the page image that is located at Amazon, you need an Amazon user name
and password. The page images are still located on the Amazon servers.
Google Print is a combination of initiatives. It includes book extracts from
publishers, full-text books from publishers, and items from its library initiative.
The difficulty now is that there is no direct access to the database. The Google
Print site [http://print.google.com] gives only a brief overview of the service
with no search box. Originally, the results were included in regular Google
search results, but that changed late last year. Now the Google Print matches
show up at the top, above the regular results. They have a books icon and a
header of “Book Results for . . .” followed by up to three matching
hits. Unfortunately, no option is available to get to more than the first three
listed results. It is important to note that the full text of items within
Google Print is not all directly searchable from Google. First, find the Google
Print record and then use the internal search for just that title to search
its full text.
Google Print results only show up for certain searches. Like Amazon, it only
works at the main U.S. Google site and not yet at the various international
versions. In the past, you could add site:print.google.com to a search term
to limit results to the Google Print records. As of winter 2005, that technique
only retrieved the small, dated collection of full-text magazine articles from
Reed Business Information. Currently, the most effective way to get Google
Print results to display is to preface search terms with book on or book about
or just book. Note that book must be the first word in the query. The search book
mark works while mark book does not. Occasionally, for very well-known
titles, just the book title is needed with no prefix.
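The prefix rule described above can be sketched in a few lines of Python. This only illustrates how the query text is assembled before it is typed into the Google search box. The function name google_print_query and the list of prefixes are assumptions made for this sketch, not part of any Google interface, and the behavior it mirrors is the one observed in winter 2005.

```python
# Hypothetical helper: assembles the text to enter in Google's search box.
# The three prefixes are the ones mentioned in the column.
ALLOWED_PREFIXES = ("book", "book on", "book about")


def google_print_query(terms, prefix="book", as_phrase=False):
    """Return a query string in which 'book' is always the first word."""
    if prefix not in ALLOWED_PREFIXES:
        raise ValueError(f"prefix must be one of {ALLOWED_PREFIXES}")
    terms = " ".join(terms.split())  # collapse stray whitespace
    if not terms:
        raise ValueError("terms must not be empty")
    if as_phrase:
        terms = f'"{terms}"'  # wrap in quotes for a phrase search
    return f"{prefix} {terms}"


print(google_print_query("mark"))  # book mark
print(google_print_query("clung to bohemian ideals", as_phrase=True))
```

Keeping the prefix as the leading token in one place avoids the mark book mistake, where the word book ends up anywhere but first.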
WHEN TO SEARCH BETWEEN THE COVERS
Several colleagues tell me they use neither of these tools. Other than initial
experimentation, I rarely used them myself for the first few months. More recently,
I have been trying to remember that the databases are there and find uses for
them. My first success story was a question about a quotation source. Numerous
Web sites included the quotation and the author’s name, but none gave
even a partial citation to the specific work, much less the page numbers. After
failing to find the answer via Web search engines and standard quotation sources,
I tried the search at Amazon and struck gold. The phrase was found in the Search
Inside the Book and let me view the exact page. I was able to provide the user
with a full citation and page number without even leaving the reference desk.
It was especially helpful since the book was not in the library’s collection.
Beyond quotation searching, book searches can be used to verify citations,
especially for chapter titles, and to look at the actual copyright page of
a book. Other applications include checking for plagiarism, hunting for intellectual
property violations, and tracking mentions of trademarks and business names
in both fiction and nonfiction books. For distance reference service, it lets
both user and librarian look at the same page of a book while discussing it
over the phone. For the reader who only remembers a character’s name
but not the title or author, the book databases offer a new source in which
to dig. Other uses likely abound, but we need to start considering what possibilities
these databases offer, especially as they grow.
In my limited experience so far with searching free book content on the Web,
two distinct approaches emerge. First, try a phrase search for an extract from
a book. Typically, a four- or five-word phrase can narrow results to just a
few hits. Remember that these records no longer appear in regular Google results,
so preface the search with book to find Google Print results.
The second approach is to search for a book title as a phrase. Note that
at Amazon (and thus A9’s books as well), phrase searching is not exact.
Words within the phrase are stemmed, so a search finds both singular and plural
forms. In addition, stop words within a phrase are ignored.
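To see why such phrase matching is not exact, here is a rough Python sketch of stemming plus stop-word removal. It is not Amazon's actual processing: the stop-word list and the crude plural-stripping rule are assumptions chosen only to illustrate the effect.

```python
# Assumed, illustrative stop-word list; the real list is not given in this column.
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}


def naive_stem(word):
    """Fold simple plurals into singulars; real stemmers are more careful."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def normalize_phrase(phrase):
    """Lowercase, strip punctuation, drop stop words, and stem what is left."""
    words = [w.strip('.,:;!?"\'').lower() for w in phrase.split()]
    return [naive_stem(w) for w in words if w and w not in STOP_WORDS]


def loosely_equal(a, b):
    """True when two phrases look identical after normalization."""
    return normalize_phrase(a) == normalize_phrase(b)


print(normalize_phrase("Dietary Reference Intakes"))
print(loosely_equal("clung to bohemian ideals", "clung bohemian ideal"))
```

Under these assumptions, two phrases that differ only in plurals or stop words are treated as the same, which is one reason a phrase search can return a hit for a different book than the one intended.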
Often, the two strategies can be combined. Use a title search to see if the
book is in one of the databases. Then, use the in-text phrase search to find
the appropriate passage in the book. I have also used a phrase search to find the title
of a work and then searched that title as a phrase to check its availability
at Amazon or Google. Frequently, I search both databases and the open Web.
One of the many problems in the current state of free book content searching
is the wide disparity of access. It seems that most publishers that wish to
provide free content online work with Amazon and Google. However, many only
work with one or the other, while some publishers provide more free online
content on their own sites than at either Amazon or Google.
Take the example of the ever-popular Dietary Reference Intakes: Applications
in Dietary Planning published by the National Academies Press. Searching a
phrase from page 20 of the book, “nutrient intakes feeds”, gets
no hits at Google. Amazon and A9 find one hit, but it is for a different book.
Using the second strategy and searching the title itself as a phrase finds
the book at A9 and Amazon, but there is no “Search Inside This Book” option.
Google Print appears to pull up the work with the search book “Dietary
Reference Intakes: Applications in Dietary Planning”. It lists the result
as Dietary Reference Intakes, but it is actually a separate work in that series
(subtitled Guiding Principles for Nutrition Labeling and Fortification, although
Google does not give this bibliographic information except in the enlarged
cover image). But the search is not yet over.
On the library database side, WorldCat lists three entries for this title,
each with an “Internet Resource” tag and a corresponding URL. One
is for a copy in netLibrary; another is from Ebrary; and the third, connected
with the print record, is for a table of contents available at the Library
of Congress. None of these sources find the free full text that is available
directly from the National Academies Press Web site.
In this case, just searching the title as a phrase at Yahoo!, Google, Teoma,
or MSN will bring up the free full text [www.nap.edu/books/0309088534/html] with each page available as a GIF or PDF.
Another example: A search for “true power of grouping”, which is from Managing
and Using MySQL, second edition, strikes out at Google, A9, Amazon, Yahoo!,
and MSN. Surprisingly, Ask Jeeves finds a hit for this phrase at a site that
requires a user name and password, which in turn implies that the book is available
from a commercial source as an e-book.
Knowing that far too many copyrighted books have been posted somewhere online,
I searched for another phrase from earlier in the book. This time, a search
engine found a PDF version of the entire book at an academic site in China
(which also included dozens of other books).
Both Amazon and Google Print include Managing and Using MySQL, but Amazon
includes only excerpts from the book, while Google only finds it if searched
by title (preceded by book). In other words, a regular Google search for a
phrase from a book may not find a Google Print record. Therefore, search by
the title as well. Then try it again since Google Print results do not always
display. On one search, I got no “Book results for . . .” Yet,
just clicking the Search button one more time displayed some book results.
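The "try it again" advice amounts to a simple retry loop, sketched below in Python. The search argument is a hypothetical callable supplied by the caller that returns a list of book results; no real Google or Amazon interface is assumed, and the stand-in flaky_search exists only to imitate the behavior described above.

```python
def search_with_retry(search, query, attempts=2):
    """Call search(query) until it returns book results or attempts run out.

    `search` is a hypothetical callable that returns a list (possibly empty).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    results = []
    for _ in range(attempts):
        results = search(query)
        if results:  # stop as soon as any book results appear
            break
    return results


# Stand-in search that comes up empty on the first call only.
calls = []


def flaky_search(query):
    calls.append(query)
    return [] if len(calls) == 1 else ["Managing and Using MySQL"]


print(search_with_retry(flaky_search, "book managing and using mysql"))
```

A small attempt limit keeps the repeat search from turning into an endless loop when a title simply is not in the database.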
The Amazon and Google databases may change dramatically in the next few months.
Both are basically still experimental programs, and the companies need to carefully
balance access with copyright limitations. The size and scope of the databases
depend on which publishers grant permission. With the library agreements, Google’s
database should be huge, but there are few library titles included at this
point. Google predicts a 6-year timeline to finish.
Amazon generally found more titles than Google for several searches I tried.
However, a significant problem with such a comparison is that the full text
of many books available from Google is not directly searchable. Searching a
phrase from a book may get no results on a Google search, but searching for
the title and then using the search box within the Google Print display did
find the page. An example is searching for the phrase “update shareholders
and customers”, which occurs on page 40 of Information Technology Security.
That search finds no results at Google; the same phrase finds the book and
extract at both A9 and Amazon. Once you know the title of the book, Google
finds it easily enough with the search book information technology security.
A search for the phrase “clung to bohemian ideals” at Google
found zero results. At A9, in the books column, the search finds Artists, Advertising,
and the Borders of Art by Michele H. Bogart, but the link only goes to
the book at Amazon and not to the specific page. Searching the phrase directly
at the Amazon book search results in the message:
Book search results: we found no results that closely match your search for: “clung to bohemian ideals”
Click here to see additional results that may be relevant to your search.
Only after following the link in that last line will Amazon give a Key Word
in Context (KWIC) extract and the link to the exact page with the quote.
Score? Google gets zero for not finding the book unless you already know
it is there. A9 starts better, but without the extract or a direct link to
the page, A9 still requires a user to repeat the same search at Amazon once
the book has been located. Starting the search at Amazon gives a KWIC extract
after the second click, with the full-page image only one more click away.
However, for every example listed, I also found times where each of these
behaved quite differently on another search. In one case, a regular Google
search found a Google Print source where the search term was on page 773. At
times, A9 does include a KWIC extract and a direct link to the page. Occasionally,
Amazon also directly displays Search Inside the Book KWIC extracts without
the “Click here to see . . .” message.
Again, expect many changes ahead for both of these endeavors. Access and
scope may both change. Certainly, we can hope that Google will provide more
direct access to its database. In the meantime, consider what opportunities
these databases offer for the searchers’ toolbox.
Greg R. Notess [email@example.com; http://www.notess.com/]
is a reference librarian at Montana State University
and founder of SearchEngineShowdown.com.
Comments? E-mail letters to the editor to firstname.lastname@example.org.