December 11, 2006 — A little over a year ago, Microsoft announced that it was joining the Open Content Alliance (OCA; http://www.infotoday.com/newsbreaks/nb051031-2.shtml) to create a database of full-text books. Planning to launch a beta version of its books database sometime in 2006, Microsoft has now met its goal with the launch of Live Search Books (beta) at http://books.live.com. All of the books available on Live Search Books are out-of-copyright titles, thus avoiding the copyright controversy at Google Books. The initial load at Live features titles from several libraries’ collections, including the University of California, the University of Toronto, and The British Library. Microsoft also announced the addition of The New York Public Library and the American Museum of Veterinary Medicine as future contributors.
Live Books has a substantial collection at its launch. The works from the three contributing library partners are not tagged with information about the originating library, but just browsing through the first few pages (or the last) often turns up a stamp from libraries such as the University of California, Trinity College Library, or the University of Toronto. Cliff Guren, Microsoft’s director of Publisher Evangelism, notes that books from Cornell University are being scanned and should be added soon.
Moving beyond the Google Books copyright rubric of considering only titles from 1922 and earlier to be copyright-clear, Live Books has a number of post-1922 books available in full text. These include U.S. government publications, which are generally not subject to copyright and can be freely reproduced. Other works identified as free of copyright restrictions are also included.
This makes for some unique books in the Live collection. For example, a 1968 work, Timber Management in the Pacific Northwest Region, 1927–1965, is available even though it is post-1922. It comes from a University of California collection of oral history transcripts. Many others are included as well.
But what about the OCA, to which Microsoft belongs? The intent of the OCA book digitization initiative was to pool resources to provide full-text books that would be available from multiple outlets. Live Books is the first major source of OCA books, but the Internet Archive has many of these within its Text Archive at http://www.archive.org/details/texts.
A Microsoft spokesperson reports that “nearly 100 percent of the books in the current release are also available through the OCA; going forward the out-of-copyright books we scan will be available to any academic institution providing they agree not to make the files available to other commercial Internet search services.” What this will mean for Yahoo!, which is also part of the OCA, remains to be seen. The Internet Archive Text Archive does include Timber Management in the Pacific Northwest Region, 1927–1965 along with far more detailed metadata about the copyright status, the originating library, and even the scanner operator. Other Live Books, such as Ulrich Zwingli, the Patriotic Reformer, are not available at the Internet Archive.
The Search Interface
According to Guren, Microsoft’s long-term vision is to integrate book records into its core search. In other words, some regular Web searches at Live will eventually include records for books. These may be separated out at the top of the search results like other “instant answers,” or they may be included in the regular results list. However, at this point, none of the book records are showing up in the Web search results.
So, how do you find the book search page? There is no link to Live Books from the main Live.com home page. Run a search, and then click on the “More” tab (Microsoft calls these tabs “scopes”) to see a variety of beta databases, including Books. After you choose Books or run a search directly from http://books.live.com, Books will continue to appear as one of the “scope” choices. Microsoft calls this separate database a “vertical,” and Guren notes that even when books are included in Web search, “I don’t think the vertical will necessarily go away.”
Searches run across the full text of the books. The resulting list of books, with cover images, is displayed within a continually scrolling page, like the Live image search. This means that only three or four records may be visible at a time on a standard-resolution monitor. Guren mentioned that the white space on the right will eventually be used for additional information, such as ads or links to publishers. Neither appears in the beta version, nor do any links to library holdings or Open WorldCat.
In this beta version, an advanced search page is not available—just the single search box above the scopes. Nor does Live Books yet offer any advanced commands for searching authors, dates, publishers, or contributing library. The one advanced prefix that does work comes from the Web search side. Limit to title words by using the intitle: prefix.
The Book Reader
Selecting one of the book search results will open the book within the Live book reader. The left pane of the window gives very basic information about the book, such as title, author, publisher, and date (but not publisher location). The scanned pages are shown in the right pane.
The reader is very much a beta product. Compared to either the Google Book reader or Amazon’s Online Reader (both of which have had recent updates), the Live book reader has limited features. Click arrows to move forward or backward a page or to jump to the front or back of the book. The only other navigation is by searching inside the book. The reader doesn’t give links to a table of contents or an index. On the plus side, most of the records do have a “Download the entire book” link to a full PDF of the book, a feature Google did not add until later in its beta.
In talking about the rollout of the book search, Guren stated that they are “very mindful of the need to continually track how people are responding to ensure that we do the best job we can.” So expect to see changes to the reader and to the overall user interface as Live Search Books continues to mature.
As search expert Gary Price likes to point out: “[M]any other digitization programs are out there” (http://www.resourceshelf.com/2006/12/06/microsoft-book-search-goes-live-online). Google Books has been around longer but still has unresolved legal issues with its scanning of copyrighted books. Amazon’s Search Inside the Book program provides full-text searching of some currently published books (with publishers’ permissions), but the full text of the book is only available by purchasing the book.
Microsoft plans to include copyrighted works when a publisher grants permission. Its Publisher Program site at http://publisher.live.com goes into more depth about the upcoming addition of copyrighted works with limited user access. Once that happens, the scope of Live Books will more closely resemble that of Google Books.
The launch of Live Search Books adds more competition to the book search space and adds sources that may not be available from anywhere else. Expect the competition to foster continued improvement to the collection, searching, and reader from Microsoft. The initial beta shows promise for both Live Search Books and for what other OCA partners may offer.