Information Today

Vol. 21 No. 10 — November 2004

NewsBreak Update
Developments in Search, Digital Archives, and More
By Paula Hane

Maybe it's the cooler weather, the fourth quarter rush to impress investors, or the busy fall conference season (October saw WebSearch University, Internet Librarian International, and KMWorld; Internet Librarian is just around the corner, followed by Online Information in London), but, whatever the reason, things have sure been lively in the information industry. Looking over the last few weeks, we've seen news of interesting company alliances, search engine developments, ongoing digital preservation and access initiatives, and continuing discussions of open access issues. And, while digesting all this news, many of us have been glued to coverage of the presidential debates and political campaigns. Whew! I'll be ready for a winter vacation.

New Dance Partners

The big news in mid-October was the change in Factiva's dance partner for the legal market. The company announced that it has signed an exclusive agreement with competitor LexisNexis. Under the terms of the 5-year deal, beginning March 1, 2005, LexisNexis will provide Factiva content on an exclusive basis to legal market customers—bringing the full text of The Wall Street Journal and other unique Factiva content (estimated to be about 3,400 sources) to LexisNexis customers for the first time.

Factiva has had a 10-year partnership with Thomson West, allowing Westlaw customers to access Factiva content. That contract is ending Feb. 28, 2005. Westlaw customers will either have to do without the Factiva content or use LexisNexis to get it. This is likely to cause considerable customer discontent. Thomson West has said it will be adding content to Westlaw from The New York Times, Thomson Financial News (which includes an exclusive partnership with MarketWatch, Inc.), and Dialog NewsRoom. But, while these are excellent additions, the unique Factiva content really can't be replaced. This is just another example of units within The Thomson Corp. building on logical synergies and leveraging content and technologies from within the Thomson family.

Digital Preservation and Access

We've had encouraging news about several major initiatives in digital preservation and access in recent times. Washington state recently unveiled the beta of its new Digital Archive system, which is designed to stem the loss of key government electronic records. The Delete key is a villain when it comes to the preservation of the electronic daily record of governments, and programs like the one in Washington are vital for the long-term survival of these historical records. Other states are working on guidelines and policies, and are creating educational tools to help preserve documents. Some state archivists have even taken custody of electronic records within their state and are actively working to preserve them.

The U.S. National Archives and Records Administration (NARA) launched its Electronic Records Archive (ERA) project back in 1998. It spent more than 5 years researching the problems and possibilities surrounding the issue of electronic record preservation. In August 2004, after a rigorous competitive process, NARA awarded two contracts for the design of ERA. At the end of a 1-year competition, NARA will select one of these two contractors (Lockheed Martin or Harris Corp.) to actually build ERA. NARA's goal is to have a functional subset of the system operational in 2007, with full operation by 2011.

The Library of Congress (LOC) recently awarded eight institutions and their partners more than $14.9 million to "identify, collect, and preserve digital materials within a nationwide digital preservation infrastructure." The institutions will share responsibilities for preserving "at-risk digital materials of significant cultural and historical value to the nation." The broad-based partnerships include universities, supercomputing centers, private corporations, foundations, and state libraries. The eight preservation projects range widely in subject, from geospatial data resources to opinion polls and voting records, and public television programs.

The LOC program is officially named the National Digital Information Infrastructure and Preservation Program (NDIIPP). This initiative is being carried out through a national network of partners that are committed to digital preservation. In 2000, the U.S. Congress asked the LOC to lead this effort.

At presstime, the Government Printing Office was about to give librarians a first look at the concept for the "next generation information life cycle management system for official government information" during the Fall Depository Library Council Meeting. We'll look into this for a future issue.

Search Engine News

Hardly a day goes by without news of some new search engine development, not only from the big guys like Yahoo! and Google, but also from the growing number of companies purporting to offer better search functionality. Some observers have speculated that one of the newcomers challenging the established search engines might just be the next big success story. Which one could be the next Google? Clustering, personalization, local search, desktop search, and reaching out for new content are all hot areas of development. It certainly makes for interesting times, and all the competitive activity continuously forces the feature/function bar higher.

Vivísimo, a company I've covered for several years that already offers a search service for corporate customers, has launched Clusty, a free consumer metasearch service. Clusty uses Vivísimo's clustering technology to group results into categories, making them easier to sort through. Clusty, which is still in beta, offers customizable search tabs for Web search, news, images, shopping, gossip, blogs, and an encyclopedia (Wikipedia). Clusty queries results from LookSmart, Lycos, MSN, Open Directory, Yahoo!'s Overture, Gigablast, and Wisenut. While I think the choice of name is unfortunate (too close to "clutsy"), for some kinds of searches, Clusty offers clear advantages.

Clustering search results provides benefits such as faster navigation, topical focusing, and idea and relationship discovery. Users don't have to wade through pages of results, and having results organized in folders allows hierarchical drill-down capabilities. These benefits haven't escaped the notice of other search engine companies. A recent article in eWeek reported that, during a panel discussion at the Web 2.0 conference, one of Google's top researchers "previewed the search company's work in clustering both entities and words as a way to better glean users' intentions and distill information on the Web."

Northern Light (known for years for its search folders that cluster results) announced that the new version of its business search engine is available to individual users (for $50/month) in addition to enterprises. The new version is called the Northern Light Business Research Engine.

New Ventures

Bill Gross, the Idealab founder and man behind Overture Services, recently launched the beta of his new Web search venture, Snap. The new search site uses "search-as-fast-as-you-type" technology, licensed from an Idealab sister company that offers enterprise desktop search. My first impression of Snap is that only die-hard search gurus will bother to decipher the busy-looking presentation of various rankings for search results, or understand the sorting and filtering options, but this is, admittedly, a very preliminary and limited assessment.

In my view, we will continue to see new ventures like these emerge, and the best of the innovations in search technology will likely be imitated or assimilated. An article in Pandia Post reported that Norwegian company Stochasto is getting ready to launch its natural language search engine, Answer Engine, in English in 2005. It is already available in Russian and has won the best search engine award at a technology exhibition in Moscow. It's possible the company will choose to focus on the enterprise search market.

Tweaking and Enhancing

The Ask Jeeves search site has been upgraded to be more personal and more relevant. The underlying Teoma search engine has been upgraded to 3.0, local search options have been expanded, and a new MyJeeves service has been introduced. MyJeeves lets users save search results; organize items; add, print, and share notes via e-mail; and also search within the saved documents, creating, in effect, a "personal Web." Ask Jeeves also said it plans to introduce a desktop search product to the market during Q4 2004, based on technology assets it acquired from Tukaroo, Inc. in June.

Yahoo! has also enhanced its My Yahoo! with personalization features, including saving pages to a personal Web. The new beta version of My Yahoo! Search is currently available to registered Yahoo! users via Yahoo! Next. Chris Sherman, writing in SearchDay, said: "[T]he new My Yahoo! Search is well implemented and easy to use, but doesn't offer compelling reasons to use it unless you're looking for what amounts to an enhanced bookmark utility that's tied to Yahoo! search results."

In addition, Yahoo! Local is now out of its 2-month beta phase. Yahoo! is promoting it from its home page and is including Local options in the query box. Yahoo!, Inc. also just reported that its Q3 profits have more than tripled, though more than half of this was due to the sale of some of its stake in Google.

Over at the Googleplex

Google outdid itself in the last few weeks. Following its successful IPO, the company announced a major expansion of its Google Print program. It had been beta testing a limited program of search access to book excerpts from a few publishers. Now, the company is offering to digitize, for free, book texts from any publisher that chooses to join. Whenever a book has content that matches a user's search terms, Google will display a special box with links to book results. Users can browse a few pages (but cannot copy or print them) and then can click to buy the book from one of several booksellers.

Google's links to books at local libraries should also be increasing, though at this time it's still hard to find the book listings (which are often deeply buried within search results). OCLC has expanded its Open WorldCat project and will now permit its database of 53.3 million items connected to 928.6 million library holdings to be indexed by both Google and Yahoo! Search. The company may expand to allow other search engines as well. But, until Google excavates its library results, the best way I've found to locate OCLC holdings is to use the Google advanced search page and restrict the domain or site to Open WorldCat.

At presstime, Google had just announced its beta entry into the desktop search arena—an area of hot development that I've covered in every recent month's column. Google Desktop Search will search through a PC's hard drive (the C: drive only and not over networks), including Outlook e-mail, documents, PowerPoint and Excel files, and even your Web page history in Internet Explorer and instant message chats in AOL Instant Messenger. The most important feature of Google Desktop Search is that it lets users search the Web and their own content at the same time. This is big news, and early press coverage has been very positive (although a few writers have raised privacy and security issues about the free application).

Larry Page, Google's co-founder and president of products, said: "It's free, installs quickly, and keeps completely up-to-date. Google Desktop Search represents a quantum leap in access to your own information."

Google beat Microsoft to the punch on this one. Earlier this year, Microsoft acquired Lookout Software, makers of a personal Microsoft Outlook 2003 search tool. Microsoft, AOL, and a number of other companies are all said to be working on desktop search tools. One way or another, it's going to get easier for us to find our digital stuff.

By the way, Google also announced Google SMS, a new test service that allows people to use mobile phones or hand-held devices to tap Google's Web search via text messages or short message service. Google SMS provides business and residential listings, product prices, and dictionary definitions.

For the latest industry news, check our NewsBreaks every Monday morning. An easier option is to sign up for our free weekly e-mail newsletter, NewsLink, which provides abstracts and links to the stories we post.


Links

(NewsBreak on Factiva/LexisNexis partnership)
(NewsBreak on Washington's Digital Archive system)
,1759,1668357,00.asp (eWeek article on Google clustering preview)
(NewsBreak on Google Print)
(NewsBreak on OCLC WorldCat)

Paula J. Hane is Information Today, Inc.'s news bureau chief and editor of NewsBreaks.