Developments in Search, Digital
Archives, and More
By Paula Hane
Maybe it's the cooler weather, the fourth quarter rush to impress investors,
or the busy fall conference season (October saw WebSearch University, Internet
Librarian International, and KMWorld; Internet Librarian is just around the
corner, followed by Online Information in London), but, whatever the reason,
things have sure been lively in the information industry. Looking over the
last few weeks, we've seen news of interesting company alliances, search engine
developments, ongoing digital preservation and access initiatives, and continuing
discussions of open access issues. And, while digesting all this news, many
of us have been glued to coverage of the presidential debates and political
campaigns. Whew! I'll be ready for a winter vacation.
New Dance Partners
The big news in mid-October was the change in Factiva's dance partner for
the legal market. The company announced that it has signed an exclusive agreement
with competitor LexisNexis. Under the terms of the 5-year deal, beginning March
1, 2005, LexisNexis will provide Factiva content on an exclusive basis to legal
market customers, bringing the full text of The Wall Street Journal and
other unique Factiva content (estimated to be about 3,400 sources) to LexisNexis
customers for the first time.
Factiva has had a 10-year partnership with Thomson West, allowing Westlaw
customers to access Factiva content. That contract is ending Feb. 28, 2005.
Westlaw customers will either have to do without the Factiva content or use
LexisNexis to get it. This is likely to cause considerable customer discontent.
Thomson West has said it will be adding content to Westlaw from The New
York Times, Thomson Financial News (which includes an exclusive partnership
with MarketWatch, Inc.), and Dialog NewsRoom. But, while these are excellent
additions, the unique Factiva content really can't be replaced. This is just
another example of units within The Thomson Corp. building on logical synergies
and leveraging content and technologies from within the Thomson family.
Digital Preservation and Access
We've had encouraging news about several major initiatives in digital preservation
and access in recent times. Washington state recently unveiled the beta of
its new Digital Archive system, which is designed to stem the loss of key government
electronic records. The Delete key is a villain when it comes to the preservation
of the electronic daily record of governments, and programs like the one in
Washington are vital for the long-term survival of these historical records.
Other states are working on guidelines and policies, and are creating educational
tools to help preserve documents. Some state archivists have even taken custody
of electronic records within their state and are actively working to preserve them.
The U.S. National Archives and Records Administration (NARA) launched its
Electronic Records Archive (ERA) project back in 1998. It spent more than 5
years researching the problems and possibilities surrounding the issue of electronic
record preservation. In August 2004, after a rigorous competitive process,
NARA awarded two contracts for the design of ERA. At the end of a 1-year competition,
NARA will select one of these two contractors (Lockheed Martin or Harris Corp.)
to actually build ERA. Its goal is to have a functional subset of the system
operational in 2007, with full operation by 2011.
The Library of Congress (LOC) recently awarded eight institutions and their
partners more than $14.9 million to "identify, collect, and preserve digital
materials within a nationwide digital preservation infrastructure." The institutions
will share responsibilities for preserving "at-risk digital materials of significant
cultural and historical value to the nation." The broad-based partnerships
include universities, supercomputing centers, private corporations, foundations,
and state libraries. The eight preservation projects range widely in subject,
from geospatial data resources to opinion polls and voting records, and public
The LOC program is officially named the National Digital Information Infrastructure
and Preservation Program (NDIIPP). This initiative is being carried out through
a national network of partners that are committed to digital preservation.
In 2000, the U.S. Congress asked the LOC to lead this effort.
At presstime, the Government Printing Office was about to give librarians
a first look at the concept for the "next generation information life cycle
management system for official government information" during the Fall Depository
Library Council Meeting. We'll look into this for a future issue.
Search Engine News
Hardly a day goes by without news of some new search engine development,
not only from the big guys like Yahoo! and Google, but also from the growing
number of companies purporting to offer better search functionality. Some observers
have speculated that one of the newcomers challenging the established search
engines might just be the next big success story. Which one could be the next
Google? Clustering, personalization, local search, desktop search, and reaching
out for new content are all hot areas of development. It certainly makes for
interesting times, and all the competitive activity continuously forces the
feature/function bar higher.
Vivísimo, a company I've covered for several years that already offers
a search service for corporate customers, has launched Clusty.com, a free consumer
metasearch service. Clusty uses Vivísimo's clustering technology to
group results into categories, making them easier to sort through. Clusty,
which is still in beta, offers customizable search tabs for Web search, news,
images, shopping, gossip, blogs, and an encyclopedia (Wikipedia). Clusty queries
results from LookSmart, Lycos, MSN, Open Directory, Yahoo!'s Overture, Gigablast,
and Wisenut. While I think the choice of name is unfortunate (too close to "klutzy"),
for some kinds of searches, Clusty offers clear advantages.
Clustering search results provides benefits such as faster navigation, topical
focusing, and idea and relationship discovery. Users don't have to wade through
pages of results, and having results organized in folders allows hierarchical
drill-down capabilities. Its benefits haven't escaped the notice of other search
engine companies. A recent article in eWeek reported that, during a
panel discussion at the Web 2.0 conference, one of Google's top researchers "previewed
the search company's work in clustering both entities and words as a way to
better glean users' intentions and distill information on the Web."
Northern Light (known for years for its search folders that cluster results)
announced that the new version of its business search engine is available to
individual users (for $50/month) in addition to enterprises. The new version,
called the Northern Light Business Research Engine, is available at NLresearch.com.
Bill Gross, the Idealab founder and man behind Overture Services, recently
launched the beta of his new Web search venture, Snap. The new search site
uses "search-as-fast-as-you-type" technology, licensed from X1.com, an Idealab
sister company that offers enterprise desktop search. My first impression of
Snap is that only die-hard search gurus will bother to decipher the busy-looking
presentation of various rankings for search results, or understand the sorting
and filtering options, but this is, admittedly, a very preliminary and
In my view, we will continue to see new ventures like these emerge, and the
best of the innovations in search technology will likely be imitated or assimilated.
An article in Pandia Post reported that Norwegian company Stochasto is getting
ready to launch its natural language search engine, Answer Engine, in English
in 2005. It is already available in Russian and has won the best search engine
award at a technology exhibition in Moscow. It's possible the company will
choose to focus on the enterprise search market.
Tweaking and Enhancing
The Ask Jeeves search site (Ask.com) has been upgraded to be more personal
and more relevant. The underlying Teoma search engine has been upgraded to
3.0, local search options have been expanded, and a new MyJeeves service has
been introduced. MyJeeves lets users save search results; organize items; add,
print, and share notes via e-mail; and also search within the saved documents, creating,
in effect, a "personal Web." Ask Jeeves also said it plans to introduce a desktop
search product to the market during Q4 2004, based on technology assets it
acquired from Tukaroo, Inc. in June.
Yahoo! has also enhanced its My Yahoo! with personalization features, including
saving pages to a personal Web. The new beta version of My Yahoo! Search is
currently available to registered Yahoo! users via Yahoo! Next. Chris Sherman,
writing in SearchDay, said: "[T]he new My Yahoo! Search is well implemented
and easy to use, but doesn't offer compelling reasons to use it unless you're
looking for what amounts to an enhanced bookmark utility that's tied to Yahoo!
In addition, Yahoo! Local is now out of its 2-month beta phase. Yahoo! is
promoting it from its home page and is including Local options in the query
box. Yahoo!, Inc. also just reported that its Q3 profits have more than tripled,
though more than half of this was due to the sale of some of its stake in Google.
Over at the Googleplex
Google outdid itself in the last few weeks. Following its successful IPO,
the company announced a major expansion of its Google Print program. It had
been beta testing a limited program of search access to book excerpts from
a few publishers. Now, the company is offering to digitize, for free, book
texts from any publisher that chooses to join. Whenever a book has content
that matches a user's search terms, Google will display a special box with
links to book results. Users can browse a few pages (but cannot copy or print
them) and then can click to buy the book from Amazon.com or several other booksellers.
Google's links to books at local libraries should also be increasing, though
at this time it's still hard to find the book listings (which are often deeply
buried within search results). OCLC has expanded its Open WorldCat project
and will now permit its database of 53.3 million items connected to 928.6 million
library holdings to be indexed by both Google and Yahoo! Search. The company
may expand to allow other search engines as well. But, until Google excavates
its library results, I find the best way to find OCLC holdings is to use the
Google advanced search page and specify the domain or site as "worldcatlibraries.org."
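For readers who prefer to script the workaround just described, here is a minimal sketch in Python. It builds a Google search URL that restricts results to one domain using the "site:" operator, which does the same job as the domain field on the advanced search page. The function name build_site_search_url and the sample search terms are hypothetical, chosen only for illustration; the domain is the one named above and may not resolve indefinitely.

```python
from urllib.parse import urlencode

# Standard Google search endpoint; the query goes in the "q" parameter.
GOOGLE_SEARCH_URL = "https://www.google.com/search"


def build_site_search_url(terms: str, site: str = "worldcatlibraries.org") -> str:
    """Return a Google search URL limited to a single domain.

    Appends the "site:" operator to the search terms, mirroring the
    domain restriction offered on the advanced search page.
    (Hypothetical helper for illustration, not an official API.)
    """
    query = f"{terms} site:{site}"
    # urlencode handles spaces and the colon in "site:" safely.
    return f"{GOOGLE_SEARCH_URL}?{urlencode({'q': query})}"


if __name__ == "__main__":
    # Sample terms are an assumption; substitute any title or author.
    print(build_site_search_url("moby dick"))
```

Pasting the printed URL into a browser should show only results from the named domain, assuming the search engine has indexed those pages.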
At presstime, Google had just announced its beta entry into the desktop search
arena, an area of hot development that I've covered in every recent month's
column. Google Desktop Search will search through a PC's hard drive (the C:
drive only and not over networks), including Outlook e-mail, documents, PowerPoint
and Excel files, and even your Web page history in Internet Explorer and instant
message chats in AOL Instant Messenger. The most important feature of Google
Desktop Search is that it lets users search the Web and their own content at
the same time. This is big news, and early press coverage has been very positive
(although a few writers have raised privacy and security issues about the free
Larry Page, Google's co-founder and president of products, said: "It's free,
installs quickly, and keeps completely up-to-date. Google Desktop Search represents
a quantum leap in access to your own information."
Google beat Microsoft to the punch on this one. Earlier this year, Microsoft
acquired Lookout Software, makers of a personal Microsoft Outlook 2003 search
tool. Microsoft, AOL, and a number of other companies are all said to be working
on desktop search tools. One way or another, it's going to get easier for us
to find our digital stuff.
By the way, Google also announced Google SMS, a new test service that allows
people to use mobile phones or hand-held devices to tap Google's Web search
via text message (short message service). Google SMS provides business and
residential listings, product prices, and dictionary definitions.
For the latest industry news, check http://www.infotoday.com every Monday
morning. An easier option is to sign up for our free weekly e-mail newsletter,
NewsLink, which provides abstracts and links to the stories we post.
Paula J. Hane is Information Today, Inc.'s news bureau chief
and editor of NewsBreaks. Her e-mail address is firstname.lastname@example.org.