
Information Today
Vol. 20 No. 11 — December 2003
The Latest on Enterprise Search Products, E-Books, and More
by Paula J. Hane

As we head into the busy late-fall season of conferences, end-of-year company reporting, holidays, and school activities, the number of press releases hitting my desk seems to have accelerated as well. Many of us in the industry will be meeting and greeting at venues like Internet Librarian 2003 in Monterey, Calif., and Online Information 2003 in London. Users will be looking over the latest products and services. "Will they buy?" is the question. Information providers will be checking out competitors' offerings as well as potential partners for technology and marketing alliances. There should be a steady supply of news to report, as vendors continue to announce new products and services through the end of the year. Here's a wrap-up of some of the news from the past month.

Gale Branches Out

Another traditional information provider has linked up with one of the big Web search companies. (See the September 2003 NewsLink Monthly Spotlight.) Gale, a subsidiary of Thomson Corp., has arranged to link from most of its InfoTrac products to Google's image collection. Rather than license content from expensive commercial image providers, Gale will implement Google Image Search. The partnership will let Gale users search Google's more than 425 million images at no extra charge. (For more information, see the NewsBreak.)

Branching out into another information format, Gale announced in July its forthcoming e-book program that would offer a collection of reference titles with an easy-to-use database interface. The Gale Virtual Reference Library, which was set to launch at press time, allows libraries to select from an initial collection of 85 reference sources—encyclopedias, almanacs, and series—to create a customized, integrated online information service with unlimited usage and 24/7 remote access.

The Virtual Reference Library offers flexible options that let libraries buy one e-book or multiple e-books and search across a single e-book or an entire e-book collection. The content is provided in HTML format, so no special software reader is required. An Adobe PDF option is available for viewing the actual page layout. In addition, the company says it will be releasing several hundred more e-book titles. Also, in 2004, it will publish directory sources with a special interface, allowing more granular access to fielded data.

An E-Book Revolution?

The Open eBook Forum (OeBF), the electronic publishing industry's trade and standards organization, recently claimed that e-books have "quietly become a major force in the worlds of media and technology." The group reported that in the first half of 2003, e-book sales revenues were up by 30 percent and unit sales were up by 40 percent over the same period in 2002. This compares with an annual growth rate of only about 5 percent in traditional print publishing. E-book sales are expected to top $10 million in 2003.

OeBF revealed statistics on the current state of e-books and provided an industry analysis in its first quarterly "eBook and eDocument Publishing and Retail Statistics" report. The quantitative assessment was compiled from data submitted by 34 publishers and retailers. Some of the information has been made public, but the full findings were made available only to participating companies and OeBF members.

"Those of us in the industry have been seeing real signs of growth from every direction," said OeBF executive director Nick Bogaty. "Libraries are a huge growth category as they look to revitalize themselves in the age of Google, school systems are finding that today's kids like to read when the media is digital, and consumers are snatching up better devices and more titles as fast as they can."

I don't think the consumer market is really ready for mass-market e-book sales, which is why stopped selling them in September. I continue to believe that certain types of content are much better suited to the e-book format than others—for example, rapidly changing information, user guides, and reference materials. Thus, I think that Gale will find considerable success with its Virtual Reference Library, as did Knovel—which provides access to key STM reference materials—and several other information publishers.

Mark Gross, president of Data Conversion Laboratory (DCL), said, "An electronic book revolution has happened, only it came in by stealth and was not reported by the mainstream media." In a special DCLnews report, he explained, "The reason the media hasn't caught on to this story is that digital books are not for everyone, and the digital books people are using every day, such as technical manuals and online reference titles, aren't perceived as e-books."

In the report, Gross provided the following list of "Five Prime Indicators for an eBook." They have been summarized with the permission of Gross and DCL:

1. Readers need access to a very large amount of data but are only interested in looking at a little bit of it (e.g., reference books, technical documentation, and legal libraries).

2. Data that change rapidly, such as technical information and manuals

3. Rare books and manuscripts that are too fragile to touch

4. Materials that have low publication and distribution volumes

5. Self-published books

For additional commentary and perspective on e-books, see Mick O'Leary's column in the September/October 2003 issue of ONLINE (p. 59). He stresses the importance of institutional sales rather than sales to individuals and subscription rather than transactional pricing.

Linking Update

In last month's column, I reported on a number of important linking-initiative announcements, noting that this is a hot area of news. Recently, NFAIS, the association for "organizations that aggregate, organize, and facilitate access to information," released a document titled "NFAIS Guiding Principles: Reference Linking." The group is encouraging all those involved in any aspect of information creation or distribution to provide a reference-linking capability in their products and services.

With so many linking arrangements already in place, I wondered about the need for an official statement at this point. Linda Beebe, chair of the NFAIS linking committee and senior director of the American Psychological Association's PsycINFO, agreed that indeed, "linking is alive and well, but there's still a lot of work to be done." She noted that the primary publishers, particularly those in the STM field, have led the way in reference-linking initiatives. The NFAIS committee—and the entire NFAIS board—felt it was important to make a collective statement that would encourage other publishing disciplines and the secondary publishers—whose products are usually delivered on third-party platforms—to work on collaborative linking.

"The organization strongly believes that industrywide collaboration in support of reference linking is essential to managing the flow of scholarly communication," said NFAIS president Marjorie Hlava. "Reference linking provides a seamless navigation between bibliographic and full-text databases, speeding the research process and ultimately accelerating discovery across all scholarly disciplines as well as in business."

This is a worthy cause indeed. Let's hope to see the widespread adoption of these principles.

Enterprise Search Is Hot

I've noticed a recent buzz of activity from companies announcing new and improved enterprise search products. Most of the newer products do much more than just provide keyword searching. The clear trend is to integrate entity extraction, linguistic technologies, taxonomies, and classification with search technology to offer users better search results with less work.

I recently reported on the launch of Endeca's new ProFind 4.0. (See the October 2003 NewsLink Monthly Spotlight.) This enterprise search solution uses the Endeca Search and Guided Navigation engine, which combines full-text searching with navigation capabilities. The company says ProFind differs from other search engines in its ability to discover relevant relationships in data and to find accurate and precise results with unprecedented speed.

Endeca ProFind can handle all types of content (both structured and unstructured) within an enterprise, including databases, documents, or e-mail. Business partners like ClearForest provide rules-based native, entity, and concept extraction from the content. ProFind can be integrated with existing taxonomies.

Copernic, a company known for Copernic Agent, its consumer metasearch product, officially launched Copernic Enterprise Search. (See the NewsBreak.) The company chose not to target the high-end enterprise search market of Fortune 500 firms, which is currently dominated by companies like Verity and FAST. Instead, it offers a product specifically designed to meet the needs of small-to-medium-sized enterprises (SMEs) and departments of larger enterprises.

Copernic Enterprise Search uses advanced linguistic and statistical technologies to identify the key concepts and sentences of indexed documents. It also does automatic indexing of new and updated documents in real time. In addition to handling internal information in many formats, the software can index external Web pages and supports indexing of XML feeds.

Northern Light has been on a comeback path since its repurchase from the "Divine demise." (See the NewsBreak.) Known since its founding in 1996 for its results folders and for taxonomy and classification that use patented clustering technology, Northern Light employs search, classification, and content integration technology and services to offer user-friendly search solutions for corporate clients. Although it hasn't made an official announcement, the company, again led by CEO David Seuss, has released its Northern Light Enterprise Search Engine, an offering that it says delivers performance, relevance, and "unparalleled scalability."

The Northern Light Enterprise Search Engine for Solaris operating systems uses the technology that powered the Northern Light Web search engine. It can search up to 25 million documents on a single software installation on a single server. The price is certainly right. A license for a 150,000-document database is only $2,500 per year, including support and updates. The company also offers a free 30-day evaluation copy to install and try. Watch for additional news about Northern Light coming soon.

Other vendors operating in this space are Autonomy, Convera, Inxight, Stratify, and Verity. Some of the companies that provide additional pieces of technology and partner with search vendors include iPhrase, Antarctica, Intelliseek, and ClearForest.

Information Discovery

At press time, ClearForest Corp. announced the availability of ClearForest 5.0. The company offers products that read vast amounts of structured and unstructured text; extract relevant information that's specific to users' requirements; and provide visual, interactive, and textual executive summaries. The new 5.0 release adds relationship-analysis tools, four new industry-specific solution modules, and enhanced database scalability. ClearForest now has the ability to tag and analyze Arabic and Hebrew in addition to Western European languages.

ClearForest is not a search engine, but it can work with them. Its ClearTags platform produces standard tagged XML that can be searched with other software, such as Endeca. Barak Pridor, ClearForest's CEO, calls it a business intelligence solution that provides for the discovery of facts, patterns, trends, and relationships, which would otherwise be hidden within an organization's unstructured data. He said: "If you know what you're looking for, use a search tool. If you don't know what you're looking for, use a discovery tool."

By the way, ClearForest uses some nifty visualization technology to clearly represent the revealed relationships. While Information Today has covered some developments in visualization technologies and products over the last few years, the trend toward incorporating visual representations seems to be finally making inroads into solid applications like this one.

Antarctica Systems, Inc., a company built on the principle that most people respond better to visual presentations, recently announced version 4.0 of its Visual Net (VN) software. VN provides a map interface to information of all kinds. The company redesigned the entire user interface, upgraded the underlying technology, and built in additional interactivity. The changes position VN to handle the data complexities of large enterprises, a market it's now heavily targeting. Antarctica also partners with business-software vendors and search engines. (See the NewsBreak.)

Finally, IBM launched its long-awaited WebFountain, a Web-scale text-mining and discovery platform that extracts trends, patterns, and relationships from massive amounts of unstructured and semistructured text. With more than 1 petabyte (1,024 terabytes) of content already in storage, it's well on its way to mining the entire Web.

The WebFountain platform will be used to develop new products and services in partnership with other companies. It offers some truly impressive components: a supercomputer-based infrastructure; multi-terabyte data stores; and text analytics that include natural language processing, statistics, probabilities, machine learning, pattern recognition, and artificial intelligence. The possibilities for this information-discovery platform could be tremendous.

Factiva announced a partnership with WebFountain to develop a service called Reputation Manager, which is scheduled for release in the second quarter of 2004. Reputation Manager will combine more than 2 years of Factiva content with WebFountain's Web data to let executives discover what the world is saying about a company and its products. Once premium content is combined with Web content, it's not hard to envision any number of potentially useful business information tools. (See the NewsBreak.)

Web Search

With companies like IBM crossing over into the Web, it's logical to wonder what search engine companies like Google and Yahoo! will do next. Who knows what they might be working on or testing already?

Meanwhile, the area of greatest interest seems to be shopping and opportunities to sell. Yahoo! recently rolled out Yahoo! Product Search, an e-commerce search engine that will power the redesigned Yahoo! Shopping. The site offers features such as side-by-side product comparisons, detailed buyer's guides, a tax and shipping calculator tool, consumer product and merchant ratings, product reviews, etc. I have to admit, the advanced search features are pretty effective. Even Chris Sherman of SearchDay said the Pin Point product-recommendation tool is "seriously cool."

Several media outlets report that has formed, a new, independent business unit that's charged with building a shopping search tool for internal use and for other companies. And Google has been beta-testing its Froogle product-comparison search since December 2002.

Also new at Google Labs, the company's technology showcase, is Search by Location, a feature that lets users focus a search on a specific U.S. location and then provides a map from MapQuest with the results marked. Overture is reportedly also testing a local search capability. In a recent SearchDay article, Danny Sullivan claimed that in preliminary tests, local searching was generally still a disappointing experience.

In other news, Amazon and Microsoft announced that Amazon will provide Microsoft Office 2003 users with seamless access to from within Microsoft productivity applications via the Research Task Pane. Users will be able to access information and make purchases without launching a browser or leaving their document, e-mail message, or presentation. (I told you selling was big.) Previously announced partners for a spot in the Research Task Pane include Factiva, Gale, Alacritude, LexisNexis, Ovid, and Elsevier.

Just a cautionary note to illustrate the danger of putting too many eggs in a single basket: LookSmart, a paid search provider, lost its contract with Microsoft, which had supplied more than half (!) of its revenues. LookSmart shares have plummeted.

The Wave of the Future

The eBusiness Research Center (eBRC) at Penn State's Smeal College of Business Administration has launched SmealSearch, a new niche search engine that targets business research documents on the Internet. SmealSearch finds and catalogs academic articles, working papers, white papers, consulting reports, magazine articles, published statistics, and business facts by crawling the Web sites of universities, commercial organizations, research institutes, and government departments.

SmealSearch is built on NEC Research Institute's CiteSeer, the largest search engine for scientific literature. SmealSearch is the second search engine launch by eBRC in the past year. It follows the late-2002 rollout of eBizSearch, a resource that helps researchers access relevant and current information in e-business, e-commerce, and other closely related topics.

"General-purpose search engines can only carry researchers so far," said Lee Giles, associate director of research at eBRC and creator of the technology on which SmealSearch is based. "In the future, we predict the evolution of increasing numbers of powerful niche search engines that address specific needs of specific audiences."

I couldn't agree more.

For the latest industry news, check every Monday morning. An easier option is to sign up for our free weekly e-mail newsletter, NewsLink, which provides abstracts and links to the stories we post.

Paula J. Hane is Information Today, Inc.'s news bureau chief and editor of NewsBreaks. Her e-mail address is