Information Today

Vol. 21 No. 5 — May 2004

The Latest on Search, Content, and More
By Paula Hane

April and May are wonderful months. The greening, flowering, and warmer temperatures cheer us, and sending in our tax returns provides great relief. Many folks enjoy Earth Day activities on April 22. This year's National Library Week, held April 18–24, marked the fourth year of the Campaign for America's Libraries, a public education effort sponsored by ALA to promote the value of libraries and librarians in the 21st century. To help libraries showcase their many services that week, Thomson Gale offered them free access to promotional resources as well as to a number of databases.

Information Today, Inc. is sponsoring three conferences May 11–12 in New York: WebSearch University, Streaming Media East, and Enterprise Search Summit. The last of these is a brand-new event targeted at those who are tasked with implementing site-search functions within their organizations. In advance of the summit, anyone interested in enterprise search is invited to test-drive some leading technology solutions by visiting a unique implementation that searches content on the ITI sites. The new Enterprise Search Center launched on April 5 and will be available throughout this year.

Enterprise Search Update

Enterprise search continues to be hot. Recently, Endeca announced the availability of Endeca ProFind 4.1, the latest version of its enterprise search and navigation platform. Endeca also teamed up with Stratify, a provider of unstructured data-management software, and Taxonomy Warehouse (part of Synapse Corp.), a provider of industry-specific taxonomies, to bolster its capabilities for managing and adding structure to traditionally unstructured documents and content, such as e-mail, Word files, PowerPoint presentations, Adobe PDF files, etc.

Speaking of taxonomies, ProQuest Information and Learning announced that Convera (formerly Excalibur Technologies) will use ProQuest's taxonomy to classify and organize information in its proprietary RetrievalWare software products. RetrievalWare solutions provide searching across more than 200 forms of text, video, image, and audio information in more than 45 languages.

Entopia, Inc. recently launched a new Software Development Kit that will allow application developers to integrate Entopia K-Bus into any application. The company says K-Bus offers "information discovery" functions, including enterprise search, expertise identification, content visualization, social networks analysis, and content connectivity.

In a recent Forrester report, "The Future of Enterprise Search," principal analyst Paul Sonderegger said: "Search will mature beyond helping people find what they're looking for to helping people understand what they've found. The key is making the most of structure in content and creating it where it doesn't exist."

While enterprise search will continue to be a major growth area, it seems likely we'll also see some consolidation among the companies that compete in this space. At last fall's KMWorld & Intranets conference, more than half of the knowledge management exhibitors identified themselves as being in the "search, taxonomy, and classification" markets. There may be some key technology acquisitions as the enterprise search companies work quickly to integrate all the functionality that customers demand, including entity extraction, advanced linguistic technologies, taxonomies, and classification.

More or Less?

Thomson Gale said it has finished loading more than 500,000 backfile investment reports into the Investext Plus database. This new content, added at no cost to existing subscribers, extends the database backfile to 1982.

On the other hand, sometimes there's less content to report. Thomson Gale also said it has been informed by the British Medical Association (BMA) that the full text of 29 BMA health titles may no longer be offered through InfoTrac or the Thomson Gale Resource Centers. The BMA decision was effective April 16. Gale will retain BMA data in the backfiles and will continue to abstract and index the titles going forward.

Searchers have always had to keep up with additions and deletions in database content. Now, a similar task confronts users of content-rich Web sites and services. Do you really know what's included? Reports indicate that Reuters is pulling back a lot of the free business news it has made available to some Web sites and portals, such as Yahoo! Finance, MSN Money, and CBS. Headline feeds on some sites will link readers back to the Reuters site. Eventually, some top information will be available by subscription only.

On the other hand, Thomson Financial is replacing Reuters content with content from MarketWatch, a multimedia publisher of business news and a provider of financial information and analytical tools. The Wall Street Journal reported that Thomson was concerned that Reuters' material was going to Thomson's institutional clients. Thomson Financial is teaming up with MarketWatch to develop a new, tailored online news service that will be delivered via Thomson ONE. The partnership will bring together MarketWatch's news coverage by financial journalists and Thomson Financial's proprietary content and market analytics to create a service that's focused on real-time market, industry, and U.S. company news.

The Thomson news service will be available exclusively to Thomson ONE customers as well as to clients of Thomson affiliates. The announcement said that Thomson Financial and MarketWatch together will build an expanded staff and invest resources in a journalistic effort that's 100-percent committed to Thomson ONE news content.

A recent newsletter from research and advisory firm Outsell, Inc. summed it up: "Suddenly, the advantage Reuters, Bloomberg, and Dow Jones had over Thomson by owning news-gathering organizations to complement their financial information coverage is eroded, further fueling the already hot battle among the four companies for supremacy in the large and content-dependent institutional financial market sector. If Thomson and [MarketWatch] prove they can work well together and the partnership terms and pricing are not cockeyed, this could give Thomson a sliver of advantage in some bake-offs for accounts."

So it's probably not a coincidence that Dialog, a Thomson company, announced that real-time financial and business news and analysis produced by CBS MarketWatch is now available through four of its services: Dialog NewsRoom, Dialog NewsEdge, Dialog NewsEdge Live, and NewsEdge Insight.

Googling Along

It seems impossible to have a month go by without some news from Google. The search giant recently modified and enhanced its home page and results pages, added personalization features and Web alerts, and, most significantly, launched a free e-mail service.

Google Personalized Web Search and Google Web Alerts, both debuting on Google Labs, are designed to let searchers specify what interests them and receive customized results. Google Personalized Web Search uses the preferences a searcher specifies to tailor results. Searchers can control their level of personalization using a slider and see the results change dynamically as the level changes.

Google Web Alerts provides automatic updates for Web users. After specifying keywords they want to track, users can receive daily or weekly e-mails with links to new Web page results plus top Google News stories that are related to each query. Users can still choose to receive only News Alerts. In addition, Google News now features images in search results and displays thumbnail images of photos that relate to news stories.

Google also announced that it's testing a preview release of Gmail, a free search-based Web mail service with 1 gigabyte (!) of storage capacity per user. Built on the Google search engine, Gmail can quickly recall any message an account owner has ever sent or received, thus eliminating the need to file messages in order to retrieve them. Gmail automatically groups e-mail and all replies together in the proper context. The service also includes textual ads matched to the content of the displayed e-mail.

Rich Wiggins reported on the news and ensuing buzz about Google's April Fools' Day announcement of Gmail. The nature and timing of the announcement caused initial doubts of Gmail's authenticity, but over the next few days the coverage focused on the privacy issues raised by the targeted ads.

Google Gets Flak

Within a week, the World Privacy Forum and 27 other privacy and civil liberties organizations had written a letter asking Google to suspend its Gmail service until the privacy issues are adequately addressed. The letter also asked Google to clarify its written information policies regarding data retention and data sharing among its business units.

The organizations voiced concern that scanning confidential e-mail to insert third-party ad content violates the implicit trust of an e-mail service provider. The scanning creates lower expectations of privacy in the e-mail medium and may establish dangerous precedents. Other concerns include the unlimited period for data retention that Google's current policies allow and the potential for unintended secondary uses of the information Gmail will collect and store.

Then, Sen. Liz Figueroa, D-Calif., who called Gmail a "Faustian bargain," said she would introduce legislation to block the service. Some media outlets said the privacy issues were overblown. At press time, there were conflicting reports about whether Google was considering changes to Gmail to placate the privacy concerns. In The Wall Street Journal, Google co-founder Sergey Brin said that the idea was "being batted about."

Interestingly, Wiggins noted that the limited testing of Gmail by Google staff and invited friends means outsiders aren't experiencing it firsthand. He wrote, "It occurs to me that Google made a huge mistake by failing to let members of the press try out Gmail."

Speaking of potential privacy problems, at press time, Amazon had just rolled out the beta version of its new A9 search engine. Its most prominent feature is providing a user's search history. More on this news to come.

Yahoo! Update

Though other search engine news was somewhat eclipsed by all the Googling, there were some other important developments. Yahoo! introduced Yahoo! News Search 2.0, which now lets users search more than 7,000 global news sources in 35 languages, a significant enhancement over the previous Yahoo! News. Other improvements include a new related search feature that offers suggestions for refining queries, more frequent crawls to update the news, and sorting of results by relevance or date.

Finally, if you didn't get enough chuckles on April Fools' Day or you just need a break from serious news, I recommend a recent piece in The Onion, a satirical weekly publication. Rich Wiggins pointed out an article titled "Yahoo! Launches Soul-Search Engine." Written like a formal news article, the piece supposedly details Yahoo!'s latest foray into the competitive search market. Here's a wonderful "quote" from Yahoo! CEO Terry Semel: "Capable of navigating the billions of thoughts, experiences, and emotions that make up the human psyche, the new Yahoo! soul-search engine helps users find what's deep inside them quickly and easily. All those long, difficult nights of pondering your place in this world are a thing of the past." Hmmm, I wonder how far-fetched this really is?


Meanwhile, specialized search engines, which offer more focused results and specialized content and features than the big guys of search, continue to fill important niches. This is the notion of narrowcasting—narrowing a search to a specific industry or topic. Whether you're looking for legal, biomedical, or scientific information, searching content that has been editorially chosen and is often not reachable by a general search engine will provide faster access to better results.

A classic example is GlobalSpec, the specialized online resource for engineering. It recently launched a new interface; added more powerful search functionality; and introduced a specialized search engine it's calling The Engineering Web, which it says provides "engineering context and relevancy" as well as access to hidden Web resources. The engine searches more than 100,000 engineering and technical Web sites and provides searching of specialized content the company says is not available on any other engine: application notes, patents, material properties, and standards.

In addition to the Web resources, GlobalSpec's proprietary SpecSearch allows users to search, by specification, more than 60 million parts in 1 million product families from more than 10,000 supplier catalogs. The company built both the search technology and taxonomy and now has 1 million registered users.

Since engineering is so information-intensive, I guess it's not surprising that this field continues to draw resource-development initiatives. Elsevier Engineering Information announced the launch of Referex Engineering, a specialized electronic reference product hosted on the Engineering Village 2 platform. Referex Engineering draws on more than 300 of Elsevier's book titles to provide engineering professionals with a fully searchable reference database that offers both breadth and depth.

The company says that Referex Engineering is designed on a concept of "layering content" to create the breadth and focus that researchers, professional engineers, and academics require. By layering broad-based handbooks, professional reference works, and how-to guides with specialized monographs and scholarly texts, Referex Engineering has created a foundation of information that allows searchers to quickly find solutions to their reference needs.

Another new research tool allows biomedical and life science researchers to search the MEDLINE database more productively and efficiently. Vivísimo's ClusterMed organizes the long list of results returned by PubMed into hierarchical folders with meaningful categories. This allows researchers to home in on the most relevant results quickly. Vivísimo developed proprietary biomedical knowledgebases and algorithms for the sophisticated text processing of the PubMed records. ClusterMed is licensed to companies on a yearly subscription for local server installation. A demonstration site is available.

Content in (Ad) Context

Finally, one of the more interesting commentaries I've read recently is by analyst John Blossom of Shore Communications, who wrote about how non-publishers are monetizing content. Wal-Mart, Procter & Gamble, and Ford are among the companies that are providing their own private contexts for print and online content and are launching their own publications for customers. Now, mass-media publishers are finding themselves under the gun to hang on to desirable shelf space and Web clicks. Basically, advertisers are creating their own contexts in which to place ads. Librarian alert! The need to educate readers about bias issues is rising to a new level of urgency.

For the latest industry news, check our Web site every Monday morning. An easier option is to sign up for our free weekly e-mail newsletter, NewsLink, which provides abstracts and links to the stories we post.

Paula J. Hane is Information Today, Inc.'s news bureau chief and editor of NewsBreaks.