
Information Today
Vol. 20 No. 11 — December 2003
NewsBreak Update
Updates on Projects, Partnerships, Improved Services, and More
by Paula J. Hane

I recently returned from the Internet Librarian conference. As usual, it offered a useful mix of practical and visionary presentations along with excellent networking opportunities. Though the event is organized by ITI, the publisher of Information Today, my roles were as attendee and reporter.

The conference—in the ever-popular Monterey, Calif., location—drew more than 1,100 participants despite continuing economic woes and curtailed travel budgets. I attended excellent sessions on e-resources and digital libraries, searching and search engines, content management strategies, technology trends, and more. Look for coverage in the January 2004 issue of Information Today.

Definitely Too Much Info

Researchers Peter Lyman and Hal R. Varian from the University of California-Berkeley's School of Information Management and Systems have released the results of their study "How Much Information? 2003." According to the report, "Newly created information is stored in four physical media—print, film, magnetic, and optical—and seen or heard in four information flows through electronic channels: telephone, radio and TV, and the Internet."

The study found that we produced about five exabytes of new information in 2002. Of this, 92 percent was stored on magnetic media, mostly hard disks. According to the report, five exabytes of information is equivalent to the information contained in a half million new libraries the size of the Library of Congress' print collections. And this is just in 1 year. I knew there was a reason I felt so overwhelmed.

The report's executive summary provides some interesting statistics. For example: "In 2000, we estimated the volume of information on the public Web at 20 to 50 terabytes; in 2003 we measured the volume of information on the Web at 167 terabytes—at least triple the amount of information. The surface Web is about 167 terabytes as of summer 2003. BrightPlanet estimates the deep Web to be 400 to 550 times larger, thus between 66,800 and 91,850 terabytes."
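The deep Web range in that passage is simple multiplication, and it is easy to check. Here is a minimal Python sketch, with variable and function names that are purely illustrative and not taken from the report, that divides the quoted terabyte bounds by the surface Web figure to recover the multipliers they imply:

```python
# Figures as quoted from the report's executive summary.
SURFACE_WEB_TB = 167                  # surface Web, summer 2003
DEEP_WEB_RANGE_TB = (66_800, 91_850)  # quoted deep Web bounds

def implied_multipliers(surface_tb, deep_range_tb):
    """Return how many times larger each deep Web bound is than the surface Web.

    Raises ValueError if a bound is not a whole multiple of the surface figure.
    """
    multipliers = []
    for bound in deep_range_tb:
        quotient, remainder = divmod(bound, surface_tb)
        if remainder:
            raise ValueError(f"{bound} is not a whole multiple of {surface_tb}")
        multipliers.append(quotient)
    return tuple(multipliers)

low, high = implied_multipliers(SURFACE_WEB_TB, DEEP_WEB_RANGE_TB)
print(f"Deep Web is {low} to {high} times the surface Web")
```

Run as written, the script reports multipliers of 400 and 550 for the quoted bounds.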

Interestingly, the authors say they view the report as a "living document" and intend to revise it based on comments, corrections, and suggestions.

The bottom line is that we're drowning in data. (Some days it feels like spam alone could do me in.) The severity of this situation, especially for businesses, makes content management, knowledge management, and search/browse/discover tools ever more critical for handling it all. Solutions from companies I discussed in last month's column (those that provide information-extraction tools for unstructured content and those that deal with indexing, taxonomies, classification, clustering, content integration, information discovery, relationship analysis, etc.) will be more in demand.

We're also seeing some welcome interface developments that incorporate visualization and other techniques to let users work within more contextual information spaces. A number of the presentations at Internet Librarian related to these issues and solutions.

Getting into Books

Amazon recently introduced "Search Inside the Book," a full-text search feature that lets users delve directly into the content of books. This new service taps more than 33 million pages from more than 120,000 nonfiction and fiction titles provided by 190 publishers. In November, Barbara Quint reported on publishers' reactions and explained the copyright issues raised by authors. Amazon indicated that a week after launching the new service, sales for full-text searchable books outpaced those for other books by 9 percent.

Quint said: "Full-text book searching has been available for some time from such services as ebrary, OCLC's netLibrary, even Project Gutenberg and other public-domain book sites. And those are just the more current and Web-oriented examples. Earlier examples in traditional online also exist. However, to offer it at this scale with such a promise of future growth and at no charge to the user or the user's institution, does promise a new level of access for 'library-quality' material—not to mention, more revenue for Amazon."

OCLC, Google

One development of great interest is OCLC's recent announcement that its Open WorldCat pilot project will begin testing access to WorldCat records through Google. OCLC is working with selected Web sites, including the Google search engine, to provide links to the records of WorldCat libraries. This will ultimately help users find local libraries that have the items they want. The project is using a 2-million-record subset of the most popular and widely available books from the more than 53 million records in WorldCat. OCLC will analyze Open WorldCat using feedback, surveys, and statistics. In June 2004, OCLC will decide whether to expand, continue, or end the project.

The goals of the project are to expand the visibility and utility of libraries and increase the quality of materials that are accessible from the Web. In an October NewsBreak, Barbara Quint noted that the expansion to include OCLC's records clearly fits Google's mission statement to "organize the world's information and make it universally accessible and useful."

More Google Activities

Google Labs has released Google Deskbar, a search application experiment that lets PC users perform Google searches at any time from any application without opening a browser. Google Deskbar is a free software download that appears as a search box in the Windows taskbar. I haven't had time to test it out yet.

Earlier this year, I downloaded Quick Search Deskbar, a similar tool from HotBot (which let me search using Google and three others). Gary Price of ResourceShelf says that this offering is actually more robust than Google Deskbar. While I thought the HotBot application offered some very handy shortcuts (and I liked its one-click access to the Online Crossword Dictionary), I found that I just forgot to use it. I was frequently in my browser anyway, so it was just as easy to stick with my regular search habits. I have the Google Toolbar loaded in my browser, and I use that quite often. Yes, we are very much creatures of habit.

The Google Toolbar, by the way, was the recipient of the recently announced Association of Independent Information Professionals' 2003 Technology Award. Google says that Toolbar and Deskbar are complementary products that each accommodate a particular search need.

Microsoft Update

Microsoft hopes to help users change their habits and get them to stay with the familiar Microsoft Office applications—hopefully the upgraded Office 2003 package of products. The company's goal is to enable users to easily access, integrate, and utilize information from diverse sources. The list of information providers that partner with Microsoft to offer search connectivity from within the Office Research Pane continues to grow. The companies that provide services from within applications like Word and Excel include Factiva, Gale, Alacritude (eLibrary), LexisNexis, Ovid, and Elsevier.

OneSource Information Services introduced several new business-intelligence modules that deliver OneSource information—such as company profiles, industry reports, executive details, news, and financial data—directly into Microsoft Office System programs. The solutions include the Catalyst/Account Intelligence Module for Microsoft Office Word 2003 and the Catalyst/Financial Analysis Module for Microsoft Office Excel 2003. OneSource is also supporting the Research Task Pane within Microsoft Office.

Microsoft recently announced that it has teamed up with EDGAR Online, Inc. EDGAR Online's secure XML Web service will transmit XBRL (eXtensible Business Reporting Language) financial-statement data from EDGAR Online Pro to Excel 2003 through the Office Solution Accelerator for XBRL. This will allow investors and analysts to use EDGAR Online's financial information for analysis directly on their desktops. The companies expect this to be available in the first quarter of 2004.

According to Joe Wilcox of Jupiter Research, Microsoft is trying to turn Office, like Windows, into a platform onto which developers and businesses build other programs or custom applications. The browser is becoming less important as its functions are integrated elsewhere. And The Wall Street Journal recently reported that Microsoft's new "operating system due out in 2005, code-named Longhorn, is expected to help users simultaneously search the Web, their own hard drives, and data on corporate networks." These are certainly developments to monitor closely.

By the way, while Google prepares for its initial public offering of stock, unconfirmed reports indicate that Microsoft and Google have discussed a partnership, merger, or even a possible takeover of Google. At press time, the companies' executives weren't talking, but the media was abuzz with speculation and commentary. Some media outlets said that the Google Deskbar was Google's answer to cutting out the browser and challenging Microsoft.

Alacritude Chooses FAST

I reported in my September 2003 column that Alacritude was changing its focus from content to helping folks conduct more effective online searches. Now, the company announced that it has selected FAST Data Search to power the search functionality of its online research services eLibrary and

Patrick Spain, chairman and CEO of Alacritude, said: "In addition to increasing the access speed and relevance of the tens of millions of proprietary newspaper, magazine, and journal articles in our database, our services will be significantly enhanced by FAST's powerful alerting and clustering features. This is just the first step in a complete retooling of our online research services, which we plan to complete early next year."

In August, LexisNexis integrated FAST Data Search with its LexisNexis Total Search. FAST has now amassed a noteworthy list of customers, including Reed Elsevier, Reuters, and T-Online (Deutsche Telekom). In addition, a substantial number of former AltaVista enterprise customers have renewed agreements or have begun migrating to FAST's technology. FAST purchased AltaVista's enterprise search technology earlier this year.

FAST recently reported some impressive financials. Its third-quarter 2003 revenues reached $11 million, an increase of 15 percent over the second quarter. And "year-to-date revenues grew 18 percent from the same period last year, as new business grew 60 percent for the year."

Bravo Vivísimo

Another company whose technology helps deliver more effective search results is Vivísimo, which uses clustering to organize results into folders or categories. InfoSpace, Inc. announced that it has selected the Vivísimo Clustering Engine for deployment on its Web properties "to enhance the user search experience." The clustering feature is now available across InfoSpace Search & Directory's Web metasearch properties, which include Dogpile, WebCrawler, and MetaCrawler.

Fortune 500 companies, government agencies, and publishers also use Vivísimo. Cisco Systems recently licensed the Clustering Engine and Content Integrator to complement the Google Search Appliance, a tool that serves thousands of Cisco engineers. According to Vivísimo, the Content Integrator metasearches collections on the Appliance through a single query. The Clustering Engine then organizes the combined search results into folders.

Clustering and folders are of course not unique to Vivísimo. Northern Light and other search engines also provide results folders. However, Vivísimo claims that its technology is different: "Unlike solutions that require huge investments in taxonomy-building or categorization, Vivísimo's Clustering Engine organizes search results into folders on the fly, without requiring any pre-processing of source documents." Vivísimo, headquartered in Pittsburgh, was founded in June 2000 by Carnegie Mellon University computer-research scientists.

ERIC Changes

In April, Barbara Quint reported on proposed changes to the ERIC database. ERIC has operated through a network of 16 subject-specific clearinghouses that are responsible for acquiring, selecting, indexing, and abstracting materials in their area of interest for inclusion in the database. The clearinghouses have provided information in response to requests by mail, phone, and e-mail. Responses typically included a short list of citations from the ERIC database, full-text articles, and appropriate referrals. Unfortunately, the changes are now imminent.

The following notice was posted on the ERIC site in early November:

Changes Coming to ERIC December 19, 2003

ERIC will begin a transition in late December as a new U.S. Department of Education contractor develops a new model for the ERIC database and services. ERIC clearinghouses' Web sites, including AskERIC, and their toll-free telephone numbers will close on December 19, 2003. As of that date, you will be able to use this Web site to:

• Search the ERIC database

• Search the ERIC Calendar of Education-Related Conferences

• Link to the ERIC Document Reproduction Service (EDRS) to purchase ERIC full-text documents

• Link to the ERIC Processing and Reference Facility to purchase ERIC tapes and tools

If you have other ERIC bookmarks, we suggest you change them to

For the latest industry news, check every Monday morning. An easier option is to sign up for our free weekly e-mail newsletter, NewsLink, which provides abstracts and links to the stories we post.

Paula J. Hane is Information Today, Inc.'s news bureau chief and editor of NewsBreaks. Her e-mail address is