[ONLINE] feature

Reference Resources on the Web

Chris Sherman

ONLINE, January 2000
Copyright © 2000 Information Today, Inc.


There are a number of hybrid reference services on the Web that are something of a cross between a traditional database service and a Web search engine. They often feature proprietary content, and may also include links to other Web resources. All are either free or very inexpensive when compared to services like Dialog and LEXIS-NEXIS.

For this article, I looked at three prominent Web-based online reference resources: Ask Jeeves, Electric Library, and Information Please. I had chosen Answers.com to review as well, but its recent acquisition by Net Shepherd and subsequent product overhaul made a side-by-side comparison with the other resources difficult. See the sidebar on page 55 for a brief discussion of the new service.

To test the services, I posed three queries to each that could be answered with unambiguous, factual responses. The first was an "easy" question, which all services should have been able to answer. The results for this question allowed me to compare and contrast the depth of the results provided by each service. The second and third questions were deliberately "harder," ones that I didn't expect all of the services to be able to answer. The point here was to see what alternatives were offered if no results were found.

Since Ask Jeeves and the Electric Library encourage the user to "ask questions," I searched using both keyword queries and simple, natural language sentences to test the capabilities of the language parsing system of each service.

The test questions were:

QUESTION 1: Who is Ehud Barak?
(The Prime Minister of Israel in July 1999)

QUESTION 2: When was the city of Beijing founded?
(Peking founded around 1122 B.C.; renamed Beijing in 1949, according to the New York Public Library Desk Reference)

QUESTION 3: How many chromosomes do humans have?
(46, in 23 pairs)

Ask Jeeves
http://www.ask.com

Ask Jeeves is an Internet search engine that takes a non-traditional approach to cataloging Web resources. Unlike the other services reviewed in this article, Ask Jeeves does not compile or aggregate proprietary content that directly answers questions.

Instead, the service has built a knowledgebase of about 7 million questions with pointers to millions of resources on the Web that offer answers. In her excellent profile of Ask Jeeves ("Hi AJeevers," DATABASE, June/July 1999), Reva Basch writes: "Strictly speaking, those 7 million questions actually consist of several thousand question-and-answer templates, any of which might have 50 to 5,000 'smart list' items associated with it."
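To make Basch's description concrete, here is a minimal sketch of how one template plus a "smart list" might expand into many distinct questions. The template wording, the list items, and the function names are all hypothetical; nothing here reflects Ask Jeeves' actual data or software.

```python
# Hypothetical template: one question pattern plus a "smart list" of fill-in items.
TEMPLATE = "Where can I find a map of {item}?"
SMART_LIST = ["Beijing", "Jerusalem", "Paris"]  # illustrative items only


def expand(template, items):
    """Yield one concrete question for each smart-list item."""
    for item in items:
        yield template.format(item=item)


def estimated_questions(num_templates, items_per_template):
    """Rough size of the question space if every template had the same list length."""
    return num_templates * items_per_template


questions = list(expand(TEMPLATE, SMART_LIST))
```

Under the purely illustrative assumption of 3,500 templates averaging 2,000 items each, estimated_questions(3500, 2000) comes to 7 million, which shows how "several thousand" templates could plausibly account for the headline figure.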

Jeeves employs a simple query form with no refinement or limiting options, encouraging the user to "just type a question and click 'Ask!'" Jeeves responds by presenting the closest matching questions in its knowledgebase. When a question has several possible interpretations, drop-down boxes allow the user to fine-tune the question to more closely match the desired result. Clicking the "Ask!" button next to any of these "results" questions calls up the Internet document Jeeves editors have selected as the most relevant for answering that specific question.

Jeeves also functions as a metasearch engine, querying AltaVista, Excite, Infoseek, WebCrawler, and Yahoo!. This two-pronged approach provides useful alternatives if an answer isn't found in Jeeves' database.
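The metasearch idea can be sketched in a few lines: send the same query to several engines and keep the results grouped by engine. The engines below are hypothetical stubs that return canned titles without touching the network; how Jeeves actually queries and presents its partner engines is not described in detail here.

```python
def metasearch(query, engines):
    """Send one query to every engine; keep non-empty results grouped by engine name."""
    grouped = {}
    for name, search in engines.items():
        hits = search(query)
        if hits:
            grouped[name] = hits
    return grouped


# Stand-in engines (assumptions for illustration only, no network access).
def fake_altavista(query):
    return ["Profile: " + query]


def fake_yahoo(query):
    return []


results = metasearch("Ehud Barak", {"AltaVista": fake_altavista, "Yahoo!": fake_yahoo})
```

Engines that return nothing simply drop out of the grouped results, which matches the way only some of the five engines contributed answers in the tests below.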

The question-and-answer templates are built by a staff of editors, working in teams focusing on specific content areas. Editors for business and finance, health, and law categories are recruited for expertise in their respective fields. Editors for other categories are expected to be generalists, with strong Internet awareness and research skills.

The editorial team strives to improve the knowledgebase in a number of ways. User interaction is monitored to track which questions seem to provide more relevant results. Editors constantly seek out newer or more relevant Web sites as candidates to replace existing answers. And the team responds to timely events such as breaking news or newly-released books or movies by creating new question-and-answer templates.

Test Results

Results were identical for keyword and natural language queries for the Jeeves knowledgebase, but quite different for the metasearch results from search engines. In all cases, natural language queries produced better results than keyword queries, a somewhat counterintuitive result. When Jeeves could not find the answer to a question, it presented a creditable list of alternatives.

QUESTION 1: Who is Ehud Barak?
Jeeves found no results for this query, instead offering a link suggesting "I think you may have misspelled something." Clicking this link brought up a new question, "Did you mean: Who is..." with two drop-down menus proposing 16 alternate words each, the top two being "Hued Bark." None of the other suggestions came any closer to matching the query. The metasearch, however, returned numerous results from AltaVista, Excite, and WebCrawler that did answer the question.

QUESTION 2: When was the city of Beijing founded?
This question provided an excellent example of Jeeves' question-and-answer templates in action. Jeeves proposed answers for five questions, including a city guide, restaurant finder, night life directory, and map of Beijing. It also proposed "extensive historical, economic, and political information about the country China," which linked to the Library of Congress "China: A Country Study" page. Unfortunately, none of these resources provided the answer.

The metasearch produced mixed results. Several documents purported to have an answer, but all were found on personal Web pages, so the results could not be considered to be authoritative.

QUESTION 3: How many chromosomes do humans have?
This question also showcased Jeeves' question-and-answer templates. Jeeves' first suggested question, "Where can I find a concise encyclopedia article on chromosomes," linked to an Encyclopedia.com article from Infonautics (which unfortunately didn't answer the question). The remaining suggested questions all pointed to much more general resources on genetics, and as such were not useful.

Metasearch results were mixed. All of the services queried by Jeeves provided a mix of authoritative and personal pages, so the answer was ultimately found. Curiously, it was easier to find the estimated number of genes in the human DNA sequence in these results (80,000 to 100,000) than the well-documented number of human chromosomes that can be found in any basic biology textbook.

The Electric Library
http://www.elibrary.com

The Electric Library is one of several research-oriented Web sites maintained by Infonautics Corporation (the others are Company Sleuth, Job Sleuth, Encyclopedia.com, and Researchpaper.com).

Unlike the other services reviewed in this article, The Electric Library is a fee-based service. The service is often compared to Northern Light, though the Electric Library uses a subscription model with no transactional fees. It is licensed to more than 15,000 schools and libraries, and has more than 80,000 individual subscribers. Subscriptions are available on a monthly ($9.95) or annual ($59.95) basis.

The Electric Library Personal Edition is also unique in that its database contains only copyrighted content. Licensed content includes material from 400 publishers, with well over 1,000 titles, according to Bill Burger, Vice President, Content and Media Services of Infonautics. Segregated into six categories, the Electric Library contains over 5.5 million newspaper articles, nearly 750,000 magazine articles, 450,000 book chapters, 1,500 maps, 145,000 television and radio transcripts, and 115,000 photos and images. Fully 95% of the content in Electric Library isn't available on the Web, at least for free, says Burger.

"We update the database every day," says Burger, "it's constantly refreshed." Content providers regularly send new information--sometimes in real time, in the case of content provided by wire services.

Lists of the sources of the content are easily available. Hyperlinks on the search form display sources arranged alphabetically. Each media type's source list also provides both text and image links for the other media types. This transparency provides a high degree of confidence in the reliability and validity of the materials provided by The Electric Library.

The Electric Library's search form is simple yet elegant. The search form allows you to enter a question in natural language. You can limit your search to a specific type of media by checking or unchecking boxes next to the six media types. Searches can be further limited to specialized content categories through the use of a drop-down menu selector.

Clicking "search options" provides additional refinement tools. You may select natural language or Boolean search, publication date range, or limit your search by bibliographic information, such as author, title, or publication.

Search results are presented in groups of 30. Descriptions include an icon indicating media type, the title of the document or image, and a relevancy score. The source and author of the document, publication date, and size are also provided. And, as a nice touch for children using The Electric Library for homework, a reading level is indicated.

You can also choose the "refine search" option, which displays a search form with the additional search options noted earlier, and two other controls. The first is the "search power setting," which controls language expansion by the natural language parser. "High" (the default) performs a great deal of language expansion, while "Low" searches only for the exact words you enter in the search text box. The second control allows you to change the number of results you see displayed, up to 150 in increments of 30.
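The two controls can be pictured with a small sketch. The table of related words is invented for illustration, since the article does not document the parser's vocabulary, and the function names are hypothetical; only the 30-result increments and the 150 maximum come from the service itself.

```python
# Hypothetical expansion table; the real parser's vocabulary is not documented here.
RELATED = {"founded": ["established", "settled"], "city": ["town", "capital"]}


def expand_query(words, power="high"):
    """Return the search terms used under a given power setting.

    "low" keeps only the exact words entered; "high" adds related words.
    """
    terms = list(words)
    if power == "high":
        for word in words:
            terms.extend(RELATED.get(word, []))
    return terms


def allowed_page_sizes(step=30, maximum=150):
    """Result counts a user can choose: multiples of `step` up to `maximum`."""
    return list(range(step, maximum + 1, step))
```

With these assumed values, a "Low" search for "city beijing" looks only for those two words, while "High" also looks for "town" and "capital," which suggests why the higher setting can pull in less relevant documents.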

Clicking on the title of a document displays the full text of the selected document. Keywords are underlined and boldfaced in the document, and there's an option to go to the "Best Part" of the document, which is useful for finding the core idea in longer documents.

An interesting feature unique to the Electric Library is called "Recurring Themes." Recurring themes include people, places, and subjects extracted from the documents in your result set. When a person, place, or subject theme occurs in significant numbers, it becomes a "Major Theme," with related "Other Themes." Themes are clickable links that will organize search results by that theme.
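One plausible way to picture "Recurring Themes" is as a count of how many documents in the result set mention each person, place, or subject. The sketch below assumes each result already carries a list of extracted themes and uses an arbitrary threshold of three documents for a "Major Theme"; Infonautics' actual extraction method and cutoff are not stated in this article.

```python
from collections import Counter


def recurring_themes(result_tags, major_threshold=3):
    """Split themes into (major, other) by how many documents mention each one."""
    # set(tags) ensures a theme counts once per document, however often it repeats.
    counts = Counter(tag for tags in result_tags for tag in set(tags))
    major = sorted(t for t, n in counts.items() if n >= major_threshold)
    other = sorted(t for t, n in counts.items() if n < major_threshold)
    return major, other


def filter_by_theme(result_tags, theme):
    """Return the positions of the results that mention the chosen theme."""
    return [i for i, tags in enumerate(result_tags) if theme in tags]


# Illustrative result set: each inner list holds the themes found in one document.
sample = [
    ["Ehud Barak", "Israel"],
    ["Ehud Barak", "Jerusalem"],
    ["Ehud Barak", "Israel"],
    ["Israel"],
]
major, other = recurring_themes(sample)
```

Clicking a theme then corresponds to something like filter_by_theme(sample, "Jerusalem"), which narrows the list to the documents tagged with that theme.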

Test Results

The Electric Library returned different results for keyword and natural language queries. Overall, keyword queries provided more relevant results for these test questions than natural language queries. Also, results varied significantly when search refinement tools were used. As a rule of thumb, a searcher should definitely use these refinement tools when searching the Electric Library for best results.

QUESTION 1: Who is Ehud Barak?
More than 30 results were found. For this query, the results page displayed a tip: "All documents have a score of 100. For better results, try to provide more search criteria." Results included a variety of newspaper articles, including many from the Jerusalem Post. Also included were magazine profiles from Time and Newsweek International, and a National Public Radio interview with Prime Minister Barak. All 30 results were relevant.

QUESTION 2: When was the city of Beijing founded?
The Electric Library fared poorly on this question. Neither natural language nor keyword queries returned relevant documents. Only when content was restricted to "books and reports," search was limited by the specialized content "history," and the Power Setting was set to "low" did results return an appropriate answer. This was from the Columbia Encyclopedia, the same resource used by Information Please.

QUESTION 3: How many chromosomes do humans have?
The natural language query returned no relevant results in the top 30, whereas the keyword query provided the answer in the first document of the results list.

Information Please
http://www.infoplease.com

Information Please is the online service owned by Information Please LLC, an almanac and reference database publisher that's been in business for more than 50 years. Its most famous product is the Information Please Almanac, first published in 1947 as an outgrowth of the popular Information Please quiz show which ran on NBC from 1938 to 1952.

The quiz show evolved from a "stump the expert" format into a forum for curious people to find answers to difficult and often obscure questions. This led to a tradition within the company of providing accurate information explained in a clear, easy to understand format.

Information Please online combines the contents of an encyclopedia, a dictionary, and several almanacs replete with statistics, facts, and historical records. The information in its database is continuously updated and refined by an internal staff of editors and researchers. Editors come from a broad variety of backgrounds, including major publishers and academic institutions, according to Elizabeth Buckley Kubik, Vice President and General Manager of Information Please LLC.

Data in the Information Please database is maintained in SGML format, and the search and retrieval software is proprietary. This allows the system to be quite linguistically rich. Natural language and keyword queries generally provide similar or identical results.

Information Please looks and feels more like a portal than any of the other services reviewed here. It combines a search form with a directory-style collection of topical links. This makes browsing for content quite easy. Information Please's Kubik recommends the service as a good starting place for students, or searchers looking for specific factual information.

The search form appears at the top of every screen. The only search limiting or refinement capability is provided by a drop-down box that lets you select a specific almanac, biographies, a dictionary, or encyclopedia.

Search results display document titles, the source, section, and category where the document resides, and a brief description of the document.

Information Please has a nifty function that lets you highlight any word or phrase on a page and click a "Hot Words" button to perform a search on the highlighted area.

Test Results

Information Please successfully answered all three questions. Results were identical for keyword and natural language queries.

QUESTION 1: Who is Ehud Barak?
Fifteen results were found. Eight of the top ten results were encyclopedia or dictionary entries, all dealing with the Biblical characters Ehud and Barak. The correct answer was found in the fifth result, an almanac entry on Israel. However, no additional information other than Barak's official title was offered.

QUESTION 2: When was the city of Beijing founded?
More than 100 results were found. Top ten results included entries from the dictionary, encyclopedia, almanac, and a "spotlight article" on Tiananmen Square. An answer similar to the one given in the New York Public Library Desk Reference was found in the third result, an encyclopedia article on Beijing.

QUESTION 3: How many chromosomes do humans have?
More than 100 results were found. Top ten results included dictionary, encyclopedia, and several almanac entries. The second result, an encyclopedia article on chromosomes, contained the correct answer.

CONCLUSION

Each of the three services reviewed here has strengths and weaknesses, and the three aren't directly comparable to one another. The choice of which to use should be driven by user need. With its strong natural language parser and question-and-answer template structure, Ask Jeeves is useful for complex questions, and is a good choice for searchers who lack Boolean or other searching skills.

Electric Library is an excellent choice for a serious researcher in need of timely content from a wide array of otherwise unavailable sources. And Information Please is an excellent tool for students and other researchers, as an authoritative source of facts and pointers for further investigation.


Answers.com Acquired by Net Shepherd Inc.

As part of this review of Internet reference resources, I also tested Answers.com. At the time of writing (mid July 1999), Answers.com was organized in a question/answer format. According to the "About Us" section of the Answers.com Web site, "Humans answer your questions. Our database has been built by people like yourselves asking our human researchers questions." As such, it was not an exhaustive reference resource, but rather one that was built in somewhat of an ad hoc fashion in response to user demand. The service fared poorly when compared to the others reviewed. The layout of the site was somewhat awkward, and Answers.com's search engine didn't work well. In general, Answers.com seemed more like an interesting collection of fun facts and trivia than a useful research tool.

However, just as I was completing this article, I learned that Answers.com had been acquired by Net Shepherd Inc. Their plans for the service are both intriguing and exciting. "Improvements in the home site of Answers.com will be extensive," says Bill Fogg, CEO of Answers.com.

For starters, rather than relying on a "small staff and a big bunch of encyclopedias," answers will be provided by a vetted network of "e-Explorers," according to Peter Hunt, Net Shepherd's Vice President of Corporate Affairs. E-Explorers are equipped with proprietary resource discovery tools, and are organized into an online community called The Internet Explorers Society. Society members cooperate to review, classify, and rate Internet content. Membership is by invitation, and open only to experienced instructors or librarians, who must also complete a training course. Members are compensated using a "Points of Discovery" system that translates directly into cash rewards.

Net Shepherd is using e-Explorers to work on a variety of projects other than Answers.com, including visiting Web sites to extract business intelligence, categorize content, and so on. E-Explorers use a customized browser called a "Member's Journal" that includes all of the tools and functions needed for each project. "Included in the Member's Journal are communication windows that we can use to 'push' messaging and content to selected members," says Ron Warris, Net Shepherd's Founder & Vice President Technology. "Our intent is to use these 'push windows' to broadcast questions that have been asked on the Answers.com site to members who are currently online and participating in other projects. If a member sees a question that they believe they know the answer to or are willing to do a bit of research for, they will be able to click on the question and will be immediately presented with a response form that they can use to submit an answer," says Warris.

If a question has been broadcast for a preset amount of time and no one has 'claimed' it, it is forwarded to a community discussion area for debate among Internet Explorers Society members. If debate doesn't produce an answer, the question is forwarded to a select group of members who research it and post an answer. Net Shepherd is also creating Neural Network technology to help automate the process of identifying potential "domain experts" inside the e-Explorer community. As the community grows and the system learns more about each member's knowledge and skills, questions can be more precisely pushed to members with the highest probability of being qualified to answer.
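The escalation path described above (broadcast, then community discussion, then a select research group) can be sketched as a simple chain of handlers. The stage names, handler functions, and sample answer are hypothetical stand-ins; Net Shepherd's actual system was still in development at the time of writing, and the timeout and neural network matching are not modeled here.

```python
from typing import Callable, List, Optional, Tuple

# A handler takes a question and returns an answer, or None if the stage fails.
Handler = Callable[[str], Optional[str]]


def route_question(question: str, stages: List[Tuple[str, Handler]]) -> Tuple[str, Optional[str]]:
    """Try each stage in order; return the first stage name that yields an answer."""
    for name, handler in stages:
        answer = handler(question)
        if answer is not None:
            return name, answer
    return "unanswered", None


# Hypothetical stage behavior for one question.
def broadcast(question):
    return None  # no online member claimed the question in time


def discussion(question):
    return None  # debate did not settle on an answer


def research_group(question):
    return "46, in 23 pairs"  # select members research and post an answer


stage, answer = route_question(
    "How many chromosomes do humans have?",
    [("broadcast", broadcast), ("discussion", discussion), ("research_group", research_group)],
)
```

In this run the question falls through the first two stages and is answered by the research group, the last resort in the process Warris describes.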

Quality control is a paramount goal of the new service. Net Shepherd is developing an integrated Quality Management System (patent pending) that consists of both computer operated and human review processes, according to Warris. With the potent combination of real-time help from certified experts, and a quality control system that assures consistent, authoritative responses to questions, the redeployed Answers.com seems certain to become a useful part of any serious Web searcher's toolkit.

--Chris Sherman


Chris Sherman (websearch.guide@about.com or csherman@searchwise.net) is the About.com Guide to Web Search, http://websearch.about.com. He holds an MA from Stanford University in Interactive Educational Technology, and has worked in the Multimedia/Internet industry for two decades, currently as President of Searchwise.net, a Web consulting firm.

Comments? Email letters to the Editor at editor@infotoday.com.
