[ONLINE]
on the net
Greg Notess
Reference Librarian
Montana State University












Up and Coming Search Technologies

ONLINE, May 2000
Copyright © 2000 Information Today, Inc.


Search technology is big business on the Net. The most highly visited sites often started off by helping people find something on the Web. The search engines have made their fortunes in advertising, attracting large audiences by offering free searching of large databases. Meanwhile, businesses of all sizes, government agencies, and non-profit organizations have built large Web sites as they discovered the opportunities that the Net offers for easily publishing great amounts of information. Companies now offer product catalogs, customer support, documentation, investor information, press releases, and more. As these sites grow, a local search engine that indexes all the documents on the site has also become an essential part of an organization's Web presence.

Now that online searching has permeated today's Internet culture, there are plenty of companies working on new approaches to searching the Net. With the technology economy booming, they have a definite financial incentive. The successful search engine and portal companies all claim great success with their products. Meanwhile, the newer companies all cite general discontent with how difficult it is to find things on the Web. Then they pitch their own products as the solution to the problem.

The truth lies somewhere in between the extremes. Some kinds of information are incredibly easy to find on almost all search sites. Other questions cannot be answered at all using just Internet resources. Some searches will continue to be extremely difficult. The new technological approaches may be able to help with some kinds of searches. If nothing else, they can help give us greater insight into the search process.

MEANING-NARROWING TECHNIQUES

One promising new technology involves helping the user to specify the meaning of their search terms. Some aspects of this approach have been in use for years on certain search engines. Excite suggests additional terms to add to the search. AltaVista, HotBot, Go, Snap, and AOL Netfind all offer suggested searches that can help target the search. But the newer technologies go beyond that.

Oingo offers what it calls a "meaning-based search." Currently, its site uses databases from the Open Directory and AltaVista. But it adds a new layer to these databases. Try a search on book and Oingo offers the option to limit the search to the following meanings, among others:

book (publication)
record, recordbook
script, playscript (dramatic work)
financial record
daybook, ledger
book (product)
book of the Bible
town in Louisiana

For English terms, it can be a great way to see alternate uses of the term. How well does it work? Using the preceding example and choosing script, playscript (dramatic work) as the meaning, the Open Directory hits were successful in focusing the search on dramatic scripts. However, the AltaVista results were almost all related to computer scripts, not dramatic works.

Trying a more specific search with booking a one woman play made Oingo respond with alternative meanings for booking, woman, and play. Even after choosing the specific meanings, none of the results was very helpful. For the information professional, Oingo may offer some additional meanings to consider, but most will probably take the ideas and search directly on other search engines. For the general public, Oingo can provide a way of narrowing their search more effectively.

Simpli.com demonstrates a similar strategy with a different implementation. Simpli.com provides one field for entering a query term, and then in the drop-down box to the right, it supplies several meanings.

Using book as a search term, Simpli.com offers choices for several meanings, including "No meaning necessary" and "Enter new meaning." The latter choice allows searchers to add their own meanings to use in the search.
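The meaning-selection step that both sites present can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Oingo or Simpli.com actually work: the sense labels are taken from the Oingo example above, but the related words attached to each sense, the Boolean query syntax, and the function names are all hypothetical.

```python
# Hypothetical sense inventory. The labels come from the Oingo example;
# the related words are invented for illustration only.
SENSES = {
    "book": {
        "book (publication)": ["publication", "author", "publisher"],
        "script, playscript (dramatic work)": ["play", "theater", "drama"],
        "book of the Bible": ["bible", "scripture"],
    }
}

NO_MEANING = "No meaning necessary"


def meanings_for(term):
    """Return the choices a drop-down box might list for a search term."""
    return [NO_MEANING] + list(SENSES.get(term.lower(), {}))


def narrow_query(term, meaning):
    """Require the term plus at least one word tied to the chosen meaning."""
    if meaning == NO_MEANING:
        return term
    related = SENSES[term.lower()][meaning]
    alternatives = " OR ".join(related)
    return f"{term} AND ({alternatives})"


print(meanings_for("book"))
print(narrow_query("book", "script, playscript (dramatic work)"))
```

Choosing a meaning simply adds sense-specific words to the query before it is passed to the underlying database. That may be one reason results vary so much from one database to the next: the narrowing is only as good as the match between the added words and the pages each database has indexed.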

Both Oingo and Simpli.com are building technologies that general searchers can use to more effectively target their searches. How these technologies may be integrated into the standard search engines or if they will develop their own traffic remains to be seen. (Editor's Note: For more on linguistically-based search technology, see Sue Feldman's article, "Meaning-Based Search Tools: Find What I Mean, Not What I Say," on p. 49)

LARGER, FASTER INDEXING

Web searchers often retrieve too many hits and need ways to narrow the results to find something relevant to their particular information need. On the other hand, with so much of the Web untouched by the search engines, it is good to see some efforts at indexing more of the publicly available, static Web pages.

AltaVista announced a major size increase last fall with plans for further increases in size. Northern Light has consistently been enlarging its database, breaking the 200 million barrier last fall. Then this past January, Fast Search and Transfer announced a major size increase to more than 300 million. To include that many records, Fast actually crawled over 700 million pages. After removing duplicates and spam, the size was reduced by more than half.

Shortly after Fast's new 300 million database launched, Inktomi announced that it had spidered over one billion pages (see http://www.inktomi.com/webmap/). Unfortunately, that database is not directly searchable. Instead, Inktomi pulls something over 100 million records from the WebMap database, and that subset can then be customized for Inktomi's search engine partners. The end result is that searchers can only retrieve records from a portion of the database.

Meanwhile, new search engines are waiting in the wings. WholeWeb.net has announced plans to have indexed one billion pages by June 2000. Still under development, WholeWeb.net anticipates having a publicly searchable site available by April.

With these kinds of numbers being thrown about, what do they mean for searching? Access to a greater number of records means that searchers will find pages that had been previously hidden, as long as they use distinctive search terms, phrases, or combinations. It also means that if search engines crawl 700 million pages to index 300 million, or one billion to index 100 million, then there are plenty of pages that are being excluded. While the majority may well be duplicates or spam, it is important to be aware that search engines do exclude large numbers of pages intentionally.
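To put rough numbers on that exclusion, here is a small Python sketch. The page counts are the rounded figures quoted above, in millions. The duplicate test is an assumption made purely for illustration: it hashes each page's text after collapsing case and whitespace, which is not necessarily how Fast, Inktomi, or any other engine detects duplicates or spam. The example URLs and function names are hypothetical.

```python
import hashlib


def fingerprint(text):
    """Hash a page's text after collapsing case and whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def deduplicate(pages):
    """Keep the first URL seen for each distinct fingerprint."""
    seen = set()
    kept = []
    for url, text in pages:
        digest = fingerprint(text)
        if digest not in seen:
            seen.add(digest)
            kept.append(url)
    return kept


def excluded_share(crawled, indexed):
    """Fraction of crawled pages that never reach the searchable index."""
    return (crawled - indexed) / crawled


# Hypothetical three-page crawl; the first two pages differ only in case and spacing.
crawl = [
    ("http://example.com/a", "Hello  World"),
    ("http://example.com/b", "hello world"),
    ("http://example.com/c", "Something else"),
]
print(deduplicate(crawl))

# Rounded figures from the announcements, in millions of pages.
print(round(excluded_share(700, 300), 2))   # Fast
print(round(excluded_share(1000, 100), 2))  # Inktomi
```

On these rounded figures, somewhat more than half of what Fast crawled and roughly nine-tenths of what Inktomi crawled never reach the searchable database.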

In addition to these increases in size, both Fast and WholeWeb.net plan to offer these very large databases to the general Internet public while maintaining very speedy response times. Fast's versions at AlltheWeb.com and on the Lycos advanced search have so far lived up to that promise. We will have to wait and see if WholeWeb.net can also deliver such speed.

The other search engines that plan to increase the size of their databases will also need to consider how to continue providing results quickly enough, in Web terms, to keep searchers coming back for more. In addition to speed, the new larger databases will also have to prove themselves in relevance, duplicate removal, freshness, and consistency. Even the older search engines that make major increases in size will need to be reevaluated in each of these areas.

ALTERNATIVE SEARCH APPROACHES

Mining the market of the disenchanted search engine users are some alternative approaches to finding information that use tools other than a traditional search engine or directory. One alternative can be seen at the HotLinks Guide (http://www.hotlinks.com). HotLinks provides the ability to store bookmarks on the Web for portability of bookmark files between home, work, and travel. In addition, users can opt to share their bookmarks publicly.

With over three million links, this mammoth collection of bookmarks makes for an interesting search tool. Instead of the usual database built by automated software programs crawling the Web, the HotLinks Guide offers a searchable database of the bookmarks of hundreds of other Web users. That means that many links will be dated or inaccurately labeled. But it also provides a fascinating view into the linking habits of other online users.

And then there are search services that offer personalized answers, like ExpertCentral (http://www.expertcentral.com) or LookSmart Live. More multiple search engines, both Web-based and desktop versions, are appearing. Relying on the databases of other search engines, they add sorting, storing, and checking features.
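The sorting feature of a multiple search engine can be pictured with a short Python sketch. It is a generic illustration rather than a description of any particular product: the engine names, URLs, and ranking rule (pages found by more engines come first, then pages with a better top rank) are all assumptions.

```python
def merge_results(results_by_engine):
    """Merge ranked URL lists from several engines into one ordered list.

    URLs returned by more engines sort first; ties go to the better
    (lower) rank, and then to alphabetical order of the URL.
    """
    merged = {}
    for engine, urls in results_by_engine.items():
        for rank, url in enumerate(urls, start=1):
            entry = merged.setdefault(url, {"engines": set(), "best_rank": rank})
            entry["engines"].add(engine)
            entry["best_rank"] = min(entry["best_rank"], rank)
    return sorted(
        merged.items(),
        key=lambda item: (-len(item[1]["engines"]), item[1]["best_rank"], item[0]),
    )


# Hypothetical result lists from two underlying engines.
sample = {
    "engine_a": ["http://example.com/1", "http://example.com/2", "http://example.com/3"],
    "engine_b": ["http://example.com/2", "http://example.com/4"],
}
for url, info in merge_results(sample):
    print(url, sorted(info["engines"]), info["best_rank"])
```

Because such a tool only rearranges what the underlying engines return, it cannot surface a page that none of them has indexed.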

New alternatives will continue to appear. Meanwhile, the search engines will also continue to work at improvements. Adding the ability to define the meaning of search terms, as Oingo and Simpli.com do, may be one path. Creating larger databases that can deliver results quickly is another.

As the technologies emerge, some will fail while others become a routine part of search engines. However, do not expect even the new technologies to completely replace the search engines. They will continue to be a part of the searcher's toolkit.


Greg R. Notess (greg@notess.com; http://www.notess.com/) is a Reference Librarian at Montana State University.

Comments? Email letters to the Editor at editor@infotoday.com.
