ONLINE, March 2000
According to Aldous Huxley, "Consistency is contrary to nature, contrary to life." Far too often, it is also contrary to the practices of Internet search engines. The databases, search interfaces, indexing, storing, and processing are all computer-based functions. In general, a computer's strength is consistent processing: in most cases, it will do the exact same task when given the exact same input.
Consistency in search processing certainly makes it easier on the information professional. When an information system responds consistently, it is easier to know when a search has been comprehensive and when to move on to another system.
Unfortunately for the consistency connoisseur, many of the less well-known Internet search tools are hastily constructed. Meanwhile, the top Internet search engines have had extensive development of their interface, but with the general aim of providing a few relevant answers quickly to almost any kind of search. In neither case is search consistency necessarily a high priority.
COUNTING CONSISTENCY
At their most basic level, computers excel at counting. Yet some Internet search engines do not. Several search engines have for a time stopped counting results at all. An advanced search on Lycos gives no total number, even though a regular Lycos search does. For a brief time, Excite stopped reporting the total number of hits, but then it put the numbers back in, although at a much smaller font size.
Most search engines provide the number of results for every search. However, the numbers are not always accurate. First of all, they have to figure out what to count. A portal that gives results from several databases could count the hits from each one or the total hits from all. Excite does not count the results from its directory categories or its general information, but it does provide separate counts for its Web results (its search engine database) and its news database.
Another complicating factor appears when a search engine clusters by site. When several pages matching the search criteria are all grouped under one record for the site, should the search engine count it as one hit or several? The simple approach would be to count sites as a separate number from pages, but the search engines that do cluster take several different approaches.
Infoseek just counts the total number of pages and reports that number, not the number of sites. Northern Light also counts the total number of hits and sometimes will report it as "115 items in 48 sources," where sources can be either a Web site or a Special Collection publication.
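The simple approach mentioned above, reporting sites and pages as separate numbers, can be sketched in a few lines of Python. This is only an illustration, not a description of how Infoseek or Northern Light work internally: the function names are hypothetical, the URLs are placeholders, and treating the host name of each URL as the "site" is an assumption.

```python
from urllib.parse import urlsplit


def count_pages_and_sites(urls):
    """Return (page_count, site_count) for a list of result URLs."""
    # Assumption: a "site" is the host name of the URL. The hostname
    # attribute is already lowercased, so EXAMPLE.com and example.com
    # count as the same site.
    sites = {urlsplit(url).hostname for url in urls}
    return len(urls), len(sites)


def summary(urls):
    """Report both numbers so the reader knows what was counted."""
    pages, sites = count_pages_and_sites(urls)
    return f"{pages} pages in {sites} sites"


# Placeholder result list: three pages clustered under one site, plus one more.
results = [
    "http://example.com/a.html",
    "http://example.com/b.html",
    "http://EXAMPLE.com/c.html",
    "http://example.org/",
]
print(summary(results))
```

Reporting both figures avoids the ambiguity described above, since the reader no longer has to guess whether a clustered record was counted once or several times.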
HotBot Loses Count
HotBot has much greater problems with counting its clustered results. Since it started clustering results, the reported number of matches appears to be the approximate number of sites, not pages. And if the number of matches is more than a few hundred, the reported number of matches is always a multiple of ten. However, the real problem with HotBot is that the reported number of matches seems to have no connection with what you can actually retrieve. And the number can change as you move to the next page. For example, a HotBot search on hypercarbia with display set to 100 hits reportedly found 200 matches (a multiple of ten). After clicking on the next page, HotBot then reported and displayed only 120 records. Were the other 80 hiding under the site clustering? To turn off site clustering, I tried the same search limited to .com, .edu, and .org sites. With that limit, HotBot then reported 150 matches. Clicking on the next page changed the reported number to 120 matches and displayed numbers 101-123.
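The kind of check described in this example, comparing the reported total with what can actually be retrieved page by page, could be automated along these lines. The sketch assumes the totals and page sizes have already been noted by hand; the function name is hypothetical, and the per-page figures are one reading of the hypercarbia searches above (100 records on the first page, then records 101-120 or 101-123 on the second).

```python
def audit_counts(pages):
    """Check reported totals against records actually retrieved.

    pages: list of (reported_total, records_shown_on_page), in page order.
    Returns a list of plain-language problems (empty if none were found).
    """
    problems = []
    retrieved = 0
    first_reported = pages[0][0] if pages else None
    for number, (reported, shown) in enumerate(pages, start=1):
        retrieved += shown
        # The reported total should not change from one page to the next.
        if reported != first_reported:
            problems.append(
                f"page {number}: reported total changed "
                f"from {first_reported} to {reported}"
            )
        # It should never be possible to retrieve more than was reported.
        if retrieved > reported:
            problems.append(
                f"page {number}: {retrieved} records retrieved "
                f"but only {reported} reported"
            )
    return problems


# Figures as read from the two hypercarbia searches described above.
unrestricted = [(200, 100), (120, 20)]
domain_limited = [(150, 100), (120, 23)]

for label, pages in [("unrestricted", unrestricted),
                     ("domain-limited", domain_limited)]:
    for problem in audit_counts(pages):
        print(f"{label}: {problem}")
```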
AltaVista Altercations
AltaVista also has long had difficulty counting and displaying a consistent count, and for that reason will sometimes say that it has "about" X number of hits. This inconsistency (and others) seems to derive from AltaVista's programmed preference for speed over comprehensiveness. Rather than waiting to finish processing the search, AltaVista will deliver partial results, especially on large or complex searches.
Theoretically, reloading the search or going to the next results page will give AltaVista more time to process the search and possibly retrieve more results. In reality, it may find more or fewer. This is also why the numbers can differ when two people do exactly the same search: the search will time out and deliver partial results before it is finished.
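To see why a time-out produces an "about" count that varies between runs, consider the toy model below. It is a guess at the general mechanism, not AltaVista's actual design: the index is assumed to be split into chunks (called shards here), a fixed budget of chunks stands in for the time limit, and all names and documents are invented for illustration.

```python
def search_with_budget(shards, term, budget):
    """Count documents containing term, scanning at most `budget` shards.

    Returns (count, complete). complete is False when the budget ran out
    before every shard was scanned, so the count is only partial.
    """
    count = 0
    scanned = 0
    for shard in shards:
        if scanned >= budget:
            return count, False  # "timed out": deliver partial results
        count += sum(term in doc for doc in shard)
        scanned += 1
    return count, True


def report(count, complete):
    """Hedge the wording when the count may be incomplete."""
    return f"{count} hits" if complete else f"about {count} hits"


# Invented miniature index of three shards.
index = [["a cat", "dog"], ["cat nap"], ["cat", "cat toy"]]

print(report(*search_with_budget(index, "cat", budget=3)))
print(report(*search_with_budget(index, "cat", budget=2)))
```

In this model, the same search returns a different number whenever the budget differs, which is consistent with two people seeing different totals for an identical query.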
AltaVista also clusters results by site, at least on its Simple Search. For example, a search on the phrase
While multiple databases, site clustering, and time-outs are some reasons that search engines may have difficulty counting, this certainly does not explain all counting inconsistencies--sometimes it is just strange programming. For example, Ancestry.com, which provides access to hundreds of genealogical databases including the Social Security Death Index, cannot even count to zero. Try a search on any non-existent name to see the statement that you are "Viewing records 1-0 of 0." How about a simple "no records found" message?
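One plausible cause of a "1-0 of 0" message is a range formula applied without a special case for empty results. The sketch below is an assumption about how such a message might arise, not Ancestry.com's actual code; the function names and the page size of 10 are hypothetical.

```python
def naive_range_message(start, page_size, total):
    """Format the range without checking for an empty result set."""
    end = min(start + page_size - 1, total)
    return f"Viewing records {start}-{end} of {total}."


def range_message(start, page_size, total):
    """Same message, with a plain statement when nothing was found."""
    if total <= 0:
        return "No records found."
    end = min(start + page_size - 1, total)
    return f"Viewing records {start}-{end} of {total}."


print(naive_range_message(1, 10, 0))  # Viewing records 1-0 of 0.
print(range_message(1, 10, 0))        # No records found.
print(range_message(21, 10, 25))      # Viewing records 21-25 of 25.
```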
While HotBot, AltaVista, and Ancestry have difficulties in reporting an accurate number of results, many other search engines do manage to count quite accurately. But watch out for inaccurate counting on niche Internet databases, multiple search engines, and other specialized search tools.
PROCESSING PROBLEMS
Inconsistencies go beyond the inability to count. Problems with processing the search syntax can result in strange results as well. Truncation, field searching, and even basic Boolean processing may not work as expected. As with the counting, some search engines process the searches consistently, but others do not. Sometimes inconsistencies pop up unexpectedly. Take Google! for example. It has few advanced search features beyond phrase searching: the
HotBot Single-Word Search
A single-term search is a fairly basic search. HotBot offers drop-down menu options for searching in several different ways, including all the words, any of the words, exact phrase, and Boolean phrase. On a single-word search, logic says that each of those options should find the same results. Unfortunately, they do not. The table shows the results for searching the same single term with each of the four options.
While HotBot cannot count higher numbers, in each of these instances, the number reported exactly matched the number of records displayed. I double-checked all of these numbers, even using a different Web browser to be sure the results were not coming from the cache. Other Inktomi search engines that offer all four options could consistently find the same number, even at Canada.com, which also clusters results by site like HotBot does.
Note that there is not even any consistency as to which of the four search options displays more records. In these examples, each search had two options that found the most records, but they were not the same two for every search. On other searches where I have tried this, the difference in numbers has been even greater.
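A comparison like this is easy to tabulate once the counts are written down. The following sketch assumes the counts for each option were collected by hand; the function name is hypothetical, and the numbers in the usage lines are made-up placeholders, not the values found in these searches.

```python
def compare_modes(counts):
    """Compare record counts for one term across several search options.

    counts: dict mapping option name -> number of records found.
    Returns (is_consistent, options_that_found_the_most).
    """
    if not counts:
        return True, []
    top = max(counts.values())
    leaders = sorted(name for name, n in counts.items() if n == top)
    is_consistent = len(set(counts.values())) == 1
    return is_consistent, leaders


# Placeholder counts for illustration only.
example = {
    "all the words": 12,
    "any of the words": 12,
    "exact phrase": 9,
    "Boolean phrase": 10,
}
consistent, leaders = compare_modes(example)
print("consistent" if consistent else "inconsistent", leaders)
```

On a single-word search, a consistent engine would return the same count for every option, so any result other than "consistent" is worth recording.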
AltaVista's Turn
AltaVista has shown similar examples of processing problems. Many of these can, perhaps, be explained by AltaVista's time-outs. The incomplete processing will certainly cause inconsistent results on a number of searches. Yet there are processing inconsistencies beyond that as well.
One example is AltaVista's ability to search diacritics. Input a search term without diacritics and AltaVista is supposed to search for matches with or without diacritics. Use the diacritics in the search term and only exact matches should result. So, a search on the French term
A similar example involved AltaVista's case sensitivity. Searches using all lowercase letters are supposed to match any case or mixture of cases. Including a single uppercase letter in a search term is supposed to require an exact match of case. Thus, qwerty would match qwerty, QWERTY, and qWeRtY, while qWeRtY would only match qWeRtY. Yet for a while, a search on Fe (as in Santa Fe) found more hits than fe, which should have found more.
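Both rules, as stated, are simple enough to write down, which makes the expected behavior easy to verify. The sketch below implements the documented rules as described here, not AltaVista's actual code; the function names are hypothetical, and the word list is invented. The French word in the comment is only an illustration and is not the term from the search discussed above.

```python
import unicodedata


def strip_diacritics(text):
    """Remove combining accent marks, e.g. turning 'é' into 'e'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def case_rule_matches(query, word):
    """All-lowercase query matches any case; otherwise case must match."""
    if query == query.lower():
        return query == word.lower()
    return query == word


def diacritic_rule_matches(query, word):
    """Unaccented query matches with or without accents; otherwise exact."""
    # Normalize first so that equivalent encodings of 'é' compare equal.
    query = unicodedata.normalize("NFC", query)
    word = unicodedata.normalize("NFC", word)
    if query == strip_diacritics(query):
        return query == strip_diacritics(word)
    return query == word


def count_hits(query, words, rule):
    """Count how many indexed words the query matches under a rule."""
    return sum(rule(query, word) for word in words)


indexed = ["fe", "Fe", "FE"]  # invented index entries
print(count_hits("fe", indexed, case_rule_matches))  # 3
print(count_hits("Fe", indexed, case_rule_matches))  # 1
# Under the diacritic rule, "ete" would match "été", but not the reverse.
```

Under these rules the lowercase or unaccented form can never find fewer hits than the exact form, which is why Fe outscoring fe signals a processing error rather than a quirk of the data.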
Yet not even AltaVista's inconsistencies are consistent. The Fe problem only lasted for a few days, back in May 1999. Then it suddenly started working as it should, with a search on fe finding more than a search on Fe.
Multiple search engines run into similar difficulties. Because multiple search engines pull their results from other databases, they must constantly keep up with any changes to their component search engines. The multiple search engine must parse the results list, stripping out links to the rest of the search engine site, advertisements, and any other extraneous material.
Thus, the algorithm that worked yesterday may fail today. Sometimes ads or internal links get included by the multiple search engines as regular results. At other times, specific search engine results are missed altogether. Watch especially for claims of hits from Northern Light and HotBot, and double-check directly on those search engines to be sure their hits are included.
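The filtering step is where such failures tend to creep in. As a rough sketch, assuming the links have already been extracted from a component engine's results page, a multiple search engine might discard anything pointing back to the engine itself or to a known ad server. The function name and every host name below are hypothetical, and real result pages would need more elaborate parsing.

```python
from urllib.parse import urlsplit


def keep_real_hits(links, engine_host, ad_hosts=()):
    """Drop internal links and ad links from a component engine's results.

    links: URLs pulled from the results page, in order.
    engine_host: host name of the component search engine itself.
    ad_hosts: host names known to serve advertisements.
    """
    blocked = {engine_host.lower(), *(host.lower() for host in ad_hosts)}
    hits = []
    for url in links:
        host = urlsplit(url).hostname or ""
        if not host:
            continue  # relative link, so it points back into the engine
        if host in blocked or any(host.endswith("." + b) for b in blocked):
            continue  # the engine's own pages, or an ad server
        hits.append(url)
    return hits


# Placeholder links for illustration only.
page_links = [
    "/about",
    "http://search.example/help",
    "http://www.search.example/next?page=2",
    "http://ads.example/click?id=1",
    "http://site-a.example/page.html",
]
print(keep_real_hits(page_links, "search.example", ["ads.example"]))
```

A filter like this depends on a list of hosts that is correct today; when the component engine moves its ads or navigation to a new host, the extra links slip through as ordinary results.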
WHAT TO DO
Some inconsistencies are due to a temporary situation, while others continue to recur. It makes for a challenging search environment. The inconsistencies are certainly frustrating, but they do not mean that you should stop using a specific search engine just because of the problems. Some of the most inconsistent engines still provide content, search features, and database records not available in any other sources.
Just be prepared for inconsistent behavior, know some of the ways in which the search engines do fail to deliver as promised, and plan search strategies accordingly. To help in keeping track of which search engines have particular problems, check on the inconsistencies section of my Search Engine Showdown site at http://notess.com/search/. It documents current and past inconsistencies and provides an opportunity for others to report inconsistencies. Sharing and tracking details of these problems can help us better understand the results that the search engines deliver. As to the inconsistent search engines, Walt Whitman said it best.
Do I contradict myself? Very well then I contradict myself, (I am large, I contain multitudes).
Comments? Email letters to the Editor at firstname.lastname@example.org.
Copyright © 2000, Information Today, Inc. All rights reserved.