When a search service proudly touts its billion-page index, can we get excited? Yes and no.
ONLINE, September 2000
When is a billion too much? Not when it's cash, chocolate bars, or cans of beer. How about URLs? Possibly. Some Web search engines are on the rampage to index as much of the Web as they can, and while doing so, often heroically mention how they're providing "the most relevant and comprehensive results." We see the words "relevant" and "comprehensive" coupled together so often it seems we might begin to think we can't have one without the other.
Quantity Versus Quality

In late June, Google announced that it had indexed over one billion URLs--560 million full-text indexed Web pages and 500 million partially indexed URLs. How can a database of this size provide, as Google says, "a quality search experience"? Of course, there's a cultural and social significance to the self-publishing boom that the Internet has enabled--and it's tempting to want it all. But how much of this content is really worth (re)searching on a broad scale? Certainly not all of it. This is evidenced by search services like AltaVista, Go, Excite, and others that place a cap on viewable results. There might be thousands or even millions of results, but you'll only see 200 to 1,000 of them. Rest assured, the most relevant results are the ones at the top of the list and, indeed, the number of relevant results will increase as more pages make it to the Web and are indexed by spiders and diligent directory editors. Yet we're still faced with massive amounts of information that just sits out there, eventually creeping into the depths of search results lists.
The ratio of good sites to junk has increased and the wave of publishing on the Web continues at a maddening pace. Search functionality, however, is tolerable at best, especially in the face of the geometric growth of Web page generation. So when a search service proudly touts its billion-page index, can we get excited? Yes and no. It's exciting that indexing technology and power are making progress and that a huge conglomeration of information is being catalogued. On the other hand, the volume doesn't really make searching any easier--your search may have scoured the full billion-page index, but precision is limited.
One Billion (+ or -)

Aside from the cap some search services put on results lists, a pure billion-page index is still out of reach. Google's claim to one billion pages falls short because we know that only 560 million of them are full-text indexed. The other half-billion are not fully indexed yet. Inktomi's GEN3 database also offers a half-billion pages, which is where we seem to remain for the time being. Of course, it's only a matter of time until the one-billion mark is officially reached--giving us double the currently available Web pages on the open Web, which doesn't include the valuable invisible portion.
So the race is on. And when a search service finally does catch up with the entire publicly available Web, will that database be the exclusive destination of searchers everywhere? I doubt it. As we already know, one Web search service alone can't produce the precision we need for 500 million pages, let alone one billion.
Letters to the Editor should be sent via email to firstname.lastname@example.org.
Copyright © 2000, Information Today, Inc. All rights reserved.