Internet Search Engine Update
by Greg R. Notess
Reference Librarian, Montana State University
Search Engine Update goes up on the Web
as soon as it is written, approximately one month before
the print issue mails to subscribers.
The big change in the search engine space this time is
Yahoo!'s launch of its own search engine. Instead of relying
on Google to provide the bulk of its search results, Yahoo!
now has its own Inktomi-based search engine database.
Many of the other changes and announcements appear to
be a reaction to that launch.
AltaVista and AlltheWeb,
although now owned by Yahoo!, continue to have
their own unique databases, plus separate, unique search
features. Although their image, news, and media databases
have been combined, their main Web page databases remain
separate and continue to be updated. Yahoo! claims that
eventually all will share the same Yahoo! database; that
was originally supposed to happen late in 2003, but it has
not yet occurred. The good news is that even when (and
if) these engines do share a common database, the advanced
search features at each are supposed to continue to be available.
Ask Jeeves has announced that it will
be purchasing Interactive Search Holdings, a company
that includes the MyWay.com, My Search, My Web Search,
iWon, and Excite search sites. These sites currently
either use Google results or offer results from several
search engines. Once the purchase is completed, these
sites are likely to be switched over to Ask Jeeves/Teoma
search results. Ask Jeeves has also suspended its Index
Express paid submission service for large (over 1,000
URLs) sites, although it continues to accept paid submissions
from smaller sites. It still offers no free submission option,
although its Teoma search engine (which also provides
the bulk of results for Ask Jeeves) has been quite successful
at finding most sites even without one.
Gigablast continues to add new features.
The latest additions are links to archived pages
for each of the URLs in a results list, in addition
to the cached copy of each page. The new links are
labeled "older copies" and point directly
to the Internet Archive's Wayback Machine. Gigablast
has also added a related-concepts section called Giga Bits,
which displays at the top of the search results page.
One more small change: Gigablast now uses stop words.
Very common words, such as "a," "is,"
and "the," are not searched, although Gigablast
does display a message noting which terms have been
ignored. To search for them anyway, put a "+" in front
of the terms or include them within quotation marks as
part of a phrase search.
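The stop-word behavior described above can be illustrated with a small Python sketch. This is not Gigablast's code: the three-word STOP_WORDS set and the split_query helper are hypothetical, and the parsing rules (a leading "+" forces a term to be searched, a quoted phrase is kept whole) are assumptions modeled on the description in this column.

```python
import re

# Hypothetical stop-word list for illustration only, not Gigablast's actual list.
STOP_WORDS = {"a", "is", "the"}

# Match either a double-quoted phrase (group 1) or a single bare word (group 2).
TOKEN_RE = re.compile(r'"([^"]*)"|(\S+)')


def split_query(query):
    """Return (searched, ignored) term lists under the assumed rules:
    quoted phrases are always searched whole, "+term" forces a term,
    and any other stop word is ignored."""
    searched, ignored = [], []
    for phrase, word in TOKEN_RE.findall(query):
        if phrase:
            # Stop words inside a phrase are kept as part of the phrase.
            searched.append('"' + phrase + '"')
        elif word.startswith("+") and len(word) > 1:
            # A leading "+" overrides the stop-word check; strip the "+".
            searched.append(word[1:])
        elif word.lower() in STOP_WORDS:
            # Reported back to the user as an ignored term.
            ignored.append(word)
        elif word:
            searched.append(word)
    return searched, ignored


print(split_query('the "state of the art" +is a search engine'))
```

With the sample query, "the" and "a" end up in the ignored list, while "is" is searched because of its "+" prefix and the quoted phrase keeps its internal "the".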
Google announced an expanded Web database
(from 3.3 billion to 4.285 billion pages) and an enlarged
image database (doubled to about 880 million images).
Google's last announced increase, to 3.3 billion, came
a few days after AlltheWeb announced a number larger
than Google's previous one; similarly, this Google
announcement came on the same day that Yahoo! announced
it was dropping Google and launching its own database.
That said, Google's database growth is still significant,
even though on several actual searches it does not seem
to find much more than it did before the announcement.
On a few, Yahoo! finds even more results than Google.
Yet for most searches I tested, Google still retrieves
more results than the other engines.
In other Google developments, Google News Alerts
have expanded beyond English to French, German, Italian,
and Spanish. Google Labs has a new experiment providing
access to the Froogle shopping engine via wireless devices
such as mobile phones and PDAs. Beyond its automatic
stemming, Google now sometimes searches for English
synonyms of query words as well. Put a "+" in front
of each term to turn off the automatic stemming and
synonym searching. Finally, the site: search now
works by itself and no longer requires an additional
search term.
Lycos is changing again. Its new interface
is designed around the idea of social networking, a
recent hot topic on the Internet. While much of the
content on the older site remains, including search,
the focus has shifted toward personal publishing with blogs
and home pages, along with searching for people and
groups. For now, the general Web search remains
and is still powered by the same database used
by AlltheWeb. Lycos-owned HotBot remains focused
on search and offers access to the HotBot (Inktomi),
Lycos (AlltheWeb), Ask Jeeves, and Google databases.
Lycos has also launched the HotBot Desktop, a browser-based
search toolbar that enables searching of the Web, the
user's own hard drive, and RSS feeds.
MSN Search no longer includes results
from LookSmart. In the past, directory listings from
LookSmart came before the Inktomi search engine results.
Now, MSN Search uses only the Inktomi results. It has
also launched its own toolbar. MSN continues to experiment
with various beta versions of its own search engine,
and now that Yahoo! has launched its own version, many
expect to see a new MSN search engine sometime in 2004.
Yahoo! has launched a new search engine database [http://search.yahoo.com]
that no longer uses Google results. Instead, its results
appear to be based in part on an Inktomi database, but
they differ from those of other Inktomi-based search
engines, including MSN Search and HotBot. One particularly
useful addition is cached copies of indexed pages, which
makes Yahoo! the first major search engine beyond Google
to include this feature.
In a similar way, HTML versions of PDF and other file
types are also available. According to Yahoo!, only
the first 500 KB of a document is indexed, which is
better than Google's 101 KB but still short of the full-document
indexing available at AlltheWeb. It appears to handle
full Boolean searching using AND, OR, NOT,
and parentheses for nesting. The new Yahoo! search supports
field searches similar to Google's, such as intitle:
and inurl:, along with site:, link:, hostname:, and
url:. The image database continues to pull from Google
at this point. In addition, Yahoo! announced its new
Content Acquisition Program, which is designed to help
both noncommercial and commercial content providers
to get more Web resources into the Yahoo! database.
On the noncommercial side, Yahoo! is working with sites
like the Library of Congress, NPR, Project Gutenberg,
and UCLA's Cuneiform Digital Library Initiative to make
sure their content is included in the database. Inclusion
is not supposed to change relevance ranking, but it
may help move more material from the invisible Web into the visible Web.
Greg R. Notess is a reference librarian at Montana State University and
founder of SearchEngineShowdown.com.