Volume 16, No. 8 Nov/Dec. 2002
How-To
Increase Your Web Site's Search Engine Ranking
by Cheryl H. Kirkpatrick

No matter how wonderful and informative your Web site is, it is useless to your patrons if it is hidden. Being accessible online means not just having a site posted but also making it easy to find. One key to getting people to your library site is making sure the search engines have it indexed. This is a great way to reach potential patrons who didn't think to start their searches with the library site! This article discusses how search engines work and what Webmasters can do to help ensure that the search engines can find their Web sites.

Different Ways Search Engines Work

"Search engine" is a generic term that refers to indexes that are used to locate Web sites. There are three types of indexes: human-indexed, spider-indexed, and hybrids of the two. Human-powered indexes are called directories. Probably the best-known directory is Yahoo!. Yahoo! employs a staff of editors who visit and evaluate Web sites, and then organize them into subject-based categories and subcategories.

Most search engines, however, use computer programs that create their listings automatically. The World Wide Web spawned the term "spiders" for these programs that "crawl" (or "spider") the Internet, adding sites that meet their programmed criteria. They are also called "crawlers" or "bots." Sites can be submitted to these search engines, but the pages still have to be indexed by the spiders. In order for these search engines to add and keep your site, it is important to have a Web site that can be accessed by the spiders. A spider harvests words and terms from Web site pages, which are added to a database and indexed. The spider then follows the page's links, both internal and external. Some of the spiders take snapshots of the entire page and cache them in the search engine's database. (You have probably seen these in Google's results, giving the user the option of viewing a cached version of a page.) Some spiders index every word on a page, while others index the title, metadata, headings, and the first paragraph or two on a page. When a user performs a search, the search engine compares the search terms being used with those stored in the database.
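
To make the harvest, index, and follow-the-links cycle concrete, here is a minimal sketch in Python. It is not how any particular search engine works. The three-page "Web" held in PAGES, the page addresses, and the names PageParser, crawl, and search are all invented for illustration, and a real spider would fetch pages over the network rather than from memory.

```python
import re
from collections import defaultdict, deque
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collects the words and link targets found on one HTML page."""

    def __init__(self):
        super().__init__()
        self.words = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Remember where each <a href="..."> points so the spider can follow it.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        # Harvest lowercase words from the text between tags.
        self.words.extend(re.findall(r"[a-z0-9]+", data.lower()))


def crawl(pages, start):
    """Visit pages reachable from `start` and build a word -> set of URLs index."""
    index = defaultdict(set)
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        parser = PageParser()
        parser.feed(pages[url])
        parser.close()
        for word in parser.words:
            index[word].add(url)
        for link in parser.links:
            # Follow only links to pages we hold and have not yet visited.
            if link in pages and link not in seen:
                seen.add(link)
                queue.append(link)
    return index


def search(index, query):
    """Return the URLs whose harvested text contains every search term."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# A hypothetical three-page site; nothing links to /orphan.html.
PAGES = {
    "/index.html": '<title>Library</title><p>Welcome to the library.</p>'
                   '<a href="/hours.html">Hours</a>',
    "/hours.html": "<p>Library hours: open Monday through Friday.</p>",
    "/orphan.html": "<p>No page links here.</p>",
}
INDEX = crawl(PAGES, "/index.html")
```

In this toy example, search(INDEX, "monday") finds only the hours page, and nothing on /orphan.html can be found at all because no link leads the spider to it. That is the practical reason the links into and within your site matter.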

Currently, most spiders cannot read or index information contained in images, although they can capture copies of the images. However, technology is constantly improving, and spiders can now read some of the information in PDF files.

The third type of search engine is the hybrid, which combines the speed of the spiders and the judgment of humans. A good example of a hybrid search engine is AskJeeves, which uses Teoma Search technology and human editors.

Getting Listed in Yahoo! and Google

Most of us tend to limit ourselves to two or three search engines, but there are many more available. Yahoo!'s directory lists 105 search engines and 25 directories. Here I will discuss two of the most popular indexes: Yahoo! and Google.

As I said earlier, Yahoo! is indexed by editors who visit and evaluate Web sites, and then organize them into subject-based categories and subcategories. Commercial Web sites must pay to be included in the directory, using a feature called Yahoo! Express. Noncommercial Web sites have two options: pay with Yahoo! Express or suggest their site for the noncommercial categories. Sites using Yahoo! Express will be evaluated within 7 business days and are not guaranteed inclusion. Yahoo! Express has a nonrefundable, recurring annual fee of $299. If you suggest your site for the noncommercial categories, there is no guarantee that the site will be reviewed. You may resubmit your site once every 2 or 3 weeks until it is included.

Google is a fully automated search engine. Sites are commonly included in this index when found by spiders that jump from link to link on the Web. The more sites that link to your site, the better your chances are of being included in Google's index. You may submit your URL as well, but submission is not necessary and does not guarantee inclusion in the index. There is a way to sneak into Google. Yahoo! and Netscape's Open Directory Project indexes are included in Google, so inclusion in one of these adds your site to Google.

The FTC's Recent Recommendation

Each search engine has its own algorithm to determine relevance within its results. In July of this year, in response to a complaint by consumer watchdog group Commercial Alert, the U.S. Federal Trade Commission (FTC) issued a landmark recommendation to search engines about their practices concerning paid inclusion and paid placement. Paid inclusion is the practice of paying to ensure that a Web site is included in a search engine's index, while paid placement ensures that the Web site will place within the top rankings of a search using prearranged search terms. It is important to note the differences between the two. Paid placement is more of an issue than paid inclusion. Most major search engines contain sites that pay to be included. Indeed, Yahoo! requires that all commercial sites pay a fee. The primary question about paid inclusion is whether sites that pay are offered preferential treatment, such as more frequent indexing, over those sites that do not pay.

If you would like to read more about the FTC's recommendation, Search Engine Watch has a long article at http://www.searchenginewatch.com/sereport/02/07-ftc.html.

Obstacles to Search Engine Spiders

There are several obstacles that hinder spiders:

Flash: As I said earlier, search engine spiders read text; they cannot read information contained in images. Flash movies are a series of images, so if a page consists of only a Flash movie, there are no words or terms visible to the spider. In an attempt to overcome this problem, Flash can include a copy of the text in a comment field, which often results in the published file containing one instance of the text for every frame in the movie. If a movie is primarily animated text, as many are, it can end up with a very large comment field that contains numerous copies of that text. Instead of solving the no-text problem, this creates a new obstacle: it can cause a search engine to label the site a "spammer" and drop it from the index. Spammers hide numerous copies of text from the user's sight, either in a comment field or by making the text the same color as the background, in an attempt to fool the spiders. Search engines discovered this practice and now consistently drop sites where it is used. So be sure that your comment field contains only one copy of your text.

There are other workarounds for Flash movies. The simplest is to embed the movie on a page that has HTML text that the spiders can index. Also, Macromedia has produced another product that can be used on servers: Macromedia Flash Search Engine SDK. It provides a set of object and source codes designed to convert a Flash file's text and links into HTML for indexing. I suspect that the new accessibility features built into Flash MX, the latest version of Flash, will help the problem that Flash movies have with search engine spiders. Flash MX is the first version of Flash that allows text equivalents to be specified for elements of its movies. This should make text available to the spiders.

Frames: A page made with frames allows you to have two or more Web pages open in the same window at the same time. These pages are loaded into a parent page that contains links to the frames. The spiders see the parent page as devoid of text, and therefore, do not index it. The spiders may or may not follow the links to the child pages to index the information they contain. The workaround for this is to have a no-frames version of your page that contains the same information as the frames version.
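
One common way to supply a no-frames version is the HTML noframes element placed inside the parent page; that this is the mechanism your site uses is an assumption here. The rough Python sketch below checks whether a parent page offers a spider any fallback text. The function names and the sample page are made up for illustration, and the pattern matching is deliberately simple, not a full HTML parser.

```python
import re


def uses_frames(html):
    """True when the page contains a <frameset> tag."""
    return re.search(r"<frameset\b", html, re.IGNORECASE) is not None


def noframes_text(html):
    """Return the plain text inside <noframes>...</noframes>, or '' if absent."""
    match = re.search(
        r"<noframes\b[^>]*>(.*?)</noframes\s*>", html, re.IGNORECASE | re.DOTALL
    )
    if not match:
        return ""
    text = re.sub(r"<[^>]+>", " ", match.group(1))  # replace tags with spaces
    return " ".join(text.split())                   # collapse runs of whitespace


def frames_report(html):
    """Summarize what a text-only spider would find on a parent page."""
    if not uses_frames(html):
        return "no frames"
    if noframes_text(html):
        return "frames with fallback text"
    return "frames without fallback text"


# A hypothetical parent page with a text fallback for spiders.
FRAMED = (
    '<frameset cols="20%,80%">'
    '<frame src="nav.html"><frame src="main.html">'
    '<noframes><body><p>Welcome to the library. '
    '<a href="main.html">Enter</a></p></body></noframes>'
    '</frameset>'
)
```

A page that reports "frames without fallback text" is the case described above: the spider sees a parent page with nothing to index.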

Dynamic Pages: When spiders come upon a URL that contains a question mark, usually a sign of a database-driven page, many of them simply stop indexing. There are workarounds for this problem, but these are too complicated to cover in this article. You can learn more about dynamic pages and search engines at the Web Developer's Journal: http://www.Webdevelopersjournal.com/articles/spider_dynamic_site.html.
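
If you want a quick inventory of which of your pages fall into this category, a few lines of Python can sort a list of addresses by whether they carry a query string (the part after the question mark). This is only a sketch: the example.org addresses and the function names are invented, and it says nothing about how any given spider actually treats such URLs.

```python
from urllib.parse import urlsplit


def has_query_string(url):
    """True when the URL has a non-empty query part after a '?'."""
    return bool(urlsplit(url).query)


def split_by_crawlability(urls):
    """Separate plain URLs from database-style URLs, preserving order."""
    static, dynamic = [], []
    for url in urls:
        if has_query_string(url):
            dynamic.append(url)
        else:
            static.append(url)
    return static, dynamic


# Hypothetical addresses on a library site.
SITE_URLS = [
    "http://www.example.org/hours.html",
    "http://www.example.org/catalog.asp?id=42",
    "http://www.example.org/about.html#staff",
]
STATIC, DYNAMIC = split_by_crawlability(SITE_URLS)
```

The pages that land in DYNAMIC are the ones worth checking against the workarounds described in the article cited above.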

Building a Spider-Accessible Site

There are steps that your Web designers can take to improve the chances of their sites being indexed by search engine spiders.

1. The single most important factor in being spider-friendly is to have plenty of HTML text. That is not as silly as it sounds. Some people like to make images with text for their Web pages in order to have unique fonts or special effects such as drop shadows. Remember, spiders read and gather text, not images. Web pages that have all their text presented on images have nothing for the spiders to index. However, spiders do index the terms used in alternate text (the alt attribute of an image tag), so be sure to supply it for every image.

2. Use title tags that accurately describe your site.

3. Use the meta description tag to accurately describe your Web site. Many search engines index this field and will display your description within the search results. However, do not be concerned with the meta keywords tag. According to Danny Sullivan, editor of Search Engine Watch, only one major search engine, Inktomi, still supports this tag. A few years ago, we thought that meta tags would catalog Web sites and be the perfect indexing tool for search engines. However, unscrupulous Web designers spoiled that potential by using deceptive keywords that rendered the metadata incorrect.

4. Ask other sites to link to your site and/or swap links with other sites. Google, one of the most popular search engines, ranks relevance in part by the number of sites linking to a particular site. Libraries can ask that links to the library site be added to city, county, state, and civic organization Web sites.

5. Have a site index or site map. The search engine spiders will follow the links provided to all the pages listed, so your entire site will be indexed for the search engine's databases.

6. You can submit your site to individual search engines yourself or pay search engine submission services. These companies submit the site to many search engines and will resubmit it on a regular basis to keep it active. Typical fees range from $99 to $300 per year.
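
Several of the steps above (a descriptive title, a meta description, and alternate text on images) can be checked automatically. The following Python sketch shows one way a Webmaster might audit a single page. The SpiderAudit class, the audit function, and the sample pages are invented for this example, and the checks are far from a complete test of spider-friendliness.

```python
from html.parser import HTMLParser


class SpiderAudit(HTMLParser):
    """Records the title, meta description, and image alt text on a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.images = 0
        self.images_missing_alt = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()
        elif tag == "img":
            self.images += 1
            # Assumption: every image should carry non-empty alt text.
            if not (attrs.get("alt") or "").strip():
                self.images_missing_alt += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def audit(html):
    """Return a list of problems a text-reading spider might have with the page."""
    parser = SpiderAudit()
    parser.feed(html)
    parser.close()
    problems = []
    if not parser.title.strip():
        problems.append("missing title")
    if not parser.description:
        problems.append("missing meta description")
    if parser.images_missing_alt:
        problems.append(
            f"{parser.images_missing_alt} of {parser.images} images lack alt text"
        )
    return problems


# Two hypothetical pages: one that passes and one that does not.
GOOD_PAGE = (
    '<html><head><title>Anytown Public Library</title>'
    '<meta name="description" content="Hours, catalog, and events.">'
    '</head><body><img src="logo.gif" alt="Anytown Public Library">'
    '<p>Welcome.</p></body></html>'
)
BAD_PAGE = (
    '<html><head><title></title></head><body>'
    '<img src="banner.gif"><img src="logo.gif" alt="Logo">'
    '</body></html>'
)
```

Calling audit on each page of a site gives a simple to-do list. An empty list means only that these three checks passed, not that the page is guaranteed to be indexed.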

Staying current with technological changes is a never-ending challenge for librarians. A couple of sites that can help you stay informed about this topic are Search Engine Watch at http://www.searchenginewatch.com and Search Engine Strategies at http://www.searchenginestrategies.biz.

These simple steps can greatly increase the exposure that your library's Web site gets. They help ensure that potential patrons who didn't start their search at your site get a second chance to use the library's valuable resources. With just a few small changes you can reap huge rewards by making your site spider-friendly.


Cheryl H. Kirkpatrick is the Web administrator and information technology librarian at the South Carolina State Library in Columbia. She holds an M.L.I.S. from the University of South Carolina. Her e-mail address is cheryl@leo.scsl.state.sc.us.
