Greg Notess
Reference Librarian
Montana State University
on the net

Tracking Title Search Capabilities

ONLINE, May 2001
Copyright © 2001 Information Today, Inc.

Since the dawn of online bibliographic searching, the title search has been an important, but often confusing, component. In the bibliographic realm of published literature, the title can often uniquely identify a publication and provide one of the easiest searchable access points for the publication.

Online library catalogs and periodical indexes now have a wide variety of title search capabilities. Titles can be browsed in an alphabetical list. Keyword searches for single words within a title are available along with exact title and title phrase searches. And bibliographic databases can even have multiple title fields for main title, subtitle, series title, alternate title, uniform title, and more.

On Web pages, there is only one title, designated by the HTML title element. Title searching on the Web goes back to the earliest days of the Web search engines. As the early HTML documents began using the title element, it became an important element for early search engine ranking algorithms. AltaVista even made it searchable. Since that time, Web page title search capabilities have expanded somewhat, although to nowhere near the levels available in our more sophisticated bibliographic systems.

BIBLIOGRAPHIC TITLE SEARCH ISSUES

Title searching is not as straightforward as simply typing in the title and getting the exact match. Several approaches to title searching are possible and different search tools use different approaches. Take a document with the title of "Furry Snake Amazes Lunar Tourists." To find this document, a searcher might enter any of the following:

furry snake amazes lunar tourists

furry snake

furry snake lunar

snake amazes tourists

In the first example, a title phrase search is the right choice: the matching record contains exactly those words in exactly that order. For the second, any title that begins with the phrase might be the correct match. For the last two, a title word search works best, where the hits all contain those words somewhere in the title, but not necessarily in that order.
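The three kinds of match described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the punctuation handling is a simplifying assumption rather than the behavior of any particular catalog.

```python
def normalize(text):
    """Lowercase the text and split it into words, trimming punctuation."""
    words = (w.strip('.,;:!?"\'') for w in text.lower().split())
    return [w for w in words if w]


def title_phrase_match(query, title):
    """Exact title: the same words in the same order, with nothing extra."""
    return normalize(query) == normalize(title)


def title_begins_with(query, title):
    """The query words match the start of the title, in order."""
    q = normalize(query)
    return normalize(title)[:len(q)] == q


def title_word_match(query, title):
    """Every query word appears somewhere in the title, in any order."""
    return set(normalize(query)) <= set(normalize(title))


TITLE = "Furry Snake Amazes Lunar Tourists."

for query in ("furry snake amazes lunar tourists", "furry snake",
              "furry snake lunar", "snake amazes tourists"):
    print(query,
          title_phrase_match(query, TITLE),
          title_begins_with(query, TITLE),
          title_word_match(query, TITLE))
```

All four queries pass the title word test, but only the first is an exact title match and only the first two match the beginning of the title.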

An ideal system would recognize when a title phrase match works better than a title word match. However, ideal systems do not yet exist. For those search systems that provide both, the searcher needs to choose which to use. One-word titles are problematic, since a title word search for Science finds every title containing "science." Library systems may suggest using a title phrase search for single word titles. While that works, it is counterintuitive to search a single word as a "title phrase."

WEB PAGE TITLE SEARCH ISSUES

On the Web, the situation is quite different. Title searching is not always available nor is it always divided up into title word and title phrase searching. After all, titles do not uniquely identify individual pages. Nor is there any consistency. Many a Web page has one title within the HTML title element with another apparent title in the top banner graphic, first header, or prominent in the body of the page. Sometimes these titles are quite different while other pages neglect to even use the title element. A Web page title may look like "Best Travel Agent, airlines, flights, budget travel, discount tickets." Rather than representing the content of the page, the title is loaded up with keywords to raise the page's relevance ranking in search engine results.
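To see the gap between the HTML title element and the title a visitor actually sees, a page's markup can be inspected directly. The following is a minimal sketch using Python's standard html.parser module. It assumes the visible title is the first h1 header, which is only one of the possibilities mentioned above, and the sample page and its "Example Travel" heading are invented for illustration.

```python
from html.parser import HTMLParser


class TitleAndHeadingParser(HTMLParser):
    """Collects the text of the first <title> and the first <h1> on a page."""

    def __init__(self):
        super().__init__()
        self.title = None
        self.first_heading = None
        self._open_tag = None  # tag currently being captured, if any
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        wanted = (tag == "title" and self.title is None) or (
            tag == "h1" and self.first_heading is None
        )
        if wanted and self._open_tag is None:
            self._open_tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._open_tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._open_tag:
            return
        text = " ".join("".join(self._buffer).split())  # collapse whitespace
        if tag == "title":
            self.title = text
        else:
            self.first_heading = text
        self._open_tag = None


def title_and_heading(html):
    """Return (title element text, first h1 text); either may be None."""
    parser = TitleAndHeadingParser()
    parser.feed(html)
    parser.close()
    return parser.title, parser.first_heading


# Hypothetical page: a keyword-loaded title element, a different visible header.
PAGE = (
    "<html><head><title>Best Travel Agent, airlines, flights</title></head>"
    "<body><h1>Welcome to Example Travel</h1></body></html>"
)
print(title_and_heading(PAGE))
```

A page that neglects the title element altogether returns None in the first position, which is exactly the case a title search can never retrieve.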

Even so, for Web pages that do use titles, and use them appropriately, the title is often a good summary of what the page is about. With the lack of any authoritative and consistently applied subject indexing of Web pages, the titles provide one of the best subject access points for searchers. A page entitled "Market Share in the Light Bulb Industry" is likely to have information on market share in that industry whereas a page that only contains those words in the text may simply be a collection of jokes. Title searching can help get better retrieval for searches for organizations, biographies, lesson plans, FAQ pages, and other subject-oriented needs.

LIBRARY DATABASE SYNTAX

Since we now use the Internet to connect to so many library catalogs, online periodical indexes, and other bibliographic databases, comparing title searching in library databases with title searching on the Web search engines makes for an interesting counterpoint.

Not so long ago, the Common Command Language (CCL), now an official standard, was the rallying cry of the information industry user. Why not have Dialog, SilverPlatter, OCLC, library catalog vendors, and all the others use a common syntax for special searches such as the title search? Now in the bibliographic realm, two main kinds of title searches were considered. Title keyword searches looked for the search terms appearing anywhere in the title. A title phrase search would look for an exact match of the whole title.

While the standards community found agreement in the CCL and a few online vendors adopted (or adapted) it, there is still a wide range of ways to search titles. We can use /ti on Classic Dialog and STN. SilverPlatter likes the in ti while EBSCO Host and InfoTrac use just ti followed by the search terms. While ProQuest offers TI(), FirstSearch uses ti: for title keyword searching and ti= for a title phrase search. The one common element is the "ti" abbreviation for title, which all of the Web search engines have avoided in favor of the full word "title."
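The variety is easiest to appreciate side by side. The sketch below formats a single-word title keyword search in each style named above. The templates follow this column's description only; the exact spacing and the handling of multiple words are assumptions and are not taken from vendor documentation.

```python
# Title keyword templates as described in the text; {q} is the search term.
# Spacing and term placement are assumptions made for illustration.
TITLE_KEYWORD_FORMATS = {
    "Dialog": "{q}/ti",
    "STN": "{q}/ti",
    "SilverPlatter": "{q} in ti",
    "EBSCO Host": "ti {q}",
    "InfoTrac": "ti {q}",
    "ProQuest": "TI({q})",
    "FirstSearch": "ti:{q}",
}

# FirstSearch uses a separate operator for a title phrase search.
FIRSTSEARCH_TITLE_PHRASE = "ti={q}"


def title_keyword_query(system, term):
    """Format a one-word title keyword search for the named system."""
    return TITLE_KEYWORD_FORMATS[system].format(q=term)


for system in TITLE_KEYWORD_FORMATS:
    print(f"{system:14} {title_keyword_query(system, 'market')}")
```

Every template contains "ti" in some form, and no two vendor families share the same punctuation around it.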

SEARCH ENGINE SYNTAX

Most search engines that support title searching have the option as a separate box or as a choice in a drop-down menu. Northern Light's Power Search is a typical example. The second box down is for "words in title." AlltheWeb.com's advanced search shows an alternate method. Under Word Filters, each box can search one of several fields, including an option for "in the title."

For the advanced searcher, AltaVista's original approach of field name, colon, query term is available from several search engines. On AltaVista, a search for "market" in the title can be entered as title: market. Note the use of "title" rather than "ti." Multiple word title field searches are where the bibliographic database separation of title keyword from title phrase searches meets the Web. Which search engines can handle title word searching and which do title phrase searching? Can a space be used after the colon?

First of all, which even support title searching? AltaVista, AlltheWeb, Northern Light, Lycos, some Inktomi partners, and Google all have some capability for title searching. All but a few of the Inktomi partners have title searching available via a form with a drop-down list where a title search is one option. In addition, all but Google support the title: command line syntax. Google chose a different path.

GOOGLE'S SPECIAL TITLE SYNTAX

The most recent search engine to add the title searching capability is Google. However, they did not choose to follow what all the others had begun to make an informal standard. On Google's Advanced Search page they added an option to "Return results where my terms occur" followed by a drop-down box with one option as "in the title of the page."

Using that form option, Google does not give the searcher a chance to combine a title search with terms that would be elsewhere on the page. The command line version of the scripted ability on the Advanced Search page is the allintitle: field search, which requires all the terms to be in the title but in no particular order. However, allintitle: cannot be used in combination with other search terms.

The other title search option on Google is not available on the Advanced Search page but only on their regular search. This is the intitle: field search that can be used in combination with other terms that need not be in the title. However, intitle: can be used for only one word at a time.

So Google has added some title searching, using two different commands, but each has its limitations. And both function a bit differently from the more usual title: command that other search engines use.
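The choice between the two Google commands can be expressed as a small decision rule. The helper below is a hypothetical sketch of the restrictions as described here, not anything Google provides. It reads "one word at a time" as meaning a single title word per query, which is an assumption.

```python
def google_title_query(title_words, other_terms=()):
    """Pick allintitle: or intitle: under the limitations described above."""
    title_words = list(title_words)
    other_terms = list(other_terms)
    if not title_words:
        raise ValueError("at least one title word is required")
    if not other_terms:
        # allintitle: takes every term, but nothing else may be combined with it.
        return "allintitle:" + " ".join(title_words)
    if len(title_words) == 1:
        # intitle: covers one title word; the other terms may appear anywhere.
        return " ".join(["intitle:" + title_words[0], *other_terms])
    raise ValueError("several title words cannot be combined with other terms")


print(google_title_query(["market", "share"]))
print(google_title_query(["market"], ["light", "bulb"]))
```

A request for several title words plus additional terms raises an error, which mirrors the gap left by the two commands.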

MULTIPLE WORDS AND PHRASES

While Google's intitle: only handles one word, what happens for multiple words entered after the colon on the other search engines? A search for

title:green tomato pie

will usually search only for "green" in the title. AlltheWeb, Lycos, the Inktomi partners, and Google's intitle: will process that search as

(title:green) AND tomato AND pie

while AltaVista processes

(title:green) OR tomato OR pie

even though AltaVista's relevance ranking brings records whose titles contain all three terms to the top. Only Northern Light searches for records where all three terms appear in the title.
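These differing interpretations can be summarized as a small lookup. The sketch below expands a query the way each engine is described as processing it. The engine labels and the parenthesized Boolean notation are illustrative conventions of this example, and the Northern Light output is simply one way of writing "all three terms in the title."

```python
# Engines described as searching only the first word in the title and
# ANDing the remaining words anywhere on the page.
FIRST_WORD_AND = {"AlltheWeb", "Lycos", "Inktomi", "Google"}


def expand_title_query(engine, query):
    """Show how a query such as 'title:green tomato pie' is interpreted."""
    first, *rest = query.split()
    field, word = first.split(":", 1)
    if engine in FIRST_WORD_AND:
        return " AND ".join([f"({field}:{word})", *rest])
    if engine == "AltaVista":
        return " OR ".join([f"({field}:{word})", *rest])
    if engine == "Northern Light":
        return " AND ".join(f"({field}:{w})" for w in [word, *rest])
    raise ValueError(f"no interpretation recorded for {engine}")


for engine in ("Lycos", "AltaVista", "Northern Light"):
    print(f"{engine:15} {expand_title_query(engine, 'title:green tomato pie')}")
print(f"{'Google':15} {expand_title_query('Google', 'intitle:green tomato pie')}")
```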

All the search engines discussed here support phrase searching using double quotes. If a searcher wants to try the above search as an exact phrase search, the typical syntax is

title:"green tomato pie"

AltaVista, AlltheWeb, Lycos, Northern Light, and Google (using allintitle:) support that syntax and retrieve only records that contain that exact phrase within their title.

Unfortunately, most of the Inktomi partners (such as HotBot, MSN Search, and iWon) choke on that syntax and end up processing it as if "title" were a search term:

title AND "green tomato pie"

HotBot could do a phrase search in the title if, instead of the title: syntax, the phrase is entered in double quotes in the search box and the drop-down "Look for" menu choice is changed to "the page title." However, using that same strategy on MSN Advanced Search fails to work properly. So check carefully when trying to search a title phrase with an Inktomi partner.

NO SPACE FOR SPACES

For search engines that support the title: syntax, the search query should come right after the colon with no space. For example, a search for the word "green" in the title would be

title:green

If there is a space after the colon, as in title: green, then AlltheWeb, Lycos, Google (using intitle:), and the Inktomi partners will not process it properly. Instead, they will consider "title" to be a search term. AltaVista, Northern Light, and Google's allintitle: work just as well with the space after the colon. Strangely enough, AlltheWeb and Lycos can handle a space after the colon if a phrase is searched. But since they will all work without the space, just leave the space out.
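For anyone building queries in a script or a bookmarklet, the safest habit is to strip the space automatically. Here is a minimal sketch using a regular expression. The function name and the list of field prefixes are choices made for this example.

```python
import re


def tighten_field_prefix(query, fields=("title", "intitle", "allintitle")):
    """Remove any whitespace between a field prefix's colon and its term."""
    pattern = r"\b(" + "|".join(re.escape(f) for f in fields) + r"):\s+"
    return re.sub(pattern, r"\1:", query)


print(tighten_field_prefix("title: green"))
print(tighten_field_prefix('title: "green tomato pie"'))
print(tighten_field_prefix("intitle: market share"))
```

Queries that already have no space after the colon pass through unchanged, so the function is safe to apply to every query.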

TITLE SEARCH FORMS

What about using the search forms rather than the command line? The AltaVista Power Search form and Google's Advanced Search both give a choice for how to search for the words in the title. AltaVista offers Any of the Words, All of the Words, Exact Phrase, or Boolean. Google has With All of the Words, With Any of the Words, With the Exact Phrase, and Without the Words. On the AlltheWeb Advanced Search form, any terms entered into a title search box will automatically be searched as a phrase.

In general, using the form or drop-down menu options can be an effective way to search titles, especially for simple title searches. However, combining a title search with additional search terms can only be done via the forms at AlltheWeb, Lycos, Northern Light, and HotBot.

While title searches can be very helpful for the advanced searcher, the current implementation at each search engine is rather confusing. Learn the details of one or two, but be wary as their capabilities may change at any time. For a quick overview, check my title search page (http://www.searchengineshowdown.com/features/title), which I will try to keep up-to-date as the title search features change. And then we can all try to figure out how to teach one syntax for use in our bibliographic databases and another for use on the Web.


Greg R. Notess (greg@notess.com; http://www.notess.com/) is a Reference Librarian at Montana State University.

Comments? Email letters to the Editor at marydee@infotoday.com.
