The Truth About Federated Searching

Information Today, October 2003

Vol. 20 No. 10 — Nov./Dec. 2003

FEATURE
Source: WebFeat (http://www.webfeat.org)

Federated searching is a hot topic that seems to be gaining traction in libraries everywhere. As with many rapidly adopted technologies, there are some misconceptions about what it can do. WebFeat, a provider of federated search technology to more than 900 public, academic, and corporate libraries, including more than half of the top 10 U.S. public libraries, has compiled this list of the five most commonly repeated misconceptions about federated searching.

1. Federated search engines leave no stone unturned.

Reality: Not all federated search engines can search all databases, although most can search Z39.50 and free databases. But many vendors that claim to offer federated search engines cannot currently search all licensed databases for both walk-up and remote users. Why? Authentication. It's very difficult to manage authentication for subscription databases, particularly for remote users. Before buying, ask vendors to demonstrate that they can search all of your library's databases using your library's own authentication, both locally and remotely.
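To make the authentication point concrete, here is a minimal Python sketch. It assumes a hypothetical connector interface (a function that takes the query and a flag saying whether the user is remote) and a hypothetical `AuthenticationError`. Real products handle IP ranges, proxies, and patron logins in their own ways. The sketch shows how a fan-out that drops every database it can't log into still returns results, even though remote users are quietly searching less than on-site users.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


class AuthenticationError(Exception):
    """Hypothetical error a connector raises when a database rejects the library's credentials."""


# Hypothetical connector signature: (query, is_remote_user) -> list of citations.
Connector = Callable[[str, bool], List[str]]


@dataclass
class FanOutResult:
    hits: Dict[str, List[str]] = field(default_factory=dict)  # database -> citations
    skipped: Dict[str, str] = field(default_factory=dict)     # database -> reason


def federated_search(query: str, connectors: Dict[str, Connector], remote: bool) -> FanOutResult:
    """Send the query to every database, recording the ones that could not be searched."""
    result = FanOutResult()
    for name, search in connectors.items():
        try:
            result.hits[name] = search(query, remote)
        except AuthenticationError as exc:
            # Without this record, the database would simply be missing from the results.
            result.skipped[name] = str(exc)
    return result


def free_catalog(query: str, remote: bool) -> List[str]:
    # A free database: no credentials needed, so remote users can search it too.
    return [f"catalog record for {query!r}"]


def ip_only_licensed_db(query: str, remote: bool) -> List[str]:
    # A subscription database that only recognizes the library's on-site IP addresses.
    if remote:
        raise AuthenticationError("remote user not recognized by IP authentication")
    return [f"licensed record for {query!r}"]


connectors = {"free_catalog": free_catalog, "licensed_db": ip_only_licensed_db}
onsite = federated_search("federated searching", connectors, remote=False)
offsite = federated_search("federated searching", connectors, remote=True)
print(sorted(onsite.hits))   # ['free_catalog', 'licensed_db']
print(sorted(offsite.hits))  # ['free_catalog']
print(offsite.skipped)       # {'licensed_db': 'remote user not recognized by IP authentication'}
```

Running the same search on-site and off-site, as the vendor demonstration above recommends, surfaces the gap in the `skipped` record.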

2. De-dupe really works.

Reality: For federated search engines, true de-duplication is virtually impossible. To de-dupe, the engine would have to download all search results and compare them, because the same citation may appear in different places in the results sets from different databases. The limiting factor is not federated search engine technology but the way databases return results: typically 10 to 20 records at a time. A single search might produce 100,000 hits. If it takes 5 seconds to download 20 hits, downloading all of them would take hours (about 7 hours at that rate). Vendors that claim to do true de-duping usually are just de-duping the first results set returned by the search.
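The arithmetic and the first-results-set shortcut can be sketched in a few lines of Python. The duplicate test here (lowercase the citation and collapse whitespace) is a deliberately crude assumption. Real matching would compare authors, titles, and other fields. The citations are made up.

```python
import math
from typing import List


def download_time_hours(total_hits: int, page_size: int, seconds_per_page: float) -> float:
    """Hours needed to page through every hit, one results page at a time."""
    pages = math.ceil(total_hits / page_size)
    return pages * seconds_per_page / 3600


def dedupe_key(citation: str) -> str:
    # Crude match key (an assumption): lowercase and collapse whitespace.
    return " ".join(citation.lower().split())


def dedupe(records: List[str]) -> List[str]:
    """Keep the first occurrence of each citation, preserving result order."""
    seen = set()
    unique = []
    for record in records:
        key = dedupe_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Figures from the text: 100,000 hits, 20 hits per page, 5 seconds per page.
print(round(download_time_hours(100_000, 20, 5), 1))  # 6.9

# Made-up citations: database B returns a duplicate of A's record only on its second page.
db_a_page1 = ["Smith, J. Federated Search. 2003", "Jones, K. Metasearch Basics. 2002"]
db_b_page1 = ["Lee, M. Portal Design. 2003"]
db_b_page2 = ["smith, j.  federated search. 2003"]

first_sets = db_a_page1 + db_b_page1
all_results = first_sets + db_b_page2
print(len(first_sets) - len(dedupe(first_sets)))    # 0 duplicates found
print(len(all_results) - len(dedupe(all_results)))  # 1 duplicate found
```

The first-set pass reports a clean list because the duplicate hasn't been downloaded yet. Only fetching every page, which is the slow part, would reveal it.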

3. Relevancy rankings are totally relevant.

Reality: It's impossible to perform a relevancy ranking that's totally relevant. A relevancy ranking basically counts how often the search terms occur in a citation and, based on that frequency, moves items closer to the top or farther down the results list. Here's the problem: When a federated search engine ranks citations, the only words it has to work with are those that appear in the citation itself, and often the search term doesn't even appear there. The abstracts, full text, and indexing that content providers use to relevancy-rank their own content are unavailable to federated search engines. The content providers have the full article and its indexing to work with; federated search engines have only the citation.
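A minimal Python sketch of the frequency counting described above shows the problem. It assumes simple term counting over citation text only. Production rankers use weighting schemes such as TF-IDF, but they face the same missing-text limitation. The citations are made up.

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def score(citation: str, query: str) -> int:
    """Count how often the query terms occur in the citation text."""
    counts = Counter(tokenize(citation))
    return sum(counts[term] for term in set(tokenize(query)))


def rank(citations: List[str], query: str) -> List[str]:
    # sorted() is stable, even with reverse=True, so ties keep the order
    # in which the databases returned them.
    return sorted(citations, key=lambda c: score(c, query), reverse=True)


# Made-up citations. Only the citation text is visible to the federated engine.
citations = [
    "Doe, A. Cross-database retrieval in public libraries. 2003.",
    "Roe, B. Federated searching and federated authentication. 2002.",
    "Poe, C. Searching library catalogs. 2001.",
]
query = "federated searching"
print([score(c, query) for c in citations])  # [0, 3, 1]
for citation in rank(citations, query):
    print(citation)
```

Even if the Doe article were entirely about federated searching, its citation contains neither search term, so it scores zero and lands at the bottom.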

4. Federated searching is software.

Reality: It certainly is software, but it's best consumed as a service. A federated search engine searches databases that update and change an average of two to three times per year. This means that a system accessing 100 databases is subject to between 200 and 300 updates per year, almost one per day! Each of these changes can break the translator for that database. (Translators convert search queries into something that can be understood by the database that's being searched.) Without frequent updates to these translators, entire databases can periodically become unavailable for searching. Subscribing to a federated searching service instead of installing software means libraries don't have to update translators almost daily themselves to avoid disruptions in service. It's unacceptable for a database subscription that could cost a library $10,000 or more per year to be offline for any amount of time.
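The maintenance estimate above is simple multiplication. This Python sketch assumes every database change requires one translator update.

```python
from typing import Tuple


def translator_updates(n_databases: int, changes_per_db_per_year: float) -> Tuple[float, float]:
    """Return (updates per year, updates per day), assuming one translator update per database change."""
    per_year = n_databases * changes_per_db_per_year
    return per_year, per_year / 365


for rate in (2, 3):
    per_year, per_day = translator_updates(100, rate)
    print(f"{rate} changes per database per year: {per_year:.0f} updates/year, {per_day:.2f} per day")
```

For 100 databases this prints 200 and 300 updates a year, or about 0.55 and 0.82 per day, which is where "almost one per day" comes from.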

5. We don't make your search engine. We make your search engine better.

Reality: You can't get better results with a federated search engine than you can with the native database search. The same content is being searched, and a federated engine does not enhance the native database's search interface. All federated search does is translate a search into something the native database's engine can understand. But it's restricted to the capabilities of the native database's search function. A federated search can't do a three-term search with Boolean operators in a native database whose interface doesn't support it. Federated searching cannot improve on the native databases' search capabilities. It can only use them.
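The limit described above can be illustrated with a Python sketch of a translator that checks what the native interface supports before sending a query. `NativeCapabilities`, `translate`, and `can_translate` are hypothetical names for illustration only. Real translators deal with much richer and messier native syntaxes.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


class UnsupportedQuery(Exception):
    """Raised when the native database cannot express the requested search."""


@dataclass(frozen=True)
class NativeCapabilities:
    # Hypothetical description of what a database's own search interface supports.
    max_terms: int
    boolean_operators: FrozenSet[str]


def translate(terms: List[str], operator: str, caps: NativeCapabilities) -> str:
    """Translate a simple search into native syntax, or refuse if the database can't run it."""
    if len(terms) > caps.max_terms:
        raise UnsupportedQuery(f"native search accepts at most {caps.max_terms} terms")
    if len(terms) > 1 and operator not in caps.boolean_operators:
        raise UnsupportedQuery(f"native search does not support {operator}")
    return f" {operator} ".join(terms)


def can_translate(terms: List[str], operator: str, caps: NativeCapabilities) -> bool:
    try:
        translate(terms, operator, caps)
        return True
    except UnsupportedQuery:
        return False


full = NativeCapabilities(max_terms=3, boolean_operators=frozenset({"AND", "OR", "NOT"}))
basic = NativeCapabilities(max_terms=2, boolean_operators=frozenset({"AND"}))
terms = ["library", "federated", "search"]
print(translate(terms, "OR", full))       # library OR federated OR search
print(can_translate(terms, "OR", basic))  # False
```

The translator can only map the query onto what the native engine already offers. It has no way to supply the missing third term or the OR operator itself.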

Paula J. Hane is Information Today, Inc.'s news bureau chief and editor of NewsBreaks. Her e-mail address is phane@infotoday.com.