
Information Today
Vol. 20 No. 11 — December 2003
Letters to the Editor

Another View of De-Duplication

In the opinion of MuseGlobal, "The Truth About Federated Searching" in your October 2003 issue contains a number of statements that are erroneous. In the interest of presenting your readership a more balanced view of federated search technology, I'd like to correct some of the misimpressions left by the article.

1. De-duplication does work.

WebFeat asserts that de-duplication doesn't work. Their argument is that because a single search returns a very large number of hits, say 100,000, you can't claim to de-dupe unless you de-dupe every one of those hits. This is like saying that a search doesn't work unless you view all 100,000 of those hits.

Sure, it would take a long time to process all of those records. In federated searching, as in everything else in life, there are trade-offs. Most searchers initially retrieve a limited set of records from each source. This allows the searcher to check their general usefulness (and possibly decide to re-run the search on fewer, more relevant sources). Not only does this save time, but the searcher is not needlessly clogging up servers by delivering thousands of records that will almost immediately be discarded. In this way, the de-duplication performance issue of dealing with impossibly large numbers has been resolved. If the first set of results is found wanting, the waiting mass of results is still there and can be tackled in manageable bites. With the right technology, the next "bite" of results can be processed like the first, and the new results can be quickly de-duped against those left over from the first set of results.

It seems reasonable that you should be able to recognize that a new record being added to the set is the same as an existing one. Of course, to perform de-duplication you have to merge the results from multiple sources and process them all in an integrated results set (as our product, MuseSearch, and several other products do), rather than maintain them in separate groups by source (as WebFeat does). We've been de-duping since day one, but we would be the first to say that de-duping isn't perfect by any means. In fact, we often state publicly that metasearching is an 80/20 solution: you're better off with metasearching than without it, and it will only improve over time.
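
As an illustration only, the incremental approach described above can be sketched in Python. This is not MuseSearch's implementation. The match key (normalized title plus year), the record fields, and the names record_key and MergedResultSet are all assumptions made for the example; a real product would likely use richer matching rules.

```python
import re


def record_key(record):
    """Hypothetical match key: normalized title plus publication year."""
    title = re.sub(r"[^a-z0-9]+", " ", record["title"].lower()).strip()
    return (title, record.get("year"))


class MergedResultSet:
    """One integrated results set shared by all sources (assumed design)."""

    def __init__(self):
        self.records = []   # de-duplicated records, in arrival order
        self._seen = {}     # match key -> merged record already in the set

    def add_batch(self, batch):
        """De-dupe one 'bite' of results against everything kept so far.

        Returns the number of records that were new to the set.
        """
        added = 0
        for record in batch:
            key = record_key(record)
            if key in self._seen:
                # Duplicate: keep one copy, but remember every source.
                self._seen[key]["sources"].append(record["source"])
            else:
                merged = {
                    "title": record["title"],
                    "year": record.get("year"),
                    "sources": [record["source"]],
                }
                self._seen[key] = merged
                self.records.append(merged)
                added += 1
        return added
```

Each call to add_batch only does work proportional to the size of the new batch, because earlier records are looked up by key rather than rescanned. That is the sense in which a later "bite" can be de-duped against the first without processing every hit up front.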

De-duping is one of the many differentiators among federated search products; in fact, the ability to de-duplicate results is one of the key requirements articulated by users. Don't take our word for it: see the detailed study sponsored by the National Library of New Zealand that concludes "the consensus about the role of a common user interface is that it should be able to broadcast a single search to a variety of databases in different locations and in different formats and to unify the results from these databases, then present them in a useful order and de-duplicate the results" (emphasis added). This is just one of the reasons the study awarded MuseSearch top ratings. The full study can be downloaded at

2. Federated search can be software or a service.

The WebFeat article asserts that federated searching is best when offered as a service, and that this is the only approach that avoids downtime for software or source connector updates. The truth is, a centralized service is not necessary in order to incorporate frequent software updates without downtime.

Our Source Factory distributes software and source connector package updates seamlessly, allowing extremely high levels of service with very little local administration effort. Updates can be made automatically, without service disruption. Most of our technology partners (COMPanion, Endeavor, Innovative Interfaces, Mandarin, Sirsi, etc.) offer both local software implementation and hosted service options. Most customers opt for a local software implementation. Our experience has been that local customization and security requirements are best served in this way. The bottom line is, the best option is flexibility to implement in the way that is most effective for each user.

3. You do get better results with a federated search engine.

The aim of MuseGlobal is to provide better results with less effort. In general, you can get better results with federated search than by using native database search because, practically speaking, few searchers would have the time or patience to repeat these searches in each individual search interface. In the real world, federated searching can substantially improve the efficiency and quality of results.

We invite your readers to try Muse federated search technologies for themselves with MuseSeek, our new consumer-oriented Web metasearch engine (

Cheryl Wright
Vice President, Marketing
MuseGlobal, Inc.

Where in the World?

I have been a subscriber to Information Today for some time now, and generally look forward to Barbara Quint's articles, which are usually very pithy and informative.

I must tell you, however, that I was a bit bothered by something she wrote in her October 2003 [Up Front] article, in which she referenced Earth Station 5 as being "reportedly based in Palestine."

Perhaps I shouldn't assume that she is aware that there currently is no nation or state in the world by that name, which was last used by the British during the Mandate period?

A quick Internet search revealed that Earth Station 5 is located in Jenin, one of the autonomous areas under the control of the Palestinian Authority, which would have been a more accurate way to describe the location.

I'm sure that, as a sophisticated journalist, she's aware of the significance of names and of the importance of accuracy. And to give her the benefit of the doubt, I will assume the reference was an oversight and not intentional, for to inject one's own politics into a professional journal article is most unfortunate, and unprofessional, as I'm sure you'd agree.

Thank you for your time and consideration.

—Glenn Ferdman
Director, Asher Library
Spertus Institute of Jewish Studies

