On the TREC Trail
By Barbara Quint
Come, my children, and gather round the campfire. Listen
to the old ones tell the tales of the long-forgotten ancient times and of the
great deeds done when the world was young. Long before the coming of Google,
before the howling of Yahoo!, before even the rise (and fall and rise again)
of the dot-coms, an Internet was born. And the midwives at the birthing of
the infant that would one day rise to hold the world in its mighty Web were
ARPA (now known as DARPA, the Defense Advanced Research Projects Agency) and
the National Science Foundation (NSF). Never forget, child clad in the red,
white, and blue, that, whether for good or ill, 'twas the federal government
of the United States of America that first brought the Internet to all the
peoples of the earth.
Even now, when the moon is full and the wind moves the clouds across its
shining face, the ghosts of the ancients meet and ponder their creation and
judge its works. Even now, if you listen silently, you can hear them whisper
wisdom on distances traveled and distances yet unspanned.
In other words, each year the National Institute of Standards and Technology
(NIST; http://www.nist.gov) gathers with DARPA (http://www.darpa.mil) and the
Advanced Research and Development Activity (ARDA; http://www.ic-arda.org),
a friendly, outgoing representative of the "intelligence community." (Can crocodiles
really smile?) Together they sponsor a text retrieval conference known as TREC (Text REtrieval Conference).
The conference workshops evaluate the efforts of participants to complete difficult
tests designed to advance text retrieval systems in different problem categories.
Working searchers should have a natural affinity for TREC's approach to improving
text retrieval. It tests systems the only way they should be tested. It takes
real questions, even some supplied by virtual reference operations, and real
problems, like detecting the novel or finding relevant information in non-English
documents. It runs the tasks against real data: for example, a collection of more
than a million full-text articles from major news organizations such as The
New York Times and the Xinhua news agency wires. TREC even tests systems
by their ability to admit failure, i.e., to recognize when the answer sought
does not exist in the information set. One year they even required systems to assign
confidence values to the answers detected. TREC tracks also deal with scalability
factors in solutions.
In the question-answering category, questions can extend from "factoids," like
the name of the river called the "Big Muddy," to layered questions, like a
list of chewing gum manufacturers. Other areas of investigation involve tasks
that require judgment, like evaluating which articles take a different stance
on an issue or what specific information new articles contribute to a breaking
story. From the very first conference in 1992, TREC (then part of DARPA's TIPSTER
program) focused on answers found rather than documents retrieved. It also called
for systems that would work beyond the English language or answer spoken questions.
Humans assess the success of the automated retrieval processes and hold to
strict standards. Other elements of the search process also receive careful
evaluation. Search systems may even get pop quizzes on questions that have
appeared in previous years' conferences.
Participants in the testing mainly come from universities (well over half,
according to Ellen Voorhees, TREC project manager at NIST), but commercial
operations (such as Microsoft) can also participate. In any case, as anyone
saving pennies to buy into Google's IPO knows, universities very often nurture
the future talent that can take the information industry by storm. A quick
look at the TREC conference questions and the standards of success imposed
would convince any information professional that a winner (or even a placer
or show-er) would be a player to watch in the future.
The Novelty Issue
TREC workshops build around tracks representing specific problem areas. Results
from tracks may differ from year to year. Current tracks cover cross-language
issues; filtering; genomics as a specific domain; HARD (High Accuracy Retrieval
from Documents); interactive or user transaction issues; novelty or how to
locate new, previously unfound information; question answering; robust retrieval;
and terabyte, or scaling to larger document collections. The novelty issue also
receives attention from related but non-TREC research conducted by DARPA called
Topic Detection and Tracking. TREC recently added a video track focusing on
content retrieval in digital video, which should expand into a general multimedia
track. TREC also has a Web track that works with a snapshot of the Web as a
document set for search engines.
Voorhees discussed the conference's role in advancing text retrieval services.
She pointed to participation in TREC as a grounding for future start-ups coming
out of academic settings. Most of the corpus that TREC uses for its testing
comes from newspapers or news wires (often contributed at no charge) and
government documents. TREC's organizers have no direct plans for following the scholarly
communication field (for example, collections of "open access" scholarship)
due to the difficulties of locating the talent required to evaluate success
rigorously. However, the new genomics track established last year does tap
into the National Library of Medicine's PubMed collection of text. In this
area, an NSF grant helps fund the judging process.
One scholar of the search field pointed to TREC's developments (or lack
of development) over the last 4 or 5 years as evidence that improvements
in search have hit a "glass ceiling." It has become harder and harder to crack
the final steps to complete answer extraction. As computerized retrieval improves,
more and more participants reach the level of previous peak performers, but
none of the performers seem to move beyond the point where almost all are clustered.
Voorhees admitted that success in the question-answering category has leveled
off in the last few years in the area of traditional ad hoc tests, i.e., new
queries seeking document-based answers. In this area, top scores have shown
little improvement. She attributes some of the stagnation to the diversion of
effort to the research developments needed to meet new tasks and tracks generated
at TREC. However, she also admitted that nobody yet has appeared at TREC with
the brilliant insights that will take us to the next state of the art. Natural
language processing, according to Voorhees, is a hard problem to beat, especially
if you insist it operate as effectively as a Mr. Spock computer dialog. The
challenge involves teaching machines the tremendous world knowledge and basic
understanding built into human intelligence. Nonetheless she is hopeful and,
when such brilliant breakthroughs finally occur, she expects that some of the
first glimmers will appear at TREC.
In areas outside question-answering, Voorhees has seen major progress. For
example, cross-language retrieval has become reliable for major languages,
not something one could have said 10 years ago. Speech retrieval has also made
significant progress. In fact, Voorhees believes it has reached the point where
it can become usable in large-scale services.
Oddly, Web search engines, such as Google or Yahoo! Search, do not participate
in TREC, although Voorhees assures us that they do follow TREC closely. She
believes that the Web search engines often focus on different problems, such
as spidering and large database management issues. However, with the rising
interest in "answer products," as outlined in Microsoft's new anti-Google strategizing
and demanded by the small screens of wireless technology, it would seem that
TREC's approach would have more interest now than ever before. Voorhees said
she would not be surprised if the requirement that all TREC research be published
openly contributes to the policy of nonparticipation. However, the research
arm of Microsoft has participated in TREC in the past.
Speaking of proprietary interests, we asked Voorhees about the role of the
intelligence community, particularly ARDA, the latest TREC sponsor. Both ARDA
and DARPA primarily support TREC through monetary contributions, Voorhees said.
She admitted that the intelligence community was undoubtedly doing its own
research, designed to answer its own very real needs, and that such research
might have a while to wait before it saw the light of day.
Lack of Publicity
In Internet time, 10 years counts as a century, at least. So, from one century
to the next, TREC conferences have sought to find and promote the creation
of text retrieval systems that do what real people want done and not just shove
masses of text at people based on a "there's a pony in there somewhere" assumption.
The conferences have held developers to the grindstone of the state of the
art, not just the acceptable state of the market. They have helped developers
meet and share experiences with other developers in an experts-only, shirtsleeve
working environment. The only defect I can see in their strategy is one seen
all too often by students of government information science and technology
projects: lack of publicity.
Well, that stops now. If you want to look at the conferences, you will find
a complete listing at http://trec.nist.gov/pubs.html. Most of the proceedings
are available for downloading. The 2004 TREC conference will be held this November
at NIST in Gaithersburg, Md., but it is only open to participants. However,
in February 2005, NIST will publish the proceedings on the Web site. And this
reporter/editor, for one, expects to produce copy on the 2004 conference as
soon as humanly possible.
To hell with the bushel; let's see the light.
Barbara Quint is editor of Searcher magazine. Her e-mail address is