the Case for Patent Searchers?
Go Tell the Clients
by Howard S. Homan, Ph.D.
Information Research & Analysis Section ExxonMobil Research and Engineering
If you, at the request of a client, glean and filter information from a variety
of sources, you are called a searcher. If you search in science and engineering,
you are called a technical searcher. This article describes the work of technical
searchers. It evolved during my attempts to explain my new job to former colleagues,
now clients, during my apprenticeship in technical searching. Most of my current
job involves patent infringement searching in the area of petroleum products.
For clients, this article explains why searching seems to take so long and
cost so much. (The short answer is that searchers haven't been replaced by
software, yet. For the long answer, keep reading.) Hopefully, this article
will enable clients to make good use of their searchers and help fellow searchers
describe their unique profession.
Is Searching Really So Difficult to Explain?
Yes. Every searcher tells at least one story about the difficulty of justifying
time and cost to clients. When stories are told, listening searchers just smile
and nod knowingly, because they have tales of their own about the difficulty
of finding unique words to explain what they do.
Here are some words that I have used to explain searching: "I clarify the client's
meaning and intent and then I glean and deliver relevant information." OK,
the words sound right; but unfortunately, the words describe abilities that
clients also have. So, after hearing them, the clients can still think of searching
as ordinary, trivial, and tedious. Clients routinely assign tedious tasks to
clerks or replace the tasks with automation and software. So why not replace
searchers as well? Many clients probably harbor vague feelings that they are
paying a lot of money for services which sound a lot like what they do themselves.
Usually, technical searchers are scientists and engineers who have learned
to search on the job. To describe searching to former colleagues, I often resort
to differentiating myself from other more familiar information sources. I am not IT
(information technology); I am not a Web page designer; I am not a
software or computer expert; I am not an information clerk; I am not a
source for document delivery; I do not maintain the library; and I do not catalog
or index new information. However, I do use and hunt in all the information
sources just mentioned.
One task, at least, makes searchers unique. Unlike their clients, searchers
actually grapple specifically with improving techniques for identifying relevant
information. New software tools and mental techniques must be learned. It takes
practice. It resembles an art form in that technical searchers usually train
new searchers via apprenticeship. The skill improves with experience, and only
after 1 to 2 years did I feel comfortable amid the chaos. These days, companies
are working on replacing searchers with software; but meanwhile, searchers
continue to be paid for their enhanced abilities to clarify intent, interpret
meaning, and deliver relevance faster and better than clients
can do it for themselves.
What Do Searchers Do?
As seen by clients, searchers spend most of their day quietly sitting in
front of a computer monitor. Come back an hour later and they're still there.
What, exactly, is taking them so long?
Searchers are asked for many types of searches. State-of-the-art, business,
current awareness, database creation, patent novelty, patent validity, and
patent infringement searches are a few. The following lines describe what searchers
do for one type of search, the patent infringement search; but the same techniques
are applied, at some level, to all searches.
The Infringement Search
The infringement search is usually requested by an attorney who sends a description
of a proposed product or process. The searcher reports in-force patents that
may relate to the proposed product or process. Then usually the attorney writes
an opinion about the risk of infringing the patents reported by the searcher.
During my apprenticeship, I had to learn some patent law in addition to learning
search techniques. I had to get used to the concept of infringement. For example,
proposed practice A+B+C could infringe on patents that claim A, on patents
that claim B, on patents that claim A+C, but not on patents that claim A+B+D.
To avoid wasting the attorney's time, I learned not to report the latter patent
(A+B+D), no matter how technically interesting it looked. Also, I had to develop
instincts about whether two products or practices may be similar under the
doctrine-of-equivalents (the concept that different items might be legally
the same if they serve the same function), which requires knowledge of the
Searchers in this area must know the difference between freedom-to-practice
and right-to-exclude. They must understand the concepts of patent families,
divisionals, continuations-in-part, patent expiry, priority dates, legal status,
patent applications, granted patents, designated states, etc. They must even
learn to report patents that they know make no technical sense.
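The A+B+C example above amounts to a subset test: a patent may be infringed only if every element it claims is present in the proposed practice. The following Python lines are a minimal sketch of that test, under the simplifying assumption that each claim can be reduced to a plain set of required elements. The function name and labels are hypothetical, and the sketch ignores claim interpretation and the doctrine-of-equivalents, which are exactly the parts that call for judgement.

```python
def may_infringe(proposed, claimed):
    """True if every claimed element appears in the proposed practice."""
    return claimed <= proposed  # subset test


proposed_practice = {"A", "B", "C"}

# Hypothetical claims, each reduced to the set of elements it requires.
claims = {
    "claims A": {"A"},
    "claims B": {"B"},
    "claims A+C": {"A", "C"},
    "claims A+B+D": {"A", "B", "D"},
}

to_report = [
    label
    for label, elements in claims.items()
    if may_infringe(proposed_practice, elements)
]
print(to_report)
```

The A+B+D claim drops out because D is not part of the proposed practice, however technically interesting that patent may look.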
Enough about law, let's get back to searching. Searches have two parts: capture
and cull. During capture, the searcher downloads citations during an online
search; during cull, the searcher decides whether to report or reject the downloaded
citations by reading the patents' claims. Patents that survive the cull are
reported to the attorney client, often with some analysis to explain patterns
Capture is executed by uploading a search strategy to a search engine that
operates on patent records in an online database. The search strategy is usually
a Boolean combination of words and codes. The search engine finds records that
contain the Boolean combination of words and codes. The searcher receives (captures)
a list of these records.
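For readers who think in code, here is a toy sketch of what a capture does: a Boolean combination of words and codes is tested against each record, and the matching records come back. The three records, the helper names, and the codes X01 and X02 are invented for illustration (C10M is the lubricant class mentioned later in this article), and real search engines offer far richer operators than this sketch assumes.

```python
import re

# Made-up records; real database records carry many more fields.
records = [
    {"id": "P1", "text": "A gear oil containing a friction modifier", "codes": {"C10M"}},
    {"id": "P2", "text": "Hydraulic fluid pump housing", "codes": {"X01"}},
    {"id": "P3", "text": "Olive oil salad dressing", "codes": {"X02"}},
]

END_USES = {"gear", "hydraulic", "engine"}
FLUIDS = {"oil", "oils", "fluid", "fluids", "lubricant", "lubricants"}
LUBE_CODES = {"C10M"}


def words(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def adjacent(text, first, second):
    """True if a word from `first` is immediately followed by one from `second`."""
    w = words(text)
    return any(a in first and b in second for a, b in zip(w, w[1:]))


def capture(record):
    """(end-use word next to a fluid word) OR (a lubricant code)."""
    return adjacent(record["text"], END_USES, FLUIDS) or bool(record["codes"] & LUBE_CODES)


captured = [r["id"] for r in records if capture(r)]
print(captured)
```

Note that P2 is captured because "hydraulic fluid" appears in its text, even though the record is about a pump housing. Even this tiny strategy brings back noise that someone has to read and reject.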
Before writing capture strategy, I interview the client about his request.
To get the meaning right, I ask many questions. What part of the product or
process is new? In which countries will the product or process be practiced?
Are there patents known to be similar to the product or process? What are the
structures of the chemical compounds? Most importantly, what is not wanted?
Patents can be searched in several types of databases. Some databases simply
comprise the full texts of millions of individual patents. Other databases
are bibliographic (title, abstract, inventors, etc.) and organized into patent
families. The Derwent World Patent Index contains more than 12 million patent
families. Fortunately, some databases are enriched with additional searchable
words or codes, called indexing, added by the database producers. Searching
the indexing improves the capture of relevant records. Examples of indexing
include chemical fragment codes, end-use application codes, or words chosen
from database-specific thesauri. Database producer thesauri condense the myriad
words that inventors use into a smaller hierarchy of index terms.
Searchers need to understand enough about the subject technology to write
efficient capture strategy. For instance, they need to know which parts of
the product or process are so old that there is minimal danger of infringement;
patents claiming older technology will have expired. This knowledge makes searchers
more efficient and clients should be willing to educate them. In addition to
the technology, searchers must remain current in online search techniques so
they can translate the technology into up-to-date online search strategies.
An online search strategy is actually input to a search engine as a request.
Searchers must learn the idiosyncrasies of various search engines. Some of
the complexity of search strategy can be seen in the example search statement
below, which captures patents about lubricants. A typical strategy may have
five to 30 statements like this one, each constructed to capture patents that
claim specific molecules or processes.
"Free text terms for lubes end-use applications" or
((aircraft or aviation or brake or
cable or catapult or compressor or -
cylinder or drilling or driveline or electric+ or engine? or -
functional or gear or hydraulic or
industrial or insulating or -
jet or lubricating or mineral or
motor or shock or slideway or way or -
spindle or switch or traction or transformer or transmission or
turbine or -
white or machining or metalworking or synthetic or wire pulling)(W) -
(oil or oils or fluid or fluids or lube or lubes or lubricant or -
lubricants)) or -
"EnCompass index terms for lubes and func fluids" or
LUBRICANT/INDUSTRIAL OIL+/IT or FUNCTIONAL FLUID/IT or LUBRICANT STOCK/IT
"IPC for lubes" or C10M/IC or -
"Derwent codes for lubes" or -
H07/DC or H07+/MC or (Q416/M3 or Q609/M3) or -
(A340/PI or Q7841/PI or Q7647/PI) or -
(644/AM and (01&/AM or 01-/AM or 012/AM or
010/AM or -
(2707/KS and (011/AM or 013/AM or 014/AM)))) or -
"IFI Terms for lubes" or -
10008/CN or 03732/CN or 03208/CN or 03394/CN or 03207/CN or 03719/CN or
02731/CN or 03979/CN or 05437/CN or 03138/CN or 02087/CN or 05435/CN or
02423/CN or 01431/CN or 02273/CN or 01441/CN or 06039/CN or 01359/CN or
05755/CN or 02088/CN or 05205/CN or 03269/CN or 05228/CN or 07715/CN
The terms in a search statement like the above may come from some or all
of the sources listed below. Searchers have to study these sources carefully
to capture relevant records:
Free text (learned from inventors, attorneys, other searchers)
Patent bibliographic data: priority dates, publication dates,
CAS registry numbers, lexicon, controlled terms
STN chemical structure drawing and structure searching
Derwent Chemistry Resource
CPI Registry Compounds
Chemical Codes ("BCE" or "Fragmentation Codes")
Plasdoc (polymer) Codes
Polymer Index Codes
International patent classes
U.S. patent classes
Most of a searcher's training involves the capture. So, stereotypically,
clients associate searchers with the capture part of searching, with Boolean
logic, with subscriptions to and software for online databases, with search
terms and search engines. But wait! Stop! Capturing is not the whole story.
It is not where searchers spend most of their time.
Clients might be surprised to learn that most of the cost of searchers' time
goes not to online capturing, but to off-line culling.
Culling, described below, is essential because today's
capturing methods are noisier than clients may want
to believe. For an infringement search, searchers usually
report only 10 percent of the captured patents. The
rest are culled to save the client's time.
The cull involves reading the captured patent claims to decide whether to
report or reject each captured patent. Practicing scientists and engineers
are used to looking for the most interesting and informative citations. When
reading, their eyes scan for reasons to retain a citation. Less-interesting
citations are bypassed. In contrast, when culling, searcher eyes scan for reasons
to reject, not reasons to retain, a patent. (Remember, the reasons to retain
a patent were already embodied in the capture strategy discussed above.) This
is a very important point. It means that clients should tell searchers what
they do not want (for the cull) in addition to what they do want
(for the capture). To be thorough, infringement searchers must not report just
the most technically interesting patents; they must reject only the obviously
irrelevant patents. So, at first, culling feels counterintuitive. It's negative.
Scientists and engineers who become technical searchers must be taught how
to cull. Unless the infringement searcher has a good reason to reject a patent,
he must report it to the client attorney for his opinion.
Also, the doctrine-of-equivalents complicates culling and requires learned
judgement by the searcher. In the case of chemical patents, the searcher must
judge how close a claimed chemical structure is to the proposed
product being searched when choosing to report or reject.
Patents are rejected for several reasons. They could be too old to be in-force;
their claims may require chemical components outside of a proposed formulation;
the attorney may not want to see process-only or product-only claims; the claimed
uses could be unrelated to proposed practice; or the patent assignee may be
the client's own company.
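The logic of the cull can be sketched as a list of agreed reasons to reject, with "report" as the default when no reason applies. The Python below is only an illustration under strong assumptions: the `Patent` fields, the company names, and the component names are hypothetical, and in practice each test is a judgement made while reading claims, not a lookup.

```python
from dataclasses import dataclass


@dataclass
class Patent:
    number: str
    in_force: bool
    assignee: str
    required_components: frozenset


def reasons_to_reject(patent, proposed_components, client_name):
    """Collect the agreed reasons for rejection; an empty list means report."""
    reasons = []
    if not patent.in_force:
        reasons.append("not in force")
    if patent.assignee == client_name:
        reasons.append("assigned to the client's own company")
    outside = patent.required_components - proposed_components
    if outside:
        reasons.append("requires components outside the formulation: " + ", ".join(sorted(outside)))
    return reasons


proposed = frozenset({"base oil", "dispersant", "antioxidant"})
captured = [
    Patent("P1", True, "Other Co", frozenset({"base oil", "dispersant"})),
    Patent("P2", False, "Other Co", frozenset({"base oil"})),
    Patent("P3", True, "Client Co", frozenset({"base oil"})),
    Patent("P4", True, "Other Co", frozenset({"base oil", "zinc additive"})),
]

# No reason to reject means the patent goes to the attorney.
reported = [p.number for p in captured if not reasons_to_reject(p, proposed, "Client Co")]
print(reported)
```

The design choice worth noticing is the default: a patent is reported unless a reason to reject it is found, never the other way around.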
So it is critically important for searchers and clients to agree on the reasons
for rejecting, in addition to reasons for capturing, a patent. Optimally,
this agreement should be reached before the online strategy is written, when
the searcher is trying to clarify the intent of the search request. But, usually,
clients have no idea about what they do not want until their searcher
presents examples from a preliminary search. If a client tires of my questions
about what to reject, I console myself with the thought that I am probably
doing a good job. Early in a search, I like to send a list of proposed reasons
for rejection to the attorneys and formulators for their concurrence.
Inside-Out Searches and Outside-In Searches
While capture and cull are part of every search, searchers use them in different
patterns depending on their knowledge of the subject. Some search patterns
are inside-out; others are outside-in.
Inside-out searches are best for searchers learning a new subject area. They
learn as they search. For inside-out searches, capture and cull are repeated,
using a new search strategy each time, until the chance of finding additional
relevant citations is judged to be small. Between repetitions, the searcher
writes the new search strategy based on insights from the previous culling.
On each repetition, the searcher can write a different search strategy and
search different databases. When searching inside-out, searchers can look for
the most relevant citations first, then capture the remaining citations in
an evolving capture and cull strategy.
One advantage of inside-out searching is that the searchers capture relevant
citations early in the search, which they can report to the client to give
a sense of progress and to prod the client into teaching the searcher more
about what is not wanted. A disadvantage is the extra effort to avoid
capturing the same patents on each repetition.
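The inside-out pattern, including the bookkeeping needed to avoid reading the same patent twice, can be sketched as a loop. This is a toy under stated assumptions: the strategy names and patent identifiers are invented, the strategies are fixed in advance rather than rewritten after each cull, and "stop when a round finds nothing new that is relevant" stands in for the searcher's judgement that further rounds are unlikely to pay off.

```python
def inside_out(strategies, run_search, is_relevant):
    """Repeat capture and cull, skipping patents already read in earlier rounds."""
    seen = set()
    reported = []
    for strategy in strategies:
        # Capture, then drop anything read in a previous round.
        fresh = [pid for pid in run_search(strategy) if pid not in seen]
        seen.update(fresh)
        # Cull only the fresh patents.
        hits = [pid for pid in fresh if is_relevant(pid)]
        reported.extend(hits)
        if not hits:  # stand-in for "further rounds look unpromising"
            break
    return reported, len(seen)


# Hypothetical results of three successive strategies.
results = {
    "narrow": ["P1", "P2"],
    "broader": ["P2", "P3", "P4"],
    "broadest": ["P1", "P4", "P5"],
}
relevant = {"P1", "P3"}

reported, patents_read = inside_out(
    ["narrow", "broader", "broadest"],
    run_search=lambda name: results[name],
    is_relevant=lambda pid: pid in relevant,
)
print(reported, patents_read)
```

In this toy run, the three strategies return eight citations in total, but only five distinct patents have to be read.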
In contrast, outside-in searches are preferred by searchers familiar with
the subject searched. For outside-in searches, capture and cull are executed
only once. The online search strategy is written to cover all search terms
for all databases that need searching. Searchers using the outside-in method
are usually more confident that they have included all search terms and all
relevant databases. These searchers may feel more confident that they have
captured all relevant patents, but the client might not get the most relevant
patents first. The client must wait for the searcher to cull through all the
So Why Does It Take So Long?
One very large patent infringement search I did for a proposed complex fluid
formulation involved 35,000 to 100,000 of the 8 million potentially in-force
patent families worldwide. That's 35,000 to 100,000 families with some relation
to the proposed formulation. I did the search outside-in. The capture identified
about 5,000 patent families. The cull left about 200 patent families that I
reported to the client attorney.
The search took months! Remember, infringement searches must be thorough.
The first 2 months involved the capture: interviews with clients, organizing
the formulation into a list of chemical families, verifying the components'
chemical structures, discussing example patents, agreeing about what to reject,
testing search terms online. The cull consumed most of the next 2-3 months.
Culling was done by first reading the 5,000 Derwent-enhanced patent titles
to reject the most obviously irrelevant, which left about 500 patents for which
patent claims had to be read. It was grueling.
Looking back, the client may wonder why he had to wait months to identify
a mere 200 patents. Well, first, I had to condense a very complex list of ingredients
into an online strategy based on chemical families, and then I had to cull
through 5,000 patents! If the online strategy could have captured only the
200 relevant patent families, a lot of time would have been saved.
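The arithmetic behind that search is simple, and a few lines of Python make the proportions explicit. The three counts are the approximate figures given above; the variable names are only for illustration.

```python
captured = 5000            # patent families returned by the capture
after_title_screen = 500   # left after reading the enhanced titles
reported = 200             # delivered to the client attorney

rejected_on_title = captured - after_title_screen
share_reported = reported / captured
share_of_claims_read_reported = reported / after_title_screen
rejected_per_reported = (captured - reported) / reported

print(f"Rejected on title alone: {rejected_on_title}")
print(f"Share of the capture reported: {share_reported:.0%}")
print(f"Share of claims read that were reported: {share_of_claims_read_reported:.0%}")
print(f"Patents rejected for each one reported: {rejected_per_reported:.0f}")
```

In other words, about 4 percent of the capture was reported, and roughly 24 patents were rejected for every one delivered.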
"There's got to be a better way, right?" A quote from a client
Why should I have to review 5,000 patents to find the 200 relevant ones?
Everyone, including searchers, has an instinctive feeling that searching should
be more straightforward. Was my capture strategy inefficient? Now here is the
heart of this article: The answer is, "No." Want proof? Try this. After you
have completed a search and have your list of relevant patents, try to rewrite
your capture strategy to capture only the relevant patents without capturing
thousands more. You cannot. This frustrating exercise will show you just how
noisy the capture part is. So culling remains necessary because capture methods
are just too noisy.
Searchers work within the noise to deliver relevance. Why is there so much
noise? I don't know. But it's why searching takes time and money. It's as if
there is an incomprehensible gap between a combination of words supposed to
represent a product or process in a capture strategy and the intent and meaning
of the product or process to the client. The gap exists even for scientific
papers, whose authors wish to have their intent clearly understood. Patents
are noisier than scientific papers. A well-written patent is skillfully composed
of words carefully chosen to disclose little and claim much. Searchers will
be needed in this gap until software makes culling unnecessary.
As described above, much of the intent of a search is embodied in the knowledge
of what the client does not want. So why don't searchers codify the
reasons for rejection directly within the online search strategy during the
capture? It should make the capture more relevant and decrease culling. Wouldn't
that close the gap? Unfortunately, searchers dare not codify reasons for rejection
within the capture. Searchers learn to avoid the Boolean NOT. Consider a simple
example. A patent search strategy regarding a proposed zinc-free composition
would accidentally exclude patents that claim a zinc-free composition if the
searcher uses the Boolean "NOT zinc." Instead, the searcher must find a database
whose index terms include "zinc-free," if available, so that the searcher can
use the Boolean "OR zinc-free." Otherwise, the searcher must read the claims
(cull) to verify that zinc is not claimed.
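The zinc example is easy to reproduce. In the toy Python below, three invented one-line records stand in for patent claims, and the word matching is an assumption about how a simple engine would treat "zinc-free" (as the two words "zinc" and "free"). Actual search engines differ in how they handle hyphens.

```python
import re

# Invented records for illustration only.
records = {
    "P1": "A zinc-free lubricant composition comprising an ashless additive",
    "P2": "A lubricant composition comprising zinc dialkyldithiophosphate",
    "P3": "A lubricant composition comprising an ashless antiwear additive",
}


def has_word(text, word):
    """True if `word` occurs as a whole word; hyphens split words here."""
    return word in re.findall(r"[a-z]+", text.lower())


lubricant = [pid for pid, text in records.items() if has_word(text, "lubricant")]

# "lubricant NOT zinc" also throws out the zinc-free patent, P1.
not_zinc = [pid for pid in lubricant if not has_word(records[pid], "zinc")]
print(not_zinc)
```

The NOT removes P2 as intended, but it also removes P1, the one record that explicitly claims a zinc-free composition and is therefore the most important to report.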
In a perfect world, culling would be unnecessary because capture would be
perfect. The search methods would return only the relevant records. The search
software would somehow know what I mean. Searchers want search methods to get
better, to help them bridge this gap between the client's words and the client's
meaning. Can the reasons for rejection be codified by software? I don't know.
The gap is not the searchers' fault. It's the chaotic boundary between words
and meaning. Stated in the words of this paper, no capture strategy can make
culling unnecessary yet. Some clients behave as if they believe that
there is no noise, no gap. Unfairly, searchers feel responsible for the noise
and struggle to explain why it costs time and money.
Clients do not realize that their own ability to scan for relevant information
is miraculous, until they try to program it or try searching for a living.
Everyone knows anecdotes about this ability. For instance, everyone has read
a paragraph twice because they sense that they have missed something important.
Is that sense programmable? Everyone accepts what everyone else means by "I'll
know it when I see it." Is that sense programmable? Until someone programs
this miracle into search methods, searchers must continue to cull the material
captured by online searches. This is why it costs so much and takes so long.
Is There Any Hope?
To minimize the tedium of having to cull so much material (i.e., cut costs),
searchers seek improvements for both the capture and the cull. Regarding the
capture, information vendors (QuestelOrbit, STN, and Dialog) and database
producers (Derwent, API, and IFI) are constantly advertising improvements.
New methods of indexing, new index terms, methods of ranking and visualizing,
faster document delivery, new Internet search tools, and full-text literature
are adding value. The mechanics of writing online search strategy are constantly
changing as the search engines evolve. Searchers strive to remain current.
Trying to close the gap between words and meaning, software vendors continually
offer new techniques. Busy searchers try to make time to test the new software.
Searchers test new software by using it to repeat their previous searches.
To start a test, searchers tell the vendor what they searched and give them
only some of the relevant material. For adoption, the new technique must find
the rest of the relevant material without capturing as much of the culled material.
Some clients test searchers by the same method, giving the searchers only some
of their examples of relevant patents, and then waiting to see whether the
search finds the others.
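The test described above can be written down as a small scoring routine. This is a sketch under stated assumptions: the function names, the pass criterion, and the patent identifiers are hypothetical, and a real evaluation would also weigh cost, speed, and how the results are presented.

```python
def evaluate_tool(captured, held_out_relevant, previously_culled):
    """Score a new tool against a finished search.

    held_out_relevant: relevant patents that were NOT given to the vendor.
    previously_culled: patents the old capture returned but the cull rejected.
    """
    captured = set(captured)
    return {
        "missed_relevant": sorted(held_out_relevant - captured),
        "culled_recaptured": len(previously_culled & captured),
        "culled_total": len(previously_culled),
    }


def passes(result):
    """Find all held-out relevant patents while recapturing less noise than before."""
    return not result["missed_relevant"] and result["culled_recaptured"] < result["culled_total"]


held_out = {"R3", "R4"}
culled = {"N1", "N2", "N3", "N4"}

tool_a = evaluate_tool(["R3", "R4", "N1"], held_out, culled)
tool_b = evaluate_tool(["R3", "N1", "N2"], held_out, culled)
print(passes(tool_a), passes(tool_b))
```

Here the first hypothetical tool finds both held-out patents with less noise and would be a candidate for adoption; the second misses R4 and would not.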
Today, software that captures the relevant material without capturing any
other material is a dream. In the meantime, searchers are paid to bring order
amidst a chaos of information faster than their clients can. They are students
of information noise; they work in the gap. Their effectiveness stems from
these key capabilities:
Knowing sources of information and the science and technology
Writing capture search strategy for today's search engines
Culling rapidly (knowing what you don't want)
Analyzing patterns in the retained information
What if clients still think searching costs too much
and takes too long? Wait. Maybe someone will write the
magic software that finds only what you really mean.
In the meantime, tell your searchers what you don't
want as well as what you do want because they're
going to do a lot of culling.
Acknowledgements: I thank all the searchers in my section who suffered through
my apprenticeship. When I took this job, I had no idea how much had to be learned...and
This article is dedicated, with many thanks, to the boss, Patricia A. Lorenz,
on the occasion of her retirement.