Magazines > Searcher > March 2004
Vol. 12 No. 3 — March 2004
Making the Case for Patent Searchers?
Go Tell the Clients
by Howard S. Homan, Ph.D., Information Research & Analysis Section, ExxonMobil Research and Engineering Company

If you, at the request of a client, glean and filter information from a variety of sources, you are called a searcher. If you search in science and engineering, you are called a technical searcher. This article describes the work of technical searchers. It evolved during my attempts to explain my new job to former colleagues, now clients, during my apprenticeship in technical searching. Most of my current job involves patent infringement searching in the area of petroleum products. For clients, this article explains why searching seems to take so long and cost so much. (The short answer is that searchers haven't been replaced by software, yet. For the long answer, keep reading.) Hopefully, this article will enable clients to make good use of their searchers and help fellow searchers describe their unique profession.

Is Searching Really So Difficult to Explain?

Yes. Every searcher tells at least one story about the difficulty of justifying time and cost to clients. When stories are told, listening searchers just smile and nod knowingly, because they have tales of their own about the difficulty of finding unique words to explain what they do.

Here are some words that I have used to explain searching: "I clarify the client's meaning and intent and then I glean and deliver relevant information." OK, the words sound right; but unfortunately, the words describe abilities that clients also have. So, after hearing them, the clients can still think of searching as ordinary, trivial, and tedious. Clients routinely assign tedious tasks to clerks or replace the tasks with automation and software. So why not replace searchers as well? Many clients probably harbor vague feelings that they are paying a lot of money for services that sound a lot like what they do themselves every day.

Usually, technical searchers are scientists and engineers who have learned to search on the job. To describe searching to former colleagues, I often resort to differentiating myself from other more familiar information sources. I am not IT (information technology); I am not a Web page designer; I am not a software or computer expert; I am not an information clerk; I am not a source for document delivery; I do not maintain the library; and I do not catalog or index new information. However, I do use and hunt in all the information sources just mentioned.

One task, at least, makes searchers unique. Unlike their clients, searchers actually grapple specifically with improving techniques for identifying relevant information. New software tools and mental techniques must be learned. It takes practice. It resembles an art form in that technical searchers usually train new searchers via apprenticeship. The skill improves with experience, and only after 1 to 2 years did I feel comfortable amid the chaos. These days, companies are working on replacing searchers with software; but meanwhile, searchers continue to be paid for their enhanced abilities to clarify intent, interpret meaning, and deliver relevance faster — and better — than clients can do it for themselves.

What Do Searchers Do?

As seen by clients, searchers spend most of their day quietly sitting in front of a computer monitor. Come back an hour later and they're still there. What, exactly, is taking them so long?

Searchers are asked for many types of searches. State-of-the-art, business, current awareness, database creation, patent novelty, patent validity, and patent infringement searches are a few. The following lines describe what searchers do for one type of search, the patent infringement search; but the same techniques are applied, at some level, to all searches.

The Infringement Search

The infringement search is usually requested by an attorney who sends a description of a proposed product or process. The searcher reports in-force patents that may relate to the proposed product or process. Then usually the attorney writes an opinion about the risk of infringing the patents reported by the searcher.

During my apprenticeship, I had to learn some patent law in addition to learning search techniques. I had to get used to the concept of infringement. For example, proposed practice A+B+C could infringe on patents that claim A, on patents that claim B, on patents that claim A+C, but not on patents that claim A+B+D. To avoid wasting the attorney's time, I learned not to report the latter patent (A+B+D), no matter how technically interesting it looked. Also, I had to develop instincts about whether two products or practices may be similar under the doctrine-of-equivalents (the concept that different items might be legally the same if they serve the same function), which requires knowledge of the subject technology.
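The A+B+C example can be sketched as a set comparison. This is a simplified illustration only: the function name `may_infringe` and the letter labels are hypothetical, and the sketch models a literal reading of the claim elements. It does not model the doctrine-of-equivalents or any other judgment an attorney would make.

```python
def may_infringe(practice, claim):
    """Return True when every element of the claim appears in the practice.

    Literal reading only (an assumption of this sketch); equivalents
    between different elements are not modeled.
    """
    return set(claim) <= set(practice)


practice = {"A", "B", "C"}  # the proposed practice A+B+C

claims = {
    "claims A": {"A"},
    "claims B": {"B"},
    "claims A+C": {"A", "C"},
    "claims A+B+D": {"A", "B", "D"},
}

# Keep only the patents whose claimed elements are all present in the practice.
to_report = [name for name, claim in claims.items() if may_infringe(practice, claim)]
print(to_report)
```

The patent claiming A+B+D drops out because element D is absent from the proposed practice, which is why it is not reported.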

Searchers in this area must know the difference between freedom-to-practice and right-to-exclude. They must understand the concepts of patent families, divisionals, continuations-in-part, patent expiry, priority dates, legal status, patent applications, granted patents, designated states, etc. They must even learn to report patents that they know make no technical sense.

Enough about law, let's get back to searching. Searches have two parts: capture and cull. During capture, the searcher downloads citations during an online search; during cull, the searcher decides whether to report or reject the downloaded citations by reading the patents' claims. Patents that survive the cull are reported to the attorney client, often with some analysis to explain patterns identified.

The Capture

Capture is executed by uploading a search strategy to a search engine that operates on patent records in an online database. The search strategy is usually a Boolean combination of words and codes. The search engine finds records that contain the Boolean combination of words and codes. The searcher receives (captures) a list of these records.
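A toy version of this capture step might look like the sketch below. Everything in it is an assumption made for illustration: the records, the term sets, and the `matches` helper are hypothetical, and real search engines add truncation, proximity operators, and field codes that are not modeled here.

```python
def matches(query, terms):
    """Evaluate a nested Boolean query against one record's set of terms.

    A query is either a single term (a string) or a tuple whose first item
    is "and" or "or" and whose remaining items are sub-queries.
    """
    if isinstance(query, str):
        return query in terms
    op, *parts = query
    if op == "and":
        return all(matches(p, terms) for p in parts)
    if op == "or":
        return any(matches(p, terms) for p in parts)
    raise ValueError(f"unknown operator: {op}")


# Hypothetical patent records, each reduced to a set of words and codes.
records = {
    "P1": {"gear", "oil", "C10M"},
    "P2": {"cooking", "oil"},
    "P3": {"hydraulic", "fluid", "C10M"},
}

# (gear OR hydraulic) AND (oil OR fluid), OR the class code C10M
strategy = ("or",
            ("and", ("or", "gear", "hydraulic"), ("or", "oil", "fluid")),
            "C10M")

captured = sorted(pid for pid, terms in records.items() if matches(strategy, terms))
print(captured)
```

The engine returns whatever satisfies the Boolean combination; it has no notion of whether a record is relevant to the client.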

Before writing capture strategy, I interview the client about his request. To get the meaning right, I ask many questions. What part of the product or process is new? In which countries will the product or process be practiced? Are there patents known to be similar to the product or process? What are the structures of the chemical compounds? Most importantly, what is not wanted?

Patents can be searched in several types of databases. Some databases simply comprise the full texts of millions of individual patents. Other databases are bibliographic (title, abstract, inventors, etc.) and organized into patent families. The Derwent World Patent Index contains more than 12 million patent families. Fortunately, some databases are enriched with additional searchable words or codes, called indexing, added by the database producers. Searching the indexing improves the capture of relevant records. Examples of indexing include chemical fragment codes, end-use application codes, or words chosen from database-specific thesauri. Database producer thesauri condense the myriad words that inventors use into a smaller hierarchy of index terms.

Searchers need to understand enough about the subject technology to write efficient capture strategy. For instance, they need to know which parts of the product or process are so old that there is minimal danger of infringement; patents claiming older technology will have expired. This knowledge makes searchers more efficient and clients should be willing to educate them. In addition to the technology, searchers must remain current in online search techniques so they can translate the technology into up-to-date online search strategies.

An online search strategy is actually input to a search engine — a request. Searchers must learn the idiosyncrasies of various search engines. Some of the complexity of search strategy can be seen in the example search statement below, which captures patents about lubricants. A typical strategy may have five to 30 statements like this one, each constructed to capture patents that claim specific molecules or processes.

"Free text terms for lubes end-use applications" or -

((aircraft or aviation or brake or cable or catapult or compressor or -

cylinder or drilling or driveline or electric+ or engine? or -

functional or gear or hydraulic or industrial or insulating or -

jet or lubricating or mineral or motor or shock or slideway or way or -

spindle or switch or traction or transformer or transmission or turbine or -

white or machining or metalworking or synthetic or wire pulling)(W) -

(oil or oils or fluid or fluids or lube or lubes or lubricant or -

lubricants)) or -

"EnCompass index terms for lubes and func fluids" or -


"IPC for lubes" or C10M/IC or -

"Derwent codes for lubes" or -

H07/DC or H07+/MC or (Q416/M3 or Q609/M3) or -

(A340/PI or Q7841/PI or Q7647/PI) or -

(644/AM and (01&/AM or 01-/AM or 012/AM or 010/AM or -

(2707/KS and (011/AM or 013/AM or 014/AM)))) or -

"IFI Terms for lubes" or -

10008/CN or 03732/CN or 03208/CN or 03394/CN or 03207/CN or 03719/CN or -

02731/CN or 03979/CN or 05437/CN or 03138/CN or 02087/CN or 05435/CN or -

02423/CN or 01431/CN or 02273/CN or 01441/CN or 06039/CN or 01359/CN or -

05755/CN or 02088/CN or 05205/CN or 03269/CN or 05228/CN or 07715/CN

The terms in a search statement like the above may come from some or all of the sources listed below. Searchers have to study these sources carefully to capture relevant records:

• Free text (learned from inventors, attorneys, other searchers)

• Patent bibliographic data: priority dates, publication dates, assignees

• CAS registry numbers, lexicon, controlled terms

• STN chemical structure drawing and structure searching

• EnCompass indexing

• IFI indexing

• Derwent indexing:

    • Derwent Chemistry Resource

    • CPI Registry Compounds

    • Manual Codes

    • Chemical Codes ("BCE" or "Fragmentation Codes")

    • Plasdoc (polymer) Codes

    • Polymer Index Codes

• International patent classes

• U.S. patent classes

Most of a searcher's training involves the capture. So, stereotypically, clients associate searchers with the capture part of searching, with Boolean logic, with subscriptions to and software for online databases, with search terms and search engines. But wait! Stop! Capturing is not the whole story. It is not where searchers spend most of their time.

Clients might be surprised to learn that most of the cost of searchers' time goes not to online capturing, but to off-line culling. Culling, described below, is essential because today's capturing methods are noisier than clients may want to believe. For an infringement search, searchers usually report only 10 percent of the captured patents. The rest are culled to save the client's time.

The Cull

The cull involves reading the captured patent claims to decide whether to report or reject each captured patent. Practicing scientists and engineers are used to looking for the most interesting and informative citations. When reading, their eyes scan for reasons to retain a citation. Less-interesting citations are bypassed. In contrast, when culling, searcher eyes scan for reasons to reject, not reasons to retain, a patent. (Remember, the reasons to retain a patent were already embodied in the capture strategy discussed above.) This is a very important point. It means that clients should tell searchers what they do not want (for the cull) in addition to what they do want (for the capture). To be thorough, infringement searchers must not report just the most technically interesting patents; they must reject only the obviously irrelevant patents. So, at first, culling feels counterintuitive. It's negative. Scientists and engineers who become technical searchers must be taught how to cull. Unless the infringement searcher has a good reason to reject a patent, he must report it to the client attorney for his opinion.

Also, the doctrine-of-equivalents complicates culling and requires learned judgement by the searcher. In the case of chemical patents, when choosing to report or reject, the searcher must judge how close a claimed chemical structure is to, or how different it is from, the proposed product being searched.

Patents are rejected for several reasons. They could be too old to be in-force; their claims may require chemical components outside of a proposed formulation; the attorney may not want to see process-only or product-only claims; the claimed uses could be unrelated to proposed practice; or the patent assignee may be the client's own company.
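As a rough sketch, several of these rejection reasons can be written as an ordered checklist. The `Patent` fields, the company names, and the component names below are hypothetical. In practice the searcher reaches these decisions by reading claims, not by consulting tidy data fields, and the "unrelated use" reason is left out here because it depends on judgment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patent:
    number: str
    in_force: bool
    assignee: str
    claim_type: str                 # "product" or "process"
    required_components: frozenset  # components the claims require


def rejection_reason(patent, formulation, own_company, unwanted_claim_types):
    """Return the first reason to reject, or None when the patent must be reported."""
    if not patent.in_force:
        return "not in force"
    if patent.assignee == own_company:
        return "assigned to the client's own company"
    if patent.claim_type in unwanted_claim_types:
        return f"{patent.claim_type}-only claims not wanted"
    if not patent.required_components <= formulation:
        return "claims require components outside the formulation"
    return None  # no good reason to reject, so report it


formulation = frozenset({"base oil", "antioxidant", "detergent"})
patents = [
    Patent("X1", False, "Other Co", "product", frozenset({"base oil"})),
    Patent("X2", True, "Client Co", "product", frozenset({"base oil"})),
    Patent("X3", True, "Other Co", "process", frozenset({"base oil"})),
    Patent("X4", True, "Other Co", "product", frozenset({"base oil", "zinc additive"})),
    Patent("X5", True, "Other Co", "product", frozenset({"base oil", "antioxidant"})),
]

decisions = {p.number: rejection_reason(p, formulation, "Client Co", {"process"})
             for p in patents}
reported = [number for number, reason in decisions.items() if reason is None]
print(decisions)
print(reported)
```

The default is to report: a patent is dropped only when a specific reason applies.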

So it is critically important for searchers and clients to agree on the reasons for rejecting, in addition to reasons for capturing, a patent. Optimally, this agreement should be reached before the online strategy is written, when the searcher is trying to clarify the intent of the search request. But, usually, clients have no idea about what they do not want until their searcher presents examples from a preliminary search. If a client tires of my questions about what to reject, I console myself with the thought that I am probably doing a good job. Early in a search, I like to send a list of proposed reasons for rejection to the attorneys and formulators for their concurrence.

Inside-Out Searches and Outside-In Searches

While capture and cull are part of every search, searchers use them in different patterns depending on their knowledge of the subject. Some search patterns are inside-out; others are outside-in.

Inside-out searches are best for searchers learning a new subject area. They learn as they search. For inside-out searches, capture and cull are repeated, using a new search strategy each time, until the chance of finding additional relevant citations is judged to be small. Between repetitions, the searcher writes the new search strategy based on insights from the previous culling. On each repetition, the searcher can write a different search strategy and search different databases. When searching inside-out, searchers can look for the most relevant citations first, then capture the remaining citations in an evolving capture and cull strategy.

One advantage of inside-out searching is that the searchers capture relevant citations early in the search, which they can report to the client to give a sense of progress and to prod the client into teaching the searcher more about what is not wanted. A disadvantage is the extra effort to avoid capturing the same patents on each repetition.
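The inside-out pattern, including the bookkeeping needed to avoid culling the same patent twice, can be sketched as a loop. The names `inside_out`, `run_search`, and `is_relevant` are hypothetical, the strategies are fixed in advance instead of being rewritten after each cull, and "a round that finds nothing new and relevant" stands in for the searcher's judgment that further rounds are unlikely to pay off.

```python
def inside_out(strategies, run_search, is_relevant):
    """Repeat capture and cull, skipping citations seen in earlier rounds."""
    seen = set()    # every citation already captured, relevant or not
    reported = []   # relevant citations, in the order they were found
    rounds = 0
    for strategy in strategies:
        rounds += 1
        # Capture: keep only citations not already handled in earlier rounds.
        new_hits = [pid for pid in run_search(strategy) if pid not in seen]
        seen.update(new_hits)
        # Cull: decide which of the new citations to report.
        relevant = [pid for pid in new_hits if is_relevant(pid)]
        reported.extend(relevant)
        if not relevant:
            break   # stand-in for "chance of more relevant hits is small"
    return reported, seen, rounds


# Hypothetical results returned by three successive strategies.
fake_database = {
    "strategy 1": ["P1", "P2", "P3"],
    "strategy 2": ["P2", "P4"],
    "strategy 3": ["P4", "P1"],
}
relevant_ids = {"P1", "P4"}

reported, seen, rounds = inside_out(
    list(fake_database),
    lambda strategy: fake_database[strategy],
    lambda pid: pid in relevant_ids,
)
print(reported, sorted(seen), rounds)
```

Relevant citations surface from the first round onward, so they can be passed to the client early, while the `seen` set carries the extra effort of avoiding repeats.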

In contrast, outside-in searches are preferred by searchers familiar with the subject searched. For outside-in searches, capture and cull are executed only once. The online search strategy is written to cover all search terms for all databases that need searching. Searchers using the outside-in method are usually more confident that they have included all search terms and all relevant databases. These searchers may feel more confident that they have captured all relevant patents, but the client might not get the most relevant patents first. The client must wait for the searcher to cull through all the captured patents.

So Why Does It Take So Long?

One very large patent infringement search I did for a proposed complex fluid formulation involved 35,000 to 100,000 of the 8 million potentially in-force patent families worldwide. That's 35,000 to 100,000 families with some relation to the proposed formulation. I did the search outside-in. The capture identified about 5,000 patent families. The cull left about 200 patent families that I reported to the client attorney.

The search took months! Remember, infringement searches must be thorough. The first 2 months involved the capture: interviews with clients, organizing the formulation into a list of chemical families, verifying the components' chemical structures, discussing example patents, agreeing about what to reject, testing search terms online. The cull consumed most of the next 2-3 months. Culling was done by first reading the 5,000 Derwent-enhanced patent titles to reject the most obviously irrelevant, which left about 500 patents for which patent claims had to be read. It was grueling.

Looking back, the client may wonder why he had to wait months to identify a mere 200 patents. Well, first, I had to condense a very complex list of ingredients into an online strategy based on chemical families, and then I had to cull through 5,000 patents! If the online strategy could have captured only the 200 relevant patent families, a lot of time would have been saved.
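The funnel in this example can be checked with a little arithmetic. The figures are the approximate ones quoted above; the variable names are only illustrative.

```python
captured = 5_000            # patent families returned by the capture
after_title_screen = 500    # left after reading Derwent-enhanced titles
reported = 200              # left after reading claims

title_pass_rate = after_title_screen / captured    # share surviving the title screen
claims_pass_rate = reported / after_title_screen   # share surviving the claims reading
overall_yield = reported / captured                # share of the capture that was reported
rejected = captured - reported                     # families culled in total

print(f"title screen keeps {title_pass_rate:.0%}")
print(f"claims reading keeps {claims_pass_rate:.0%}")
print(f"overall yield {overall_yield:.0%}, {rejected} families culled")
```

On these round numbers, about 4 percent of the captured families were reported, and roughly 4,800 had to be read and rejected to get there.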

The Gap

"There's got to be a better way, right?" - A quote from a client

Why should I have to review 5,000 patents to find the 200 relevant ones? Everyone, including searchers, has an instinctive feeling that searching should be more straightforward. Was my capture strategy inefficient? Now here is the heart of this article: The answer is, "No." Want proof? Try this. After you have completed a search and have your list of relevant patents, try to rewrite your capture strategy to capture only the relevant patents without capturing thousands more. You cannot. This frustrating exercise will show you just how noisy the capture part is. So culling remains necessary because capture methods are just too noisy.

Searchers work within the noise to deliver relevance. Why is there so much noise? I don't know. But it's why searching takes time and money. It's as if there is an incomprehensible gap between a combination of words supposed to represent a product or process in a capture strategy and the intent and meaning of the product or process to the client. The gap exists even for scientific papers, whose authors wish to have their intent clearly understood. Patents are noisier than scientific papers. A well-written patent is skillfully composed of words carefully chosen to disclose little and claim much. Searchers will be needed in this gap until software makes culling unnecessary.

As described above, much of the intent of a search is embodied in the knowledge of what the client does not want. So why don't searchers codify the reasons for rejection directly within the online search strategy during the capture? It should make the capture more relevant and decrease culling. Wouldn't that close the gap? Unfortunately, searchers dare not codify reasons for rejection within the capture. Searchers learn to avoid the Boolean NOT. Consider a simple example. A patent search strategy regarding a proposed zinc-free composition would accidentally exclude patents that claim a zinc-free composition if the searcher uses the Boolean "NOT zinc." Instead, the searcher must find a database whose index terms include "zinc-free," if available, so that the searcher can use the Boolean "OR zinc-free." Otherwise, the searcher must read the claims (cull) to verify that zinc is not claimed.
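The zinc example can be reproduced with a few made-up records. The sketch assumes, for illustration only, that the search engine treats a hyphen as a word separator, so that "zinc-free" is indexed under the word "zinc"; the record texts and the `words` helper are hypothetical.

```python
# Hypothetical one-line claim summaries.
records = {
    "P1": "A zinc-free lubricant composition comprising a base oil",
    "P2": "A lubricant comprising zinc dialkyldithiophosphate",
    "P3": "A lubricant composition comprising a base oil and an ashless additive",
}


def words(text):
    """Split text into lowercase words, treating hyphens as separators (an assumption)."""
    return set(text.lower().replace("-", " ").split())


# Strategy 1: lubricant NOT zinc
with_not = sorted(pid for pid, text in records.items()
                  if "lubricant" in words(text) and "zinc" not in words(text))

# Strategy 2: lubricant alone, leaving the zinc decision to the cull
without_not = sorted(pid for pid, text in records.items()
                     if "lubricant" in words(text))

lost = sorted(set(without_not) - set(with_not))
print(with_not, without_not, lost)
```

The NOT removes P2 as intended, but it also removes P1, the zinc-free patent that the client most needs to see. Without the NOT, P2 is captured and has to be rejected by reading its claims.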

In a perfect world, culling would be unnecessary because capture would be perfect. The search methods would return only the relevant records. The search software would somehow know what I mean. Searchers want search methods to get better, to help them bridge this gap between the client's words and the client's meaning. Can the reasons for rejection be codified by software? I don't know.

The gap is not the searchers' fault. It's the chaotic boundary between words and meaning. Stated in the words of this paper, no capture strategy can make culling unnecessary — yet. Some clients behave as if they believe that there is no noise, no gap. Unfairly, searchers feel responsible for the noise and struggle to explain why it costs time and money.

Clients do not realize that their own ability to scan for relevant information is miraculous, until they try to program it or try searching for a living. Everyone knows anecdotes about this ability. For instance, everyone has read a paragraph twice because they sense that they have missed something important. Is that sense programmable? Everyone accepts what everyone else means by "I'll know it when I see it." Is that sense programmable? Until someone programs this miracle into search methods, searchers must continue to cull the material captured by online searches. This is why it costs so much and takes so long.

Is There Any Hope?

To minimize the tedium of having to cull so much material (i.e., cut costs), searchers seek improvements for both the capture and the cull. Regarding the capture, information vendors (Questel•Orbit, STN, and Dialog) and database producers (Derwent, API, and IFI) are constantly advertising improvements. New methods of indexing, new index terms, methods of ranking and visualizing, faster document delivery, new Internet search tools, and full-text literature are adding value. The mechanics of writing online search strategy are constantly changing as the search engines evolve. Searchers strive to remain current.

Trying to close the gap between words and meaning, software vendors continually offer new techniques. Busy searchers try to make time to test the new software. Searchers test new software by using it to repeat their previous searches. To start a test, searchers tell the vendor what they searched and give them only some of the relevant material. For adoption, the new technique must find the rest of the relevant material without capturing as much of the culled material. Some clients test searchers by the same method, giving the searchers only some of their examples of relevant patents, and then waiting to see whether the search finds the others.
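This test can be sketched as a small scoring routine. The function name `evaluate_tool` and the patent labels are hypothetical; the adoption rule (find all the held-back relevant patents while capturing less noise than the original search) is one reading of the criterion described above, not a standard benchmark.

```python
def evaluate_tool(tool_hits, known_relevant, seeds, old_captured):
    """Score a new search tool against a completed search.

    seeds          : relevant patents disclosed to the vendor
    known_relevant : all relevant patents from the completed search
    old_captured   : everything the original capture returned
    """
    held_out = known_relevant - seeds
    found = held_out & tool_hits
    recall = len(found) / len(held_out) if held_out else 1.0
    new_noise = len(tool_hits - known_relevant)      # irrelevant hits from the tool
    old_noise = len(old_captured - known_relevant)   # material culled the first time
    adopt = recall == 1.0 and new_noise < old_noise
    return recall, new_noise, old_noise, adopt


known_relevant = {"P1", "P2", "P3", "P4"}
seeds = {"P1", "P2"}
old_captured = known_relevant | {"N1", "N2", "N3", "N4", "N5", "N6"}
tool_hits = {"P1", "P2", "P3", "P4", "N1", "N2"}

recall, new_noise, old_noise, adopt = evaluate_tool(
    tool_hits, known_relevant, seeds, old_captured)
print(recall, new_noise, old_noise, adopt)
```

In this made-up case the tool finds both held-back patents and returns two irrelevant ones where the original capture returned six, so it would pass the test.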

Today, software that captures the relevant material without capturing any other material is a dream. In the meantime, searchers are paid to bring order amidst a chaos of information faster than their clients can. They are students of information noise; they work in the gap. Their effectiveness stems from these key capabilities:

• Knowing sources of information and the science and technology being searched

• Writing capture search strategy for today's search engines and databases

• Culling rapidly (knowing what you don't want)

• Analyzing patterns in the retained information

What if clients still think searching costs too much and takes too long? Wait. Maybe someone will write the magic software that finds only what you really mean. In the meantime, tell your searchers what you don't want as well as what you do want because they're going to do a lot of culling.

Acknowledgements: I thank all the searchers in my section who suffered through my apprenticeship. When I took this job, I had no idea how much had to be learned...and unlearned.

This article is dedicated, with many thanks, to the boss, Patricia A. Lorenz, on the occasion of her retirement.
