The Millennium Issue • Volume 8, Number 1 • January 2000
My job for this
Millennium issue of Searcher is to predict what scientific and patent
information will look like in the next millennium, or as far into it as
I care to venture. But before I try to look ahead to the next millennium,
I’d like to look back into the last one and beyond, to the history of science
and technology, as well as the history of scientific and technical information.
This will give you some perspective when I start making predictions.
The Ancients observed nature, observed the night sky, and developed a “protoscience” that combined astronomy with theology and mathematics (as in the case of Stonehenge). Religious considerations played a greater or lesser role depending on the culture — greater in Egypt and India, lesser in China and Mesopotamia. These cultures observed and described nature with some precision, but their explanations for phenomena were mostly mystical — the balance of yin and yang, the harmony of the elementals (fire, water, earth, air), and so on. Understanding nature involved religion and magic more than reason and experimentation.
Observation, rather than experimentation, was also the essence of science in ancient Greece. The
Greek natural philosophers developed sophisticated, if not necessarily
accurate, mathematics and astronomy; and Hippocrates and Galen developed
the concept that disease was a natural phenomenon, not just the wrath of
the gods. But their work did not extend to understanding the causes, much
less finding cures.
Where was science a millennium ago? Much of the culture and science of the ancients was lost; much was unavailable to Western Europe; and what was known was known to only a few. Medicine, biology, and chemistry as we know them today, and even the basic concepts of scientific experimentation, didn’t exist. Natural philosophy was still rooted in religious explanations.
Half a millennium ago the Scientific Revolution began with Copernicus’ concept that the sun, not the earth, lay at the center of our local cosmos. Galileo turned the newly invented telescope on the heavens, giving physical reality to the moon and the planets. A century later Newton developed the laws of motion and gravity. Mathematics developed dramatically in the 17th century, giving scientists the tools to quantify and describe phenomena. Chemistry started advancing beyond alchemy in the 18th century, with Lavoisier’s explanation of combustion in terms of oxygen and his refutation of the old phlogiston theory. In the late 18th century Coulomb began to measure electromagnetic forces.
In the 19th century, biology developed past natural philosophy. Darwin came up with the concept of natural selection to explain evolution, the gradual changing of species — a phenomenon that Lamarck had observed long before but understood only as a drive of organisms to “perfect” themselves. Medicine advanced in the 19th century when Pasteur demonstrated that bacteria cause many diseases.
By the end of the
19th century, physicists had explained heat, light, and electromagnetism
with classical thermodynamics, wave theory, and Faraday’s and Maxwell’s
laws. Then, in the 20th century, Einstein threw a monkey wrench into the
works with his theory of special relativity, which made the observation
of events dependent on the observer as well as the event. As the 20th century
ends, physicists are still working on a unified field theory.
How about technology? Rudimentary technology existed a millennium ago and, indeed, in ancient cultures. People used the basic machines that we all learn about in first-year physics — levers, pulleys, inclined planes, wedges, and so on — to create buildings, transportation vehicles, and other machines and devices; but the energy provided basically came from human and animal strength. Land transportation was on foot, on horseback, or in animal-drawn vehicles. Ships used wind power or human muscle. Wind and water power were not captured in windmills and water wheels until several centuries into this millennium. Admittedly medieval architects made wonderful use of the tools they had, creating among other things the great Gothic cathedrals. But they did it with the sweat of their — and their work animals’ — brows.
The Industrial Revolution didn’t come until the 19th century; so, the first sorts of mechanized
transportation — steam-engine-powered trains and ships — started only in
the last one-fifth of this millennium. Leonardo daVinci dreamed of and
designed fanciful airships, but he never realized his dreams; and human
beings didn’t get off the surface of the planet (except in balloons) until
the last one-tenth of this millennium. The modern computer came into existence
in the last one-twentieth of this millennium — just about 50 years ago.
Where were education and learning a millennium ago? Almost everyone was illiterate. Books were hand-lettered and so rare that only nobles and the very wealthy had even one or two; most were created and kept in monasteries. The monks helped preserve knowledge and shared it with each other, but their resources were not available to the public. In the year 1000, Christian Europe had two universities and no public libraries. The millennium was almost half over before the printing press began to make books more widely available.
For over half the
millennium, natural philosophers, alchemists, and other scientists tended
to keep their discoveries to themselves, or at best recorded them cryptically.
Not until the 17th century did the concept come into existence of spreading
scientific information widely and quickly, of sharing observational and
experimental methods. Scientists started trying to reproduce each other’s
results, as part of the development of modern scientific methods. The 17th
century saw the start of scientific societies, the Royal Society of London
and the Academie des Sciences of Paris. Scientific and technical schools,
which encouraged spread of scientific knowledge, came into existence only
at the end of the 18th century — the last one-fifth of this millennium.
How has scientific information developed over the millennium? Libraries have existed at least since ancient Egypt (in Alexandria), but they were repositories of information, searchable only physically in the library. Scholarly journals came into existence in the mid-17th century as part of the process of spreading scientific information. Individual books and periodicals had subject indexes as early as the 16th century (for books), but the transition to indexes that tried to cover multiple publications, indeed whole subject areas, didn’t start until the 19th century with such tools as Chemisches Zentralblatt. Chemical Abstracts and most of the other scientific indexing and abstracting services are products of this century.
Early tools for subject access involved users digging through printed subject indexes to try to find the information they wanted. When they wanted information on topics with multiple subconcepts, they would have to look under index terms for each of the subconcepts. Post-coordinate indexing, which would permit users to combine several indexing terms or concepts and retrieve references indexed with all of them, started around the second half of this century, with uniterm indexes. Searchers would compare lists of document numbers printed on cards, each card representing an individual indexing term, to see which documents showed up as indexed with all the terms of interest. Other early manual and mechanized resources for post-coordinating indexing terms included needle- or machine-sorted punch cards (each card has all the terms indexed for one reference) and peephole cards (each card has all the references indexed with one term).
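In modern terms, the card-matching process described above amounts to intersecting posting sets: each uniterm card is simply the set of document numbers indexed with one term. A minimal Python sketch (the terms and document numbers here are invented for illustration):

```python
# Each indexing term maps to the set of document numbers indexed with it --
# the electronic analogue of a uniterm card. (Hypothetical example data.)
index = {
    "polymer":  {101, 104, 105, 109},
    "adhesive": {102, 104, 109, 112},
    "epoxy":    {104, 107, 109},
}

def post_coordinate(index, terms):
    """Return the documents indexed with ALL of the given terms,
    i.e., the intersection of the terms' posting sets."""
    result = None
    for term in terms:
        postings = index.get(term, set())
        result = postings if result is None else result & postings
    return result or set()

# Coordinate three terms at search time, just as a searcher once compared
# three uniterm cards for document numbers appearing on all of them.
print(sorted(post_coordinate(index, ["polymer", "adhesive", "epoxy"])))
```

Pre-coordinate indexes forced the searcher to anticipate such combinations in advance; the post-coordinate approach defers the combining until the moment of the search.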
Computers used for information retrieval came into existence not too long after modern computers themselves. Early searching tended to be primitive Boolean, using only the indexing terms or free text, but developments came rapidly in the pharmaceutical and chemical areas — both industries willing to pay for good information. Derwent introduced chemical coding for pharmaceuticals in 1963, and by now many databases permit searching of chemical coding, chemical structures and substructures, and polymer components. At their best, these databases reach toward searching of concepts, not just terms — as long as the searcher knows how to use them.
At this point in time, these systems are still relatively user-hostile. Searchers usually need both a knowledge of the subjects covered and extensive training in the policies, practices, and peculiarities of the indexing systems. Also, they are only as good as their indexers can make them. How well the systems work depends on how easily and consistently concepts can be translated to search parameters and how well the databases’ indexing expresses the concepts a searcher wants to find. They are not yet intuitive.
Information retrieval went online in the 1970s, and by now powerful search engines
exist that permit searching, and to some extent merging the indexing of, multiple
databases. These are the lifeblood of information professionals. At the
same time, the Internet has proliferated low-cost information resources
that everyone can use. Integration of Internet and online capabilities
is just starting as we approach the next millennium.
How does one predict the future of scientific, technical, and patent information in the next millennium? I’ll tell you something: Nostradamus was a fake. Nobody can really predict where science is going, and for a very good reason: No one today is capable of conceiving what the science of even 100 years into the future — a mere one-tenth of a millennium — will be. There’s a name for predictions of technological developments: science fiction. And when you read the science fiction written 50 or even 25 years ago, you realize how unreliable its predictions were. On the one hand, science-fiction writers expected us to have colonized the moon, landed on Mars, and explored the outer planets by now. On the other hand, the early writers hadn’t a clue what computer technology would grow — or rather, shrink — into with miniaturization. (Even HAL was a big clunker.)
It is also a truism, famously formulated by Arthur C. Clarke, that any sufficiently advanced technology is indistinguishable from magic. So if people from a few hundred years into the next millennium were to show up today with examples of their technology, none of us could understand it, much less predict it. (And if you don’t believe that, imagine trying to explain computer chips — or the engineering of the SST, or the physics of the H-bomb, or the mere concept of the Internet — to someone from the court of Louis XVI.)
How can even the most brilliant of us conceive truly fundamental changes in science and technology? Examples abound showing that we can’t. Before Einstein’s special relativity, physicists of the 19th century thought there was nothing left to theorize; it was the task of future scientists just to measure things to the next decimal place. An infamous Commissioner of Patents, Henry Ellsworth, said (in his 1843 Annual Report of the Patent Office), “The advancement of the arts, from year to year, taxes our credulity, and seems to presage the arrival of that period when human improvement must end.” (Later paraphrased as, “Everything that can be invented has been invented.”)
However, having hedged my bets thoroughly, I will attempt to make a few predictions. First of all, I see computers getting a lot smarter than they are now, to the point that intuitive search systems become a reality instead of a dream. I’m not just talking about “natural-language” searching and “relevance ranking” (both of which are rather a joke for patent searching, if not indeed all sci-tech searching), but vastly increased computer capabilities to translate requests accurately into concepts.
(An aside: What goes around sometimes comes around. Remember the 1957 movie Desk Set, in which Katharine Hepburn plays a reference librarian and Spencer Tracy the efficiency expert who installs a computer (the EMERAC) in her library? Ten years ago, we sophisticated Boolean search experts laughed at the scene in which Hepburn types a question into the computer, “What is the weight of the world?” Then came the first of the natural-language query search engines.)
I also predict a Star Trek-like scenario in our interactions with computers: We will talk to them, they will answer. I cannot imagine what specific developments in computers’ reasoning abilities (artificial intelligence or other) could lead to these capabilities — but that’s not my job. It’s up to the Silicon Valley geniuses to develop these wonderful tools. (See Thuy Ledinh’s contribution for some ideas on this.)
I also predict a linking, a melding, of media. We already see this with current systems that let us access our online databases via an Internet gateway and then link to other information — full documents, cited references, and so on — related to our search results, all on the same platform. This will grow vastly in both resources and speed. I really hope to see my “virtual patent office” before I retire, in which I can search the most sophisticated patent databases (preferably all together, with their various indexing merged), go immediately to full patents of interest and flip through their pages online as fast as I could in paper copy, link to cited and citing patents and literature references, perform complex statistical analyses and comparisons of competitors’ holdings with each other’s and our own — and, while we’re at it, get instantaneous translations of anything in other languages. (Well, maybe not before I retire.)
I also predict considerable advances in capabilities to search more than just text and indexing. We will be able to search chemical structures, mechanical drawings, data in tables and graphs — all the genuine knowledge in a document. (See Bob Buntrock’s companion sidebar for more details on this.)
I am a bit more fuzzy on the future of searching information on outside resources versus storing it in-house. On the one hand, computer storage capacity continues to grow very quickly. To be sure, it has a long way to go before an in-house server could store, for instance, the collection of patent databases now available on QuestelOrbit or the collection of chemical databases on STN — and information professionals, as I mentioned, need integrated access to multiple databases. Still, some information products have already emerged for this sort of in-house storage and use, and customers are buying them for their speed of processing, security, and ease of sharing in-house. On the other hand, Internet resources and usage have grown at a rate beyond the wildest predictions, to the point that virtual communities are forming. I will stick my neck out and predict that the future will be large worldwide networked resources rather than large in-house resources. (I also predict that some day we will be able to get through our companies’ !@#$%^& firewalls with less than glacial speed! I hope.)
People have been
predicting “disintermediation,” the demise of the professional search intermediary,
ever since the advent of the Internet. It hasn’t happened, and it won’t
happen for a long time in sci-tech and especially patent searching. For
one thing, no matter how good new resources are, it will be a long time
before technological developments make feasible (and affordable) the reindexing
of old documents with new indexing systems. But old documents are as important
to prior art searches as new ones. So not only must searchers know all
the best new indexing systems, they have to know all the old clumsy ones,
too — and they must know how to do back searches that incorporate all the
right indexing across all the appropriate time periods. (See Edlyn Simmons’
contribution for more details on this.) Not until friendly computers can
understand and look for just what their users want, even when the users
don’t know how to ask the right questions, will professional searchers become obsolete.
One lesson we have learned is that no matter how far-out and outlandish our predictions about computer capabilities may seem, they won’t be as far out as reality. Change will continue to happen — big change. I have worked in patent information for just 25 years, and I have seen information retrieval change from digging in books and searching computer tapes in batch mode to the modern online and Internet capabilities that I and others have described in detail. And that’s just the last one-fortieth of this millennium! Neither I nor anyone else can know how we will retrieve patent and sci-tech information 25 years from now — or 50, or 100, or in another millennium. But it will be exciting. I wish I could be there to see it all.
I invited some
others in the patent and sci-tech information community to add their comments.
Here are some of them.
Robert Massie, Director of Chemical Abstracts Service
In the coming years, vast and extraordinarily varied sci-tech data repositories will become available on the Web and its descendants. Many industry participants will try to create what has until recently been only a dream: truly integrated digital sci-tech research environments. These integrated environments may take on increasingly global and pan-scientific characteristics as the costs of linking and data storage continue to decline. All the major publishing conglomerates are pursuing this objective. Somewhat surprisingly, so are certain agencies of the U.S. government, which believe that their missions include providing taxpayer-subsidized scientific information on the Web.
In the past, the emphasis for information services was on “value-access.” In other words, providing high-quality access to hard-to-find or hard-to-retrieve information was the central benefit. In the future, the emphasis for information services will be “value-aggregation”: How well does the service bring together information that is otherwise available on the Web or elsewhere? How much more convenient, valuable, and productivity-enhancing is the totality of the service provided, even if individual information elements are “free” elsewhere on the Web? The most effective and earliest “value-aggregators” will establish strong installed bases in the new Web world and have the opportunity to grow with their customers and take advantage of the continuing information industry evolution.
Thuy Ledinh, Internet Product Manager, QuestelOrbit-France Telecom
Many changes are taking place in the way we organize and search for data, regardless of type and kind. One of the most significant changes is the emergence of practical artificial intelligence (AI) in computer software.
Today we see numerous developments in “natural-language” and “concept” searching. Eventually, these types of developments will give way to a new breed of artificial intelligence that is not entirely based on human-written mathematical equations but allows the software to create its own formulas and generate its own set of rules. As a result, the smart search will formulate an incredibly thorough search with possibilities that humans have not yet conceived. Smart searching is possible because the software learns from its mistakes and successes. From constructive interactions with the human searcher at first, the smart software will eventually “think” beyond its searcher’s ability, because it no longer has human limitations on its “intelligence.”
In essence, patent searching software will become more like a pupil who eventually becomes smarter than the teacher, and much quicker. Take for example a search for “active matrix display” for a laptop computer. The smart search will perform all the steps its searcher would take, such as looking for additional key words and finding cited, citing, related, and equivalent patents. But it would take additional steps: it would evaluate its own findings and search again for patents matching the original query, based on learned and self-constructed algorithms. It is a concept that can be summarized as, “Where the humans left off, the software will take over and run with it.”
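The loop described above (search, learn new vocabulary from the results, chase citations, then search again until nothing new turns up) can be caricatured in a few lines of Python. Everything here is invented for illustration; a real system would rank and prune rather than expand exhaustively:

```python
# Toy document collection: each document has indexing terms and a list of
# the documents it cites. (Hypothetical data for illustration only.)
documents = {
    1: {"terms": {"active matrix", "display"}, "cites": [2]},
    2: {"terms": {"thin film transistor", "display"}, "cites": [3]},
    3: {"terms": {"liquid crystal", "thin film transistor"}, "cites": []},
}

def smart_search(query_terms):
    """Iteratively expand a query with terms learned from retrieved
    documents and with cited documents, until no new matches appear."""
    found = set()
    frontier = {d for d, doc in documents.items()
                if doc["terms"] & query_terms}
    while frontier:
        found |= frontier
        # Learn new vocabulary from what was just retrieved...
        query_terms |= {t for d in frontier for t in documents[d]["terms"]}
        # ...chase citations from the retrieved documents...
        cited = {c for d in frontier for c in documents[d]["cites"]}
        # ...and search again with the expanded query.
        frontier = ({d for d, doc in documents.items()
                     if doc["terms"] & query_terms} | cited) - found
    return found

print(sorted(smart_search({"active matrix"})))  # the expansion reaches all three
```

Even this crude loop shows the flavor of the prediction: the second and third documents match no term of the original query, yet the search finds them by feeding its own results back into itself.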
Harry M. Allcock, Vice-President of IFI Claims Patent Services
Because the U.S. Patent and Trademark Office, the European Patent Office, and the Japanese Patent Office plan to make most of their patent information available to the public at little or no charge, many new entrepreneurs will enter the patent marketplace. Some will offer a new look at the raw data, and others will repackage the same information under a new title. New companies have already set off a price war, and many will not remain in business during the next millennium.
We will continue to see an increase in the number of issued patents due to rapid technological advances. This will result in a growing demand for fast, accurate, and reliable methods of retrieving relevant patent information. Improved search engines and ready access to raw data will meet the needs of some searchers, but intellectual indexing will be even more important for specialized areas such as chemistry. Any company in the patent information area that wants to make a profit must be prepared to offer enhanced, high-quality information. The year 2000 will see several new products designed for business information departments using patents as the source of data. We believe that most serious patent information searchers and intellectual property personnel would be willing to pay a reasonable fee in order to have the best patent information retrieval.
Stuart M. Kaback, Exxon Research and Engineering, winner of the 1999 Skolnik Award for chemical information
I’m not exactly a prophet; it’s not my way to go around foreseeing what’s going to happen in the future. So this is primarily what I want to see happen.
I continue to be deeply concerned that circumstances will conspire to undermine the continuation and future development of value-added databases. It may be satisfactory for people supplying information in non-technical fields to scoop nuggets of information out of a primordial soup, but that does not suffice when it comes to highly technical information, for a number of reasons. It seems clear to me that sci-tech information professionals as a class understand this well, but I continue to worry that circumstances will take control of the future (read: cost-obsessed, non-comprehending, non-technical managers) and starve the funds needed to maintain and improve those databases.
I have no problem with free or inexpensive document delivery through the Internet, and I have no problem with simple search capabilities applied to such things as Web-based patent specification files. I have a major problem with the concept of governments providing upgraded search capabilities free of charge and undermining the viability of organizations that are devoted to providing value-added products. These organizations must be able to retain economic health even to stay in the same place, let alone move forward.
Aside from this central idea, I urge the following:
• Maintenance and establishment of advisory groups and task forces to enable information providers to work out new ideas and to get feedback on quality matters. Large user meetings are valuable and serve an important purpose, but meetings cannot (in my experience, anyway) replace fully the idea-generation that can happen when a compact group of knowledgeable individuals bangs their heads together.
• Database producers need to work out a way to pool high-quality data from input in difficult languages. Once upon a time CAS and Derwent had such an arrangement, which broke down within 2 years. But the amount of material in Japanese, Korean, Chinese, etc., continues to grow, and we are typically provided with inadequate analysis of this information. If high-quality abridgements, or something of the sort, could be prepared and then utilized by the database producers, each in their own fashion, everybody would be better off, at lower cost. And in general, the more difficult the language, the more essential it is that document analysis be thorough. It’s OK to point to a U.S. patent that’s readily available and easy to comprehend. Not so for things in the more difficult languages.
• Substance information in patents would be reported more thoroughly and accurately if an arrangement could be made (at least by the US/JP/EPO/WIPO) to have chemical patents prescreened by CAS under a secrecy contract so that all significant substances would be registered, including new substances, and tags could be applied to these substances.
• In general there is a real need to increase the indexing of trade-named substances, especially in patents, with the names tagged as such, because so many of them are common words that would retrieve garbage without tagging.
• Document analysis must be more flexible, must learn how to adapt to changing times. My chief examples for this relate to patents whose claims reflect the properties of materials with no indication of how those properties might have been achieved, and sometimes not even the examples of the patents will provide this information — it turns out to be in the front end of a patent specification. Analysts must not be hamstrung by laws that require analysis to be based on just the claims, or on just the examples — they must be free to express the significant information in a patent wherever it may appear. I have little doubt that future trends will create new situations; flexibility in reacting to these situations is essential.
• Indexing needs to reflect context, and document analysis needs to reflect the scope of the document. It is inadequate to give a single example in a patent whose examples are wide-ranging and teach multiple ideas; an idea of the scope of the patent is required. And context is essential. Bits of information floating around in an index field with no indication of what goes with what will not suffice. There has been progress in this area; more is needed.
• Indexing of chemical and especially chemical engineering patents needs to be able to distinguish between different sequences. Database producers understand how important sequence information is for genetic patents, but have still never come to grips with sequences in chemical process patents.
• Quality. Consistent, reliable, high quality. Abstracts beset by multiple errors are worse than nothing at all. We must not tolerate poor quality.
Edlyn S. Simmons, Section Head, Business Information Services, the Procter & Gamble Company
At the end of the 20th century we find ourselves with a plethora of systems for searching the chemical structures in patents. The Derwent fragmentation code for non-polymeric structures has code terms applicable from 1963, 1970, 1972, and 1981. The time-ranged fragment coding is searched directly in the bibliographic World Patents Index databases (DWPI). There are two different chemical fragmentation codes for structures in the IFI CLAIMS US patents encoded between 1972 and the present. The IFI fragments must be searched for specific registered compounds in the CLAIMS Reference file and crossed over to the bibliographic UDB and CDB files, where the fragmentation code strategy is searched again for generic structures and infrequently encountered molecules. The Chemical Abstracts Registry file has topological indexing of specific compounds from patents since 1957, which is crossed over to the bibliographic CA and CAOLD files. Topological indexing of patents published since 1988 is searched directly in the companion MARPAT file. The Questel.Orbit search service offers topological searching with the Markush DARC system of the Merged Markush Service, which contains indexing of patents in the Derwent World Patents Index since 1988 and patent publications indexed by the French Patent Office (INPI) in the PharmSearch file. PharmSearch has chemical structure indexing for patents from 1984 to the present for U.S., EPO, PCT, and French patents, with additional indexing for French medicinal patents published from 1961 to 1978; it is the only database applying current indexing retrospectively. Derwent, in addition to the ongoing indexing of patents with the chemical fragment code and Markush DARC, has recently created the Derwent Compound Registry, which allows topological searching through either Markush DARC or the STN Messenger search system.
A complete retrospective search should include all of these as well as a manual search of the older chemical compounds indexed in Chemical Abstracts and a topological search of Beilstein.
Learning the many systems for chemical structure retrieval is a daunting task for novice patent searchers, but patent searching is one area where backfiles cannot be ignored. To make matters worse, some of the database producers are training new staff members to use only the active search systems, thereby limiting the availability of trainers and help-desk assistance, while experienced searchers are growing older, moving to managerial positions, and retiring. At the same time, the high cost of searching all of the systems must be balanced against the increasing cost of other electronic information sources. It is difficult to imagine that the current state of chemical structure searching can continue indefinitely. But what will replace it?
Several possibilities come to mind. Pessimists might foresee the Dark Ages of patent searching, when the high cost of searching sophisticated databases, the ready availability of free, full-text databases on the Internet, and the lack of expertise among searchers cause many companies to forego chemical structure searching entirely. In this scenario, database producers would be forced to lay off expensive indexing staffs, and the use of the chemical structure search systems would be forgotten by all but a few companies running in-house searches of discontinued databases on aging mainframe computers and obsolete PCs. (Imagine a monastery where cowled patent searchers work by candlelight, scratching out searches with quilled Pentiums.) Optimists might foresee a bright future when artificial intelligence programs will allow searching of all of the backfiles of all of the existing databases through a single, simple, topological, front-end interface. Realists can hope for a more efficient Markush TOPFRAG program to convert topological input into a reliable Derwent fragmentation code strategy, a similar topological interface for the IFI files, and conversion of the early Chemical Abstracts Formula and General Subject indexes into connection tables for addition to the Registry file.
Elliott Linder, formerly API Indexing Manager, now of GCI Information Services
Text search engines and natural-language processing: The “importance” of any information is relative and dependent on one’s perspective. What the creator or publisher thought was important may not be what the user finds to be important. I look for a shift away from efforts to determine statistically how important units of information are to a user (i.e., relevance ranking) toward more benign and helpful efforts to determine what things mean and are about. Such efforts can lead to enhanced retrieval through an evolving standardization and consistency of subsurface concept representation and matching.
Retrieval links: Don’t be surprised to see growing numbers of links from retrieved data sets and individual data elements to a wide variety of related and (hopefully) useful options far beyond just the full text of the retrieved items. These links will begin to appear even in traditionally closed retrieval systems.
Publishing and intellectual property: This will be a difficult area, ripe for litigation and new case law. What will constitute publication? Ownership? Will people come to feel secure with only an electronic record of their activities?
Perhaps, by the end of the new millennium, human indexing will no longer be required — and I can finally rest easy.
Ruth Umfleet, Celanese
Computers will convert searches from verbal descriptions to search queries. Much later in the century, computers will interrogate search requesters to formulate queries: for instance, we will speak a chemical name and be asked whether we want an exact-match, substructure, or other type of search.
Automated indexing will add conceptual and specific terms automatically to a document’s indexing; some of this has already been reported.
Individual databases and Internet sites will become more transparent and answer-oriented: systems will select and search the appropriate databases, generating the necessary coding automatically when formulating queries.
The professional searcher’s future: Searchers will have to adjust their expertise to accommodate the new technologies. They will continue to update their professional skills and be “technology flexible.” Information professionals will build the “behind the scenes” definitions and framework to allow conceptual and thesaurus searching. Searchers will continue to have the best expertise and understanding of the new systems and will still be called upon to do complicated searches, but end users will be able to do much more for themselves and end up with better search results.
Jackie C. Shane, Patent and Trademark Librarian, Centennial Science and Engineering Library, University of New Mexico
In the next millennium we will see a difference in the way patent applications are filed, and we will see a shift in the types of patents that are filed. The USPTO’s decision to load much of its patent and trademark databases on the Internet means that for the first time in history, the public has free access to patents back to 1976 and all registered and pending trademarks. The new millennium will see more people than ever before searching prior art on their own and applying for patents on their own.
With the advent of all this pro se legal activity, the new millennium will see more people than ever employing sloppy search techniques and mailing in incomplete applications. Those willing to read documentation will succeed; those wanting things done quickly and cheaply will get back from the process what they have invested.
The USPTO will scramble to stay abreast of new technologies and to find appropriate labels for them. Applications will be filed electronically and will be more easily tracked. The Manual of Classification will probably shrink a bit in mechanical inventions and explode in areas of software and biotechnology.
Patent examiners will need to face ethical decisions when examining art pertaining to genetic engineering and biotechnology, decisions that may have ramifications for yet another millennium to come. The flip side is that as natural resources dwindle, the most savvy engineers will create technologies that rely on sustainable or renewable energy sources.
What is most certain is that the direction of the first part of the new millennium will decide the fate of those living in the latter part.
Robert E. Buntrock — Buntrock and Associates
“Better Mousetrap” guest columnist
Out of necessity, registries will continue to expand in importance and utility. As the largest and most comprehensive, the CAS Registry System (CASRS) will continue to maintain its dominance in this area. However, more widespread or even total and exclusive use of the CASRS will create some associated problems (as outlined in several publications by Buntrock: DATABASE 1994, 1995, 1996; to appear in J. Chem. Inf. Comput. Sci.). Since the naming and identification of chemical compounds in the public and popular arenas is inadequate at best and highly erroneous at worst, it’s debatable whether or not CAS Registry Numbers (CASRN) should be used universally as required descriptors of chemical compounds. Unless CASRN are actually assigned as part of the CAS indexing process in preparation of its own databases, a significant degree of error will result.
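One partial safeguard against such errors already exists: every CASRN carries a check digit that catches most transcription mistakes. A minimal sketch in Python of the standard check-digit rule (the algorithm and format are CAS’s own; the function name is mine):

```python
import re

def valid_casrn(casrn: str) -> bool:
    """Validate a CAS Registry Number's format and check digit.

    CASRN have the form NNNNNNN-NN-N (2-7 digits, 2 digits, 1 check digit).
    The check digit equals the sum of the other digits, read right to left
    and weighted 1, 2, 3, ..., taken modulo 10.
    """
    m = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", casrn)
    if not m:
        return False
    digits = m.group(1) + m.group(2)
    check = int(m.group(3))
    # Weight digits from right to left: rightmost gets weight 1.
    total = sum(int(d) * w for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == check

print(valid_casrn("7732-18-5"))   # water: True
print(valid_casrn("7732-81-5"))   # transposed digits: False
```

A validator of this sort can flag a garbled number, but of course it cannot confirm that a syntactically valid CASRN actually denotes the compound an author intended; only assignment within the CAS indexing process can do that.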
To enable the practical use of such services, advances must be made in pattern recognition in 3-D with implied rotation and drawing modification. Adequate representation of secondary structure of polymers still needs development, and the situation is even less advanced for tertiary structures. Recent advances in molecular modeling must be translated into database building.
Just as a variety of proprietary connection tables complicates the representation of 2-D structures, a proliferation of proprietary systems will complicate further advances in searching structures and drawings. For example, MDL developed an excellent system for representing and searching polymers a decade ago, but to my knowledge, the capability still resides solely within MDL-created systems, typically in-house and proprietary.
Full-text searching tools must continue to improve. I don’t believe a full-fledged version of a Salton-inspired, relevancy-inherent, full-text searching system (a la SMART or SQUIRE) exists outside of a few in-house systems. If any do exist, the providers are being very secretive and tight-lipped. However, full-text searching, even with superior search systems, will never satisfy all searching needs for sci-tech information, especially for patents.
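The Salton-style model alluded to here ranks documents by the similarity of weighted term vectors rather than by exact Boolean matching. A toy sketch of that idea in Python (the corpus, function names, and weighting details are illustrative only, not any vendor’s implementation):

```python
import math
from collections import Counter

def build_vectorizer(docs):
    """Return a function mapping text to a simple TF-IDF vector:
    term frequency weighted by inverse document frequency, the core
    weighting scheme of Salton's vector-space retrieval model."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc.split()))
    def vec(text):
        tf = Counter(text.split())
        return {t: f * math.log(n / df[t]) for t, f in tf.items() if t in df}
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "patent searching with chemical structure databases",
    "full text searching of patent documents",
    "cooking recipes for the home kitchen",
]
vec = build_vectorizer(docs)
query = vec("patent searching")

# Rank documents by similarity to the query; the off-topic
# document scores zero and falls to the bottom.
ranked = sorted(range(len(docs)), key=lambda i: cosine(query, vec(docs[i])),
                reverse=True)
```

Even this crude sketch shows why such systems feel “relevancy-inherent”: every document gets a graded score against the query, so results arrive ranked rather than as an undifferentiated Boolean hit set.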
Although it’s sparse compared to other forms of information, the amount of meaningful chemical information on the Web will continue to expand. The main roadblock is paying the costs inherent in preparation and maintenance of this complicated and unique body of information. Users will have to get used to paying for high-quality chemical information on the Web. I don’t see how subsidies or advertising can pay the bills for any length of time.
This attitude is rampant in academia as well as in industry. It’s all too easy for academic research directors to use grant money allocated for information for other expenses, especially if they feel they can “get by” for less. As academic research becomes more and more commercialized (witness the proliferation of university P&L departments and non-disclosure agreements), those researchers will soon hit the brick wall and face the necessity of acquiring good information at retail prices, rather than the wholesale costs they’re used to paying.