Vol. 12 No. 9 — October 2004
The Unknown Known
by Barbara Quint
Editor, Searcher Magazine

On Feb. 12, 2002, Secretary of Defense Donald Rumsfeld uttered the immortal words, "As we know, there are known knowns. These are things we know we know. We also know there are known unknowns. That is to say, we know there are some things we do not know. But there are also unknown unknowns, the ones we don't know we don't know." As a statement by a public official speaking in the full power of his officialdom, this quotation seems eminently public domain.

However, a gentleman named Hart Seely recast these immortal words into a poem called "The Unknown," which became a portion of his edited compilation, Pieces of Intelligence: The Existential Poetry of Donald Rumsfeld (Free Press, 2003, ISBN: 0743255976). Since then, Bryant Kong, a pianist with his own label, Stuffed Penguin Music, has set a selection of the "poems" to music in The Poetry of Donald Rumsfeld. How do I know all this? Google led me to an NPR article/transcript/audio story covering it. At the end of the story, NPR thanked Mr. Seely for granting them permission to publish a selection of the Rumsfeld poems. The press thanking a private citizen for permission to publish public domain statements by politicians? What has the world come to?

But let's set aside copyright and permission concerns for now. The issue that jumps quickly to the eye of professional searchers is defining the document. From the days of Melvil Dewey and probably far earlier, the one thing that all librarians or information professionals knew they could do was find a known item. Give us an author's name, an approximate date, a source indicator (journal title, book publisher, conference name, even author affiliation), or any combination of those components, and, like those dogs that catch Frisbees, we leap into pursuit. That universal basic skill is one of the things we guarantee our patrons. And the drive to perform this function often justifies the budgeting for abstracting and indexing services and full-text digital aggregations.

But what exactly are we panting pooches chasing these days? All information starts or ends its life as digital now, and most digital information seems to end up on the Web in some form or another. The Web on which digital information resides extends across a broad range of accessibility from the wide open Web, complete with full and frequent spidering by Google et al., to the so-called invisible or deep Web to the proprietary "ching-ching, money please!" controlled access venues and, finally, to the covert but still connected intranet corridors. Out of digital data, people still create the more traditional forms, such as printed books and periodicals, which, in turn, often go back into digital aggregations.

More boggling and more challenging for searchers, however, are all the changes and alterations, enrichments and deletions, that can take place as content moves from point to point. For example, in the area of scholarly publication, how do you define, much less produce, a final document with the rise of open access? Many open access advocates vigorously endorse self-archiving as the primary approach to freeing scholarship from the prison of traditional publishers' hands. Self-archiving what? The original article submitted to the publisher? The article before or after peer review, before or after editing, before or after fact-checking? And what about articles that go straight from the author to a peer-review process (or not) and out to readers without any formal publishing intermediation? Where do we find all these self-archived articles? Most of the sites are named in ways that require people to have already identified the appropriate authors. What tools do we have to find the authors based on full-text searches of the content? When we get past Yahoo! Search's plumbing of OAIster, are we on our own?

And who are authors these days? A recent article in the Journal of the American Society for Information Science and Technology (JASIST) bemoaned the blurring of authorship as collaborative research allowed for more "honorific" authors. (Not that this would be anything new to doctoral candidates with renowned professors' names on their work.) Proposed solutions to the problem involved authors identifying contributions by percentages or even paragraphs. Great! Now "Get me copies of X's works" means hunting for paragraphs! And with an open peer-review process, do the comments of reviewers become part of the authorship chain? Should listserv threads or forum discussions become part of the content corpus? Back to self-archiving issues. How will they affect searching by institutional authorship? We know we must check through institutional repositories to find individual author collections. But do we also have to hunt out "off-campus" self-archiving sites to complete research on work produced at an institution?

As the bottomless newspit of the Web continues to encourage a Niagara of input, new content flows emerge. Authors can now attach collections of research data to support their articles. In some cases, they may have no choice. New government regulations in force or under consideration have begun mandating the filing on the Web of clinical trial and pharmaceutical testing data. So now "Get me that study on ... by ..." could mean tracing down not only the final report — whatever that is — but also the data connected to the study.

Are we having fun yet?

Where are the tools to help us through this morass? If you expect traditional publishers or the traditional database industry to rise to the challenge, call me cynical — or call me experienced — but I fear you will have a long wait. In fact, some of those folks will probably view this jeremiad as a justification for returning to the ways of millenniums gone by. But those days are gone forever. And rightly so. Rightly so!

What are we talking about here? Problems due to new content and new readership and new distribution channels. These are problems, absolutely, but these are good problems. These are problems worth the sweat equity it will take to solve them. These are problems for people who want to make a wiser world, a world where scholarship and knowledge move faster into the eyes and minds of those who need it and make it. These are problems worthy of Searchers.

Now what was that you wanted again? Step right up!

Barbara Quint's e-mail address is