The Unknown Known
by Barbara Quint
On Feb. 12, 2002, Secretary of Defense Donald Rumsfeld
uttered the immortal words, "As we know, there are known
knowns. These are things we know we know. We also know
there are known unknowns. That is to say, we know there
are some things we do not know. But there are also unknown
unknowns, the ones we don't know we don't know." As a
statement by a public official speaking in the full power
of his officialdom, this quotation seems eminently public domain.
However, a gentleman named Hart Seely recast these
immortal words into a poem called "The Unknown," which
became a portion of his edited compilation, Pieces
of Intelligence: The Existential Poetry of Donald Rumsfeld
(Free Press, 2003, ISBN: 0743255976). Since then, Bryant
Kong, a pianist with his own label, Stuffed Penguin,
has set a selection of the "poems" to music in The
Poetry of Donald Rumsfeld. How do I know all this?
Google led me to an NPR article/transcript/audio story
covering it. At the end of the story, NPR thanked Mr.
Seely for granting them permission to publish a selection
of the Rumsfeld poems. The press thanking a private
citizen for permission to publish public domain statements
by politicians? What has the world come to?
But let's set aside copyright and permission concerns
for now. The issue that jumps quickly to the eye of
professional searchers is defining the document. From
the days of Melvil Dewey and probably far earlier, the
one thing that all librarians or information professionals
knew they could do was find a known item. Give us
an author's name, an approximate date, a source indicator
(journal title, book publisher, conference name, even
author affiliation), or any combination of those components,
and, like those dogs that catch Frisbees,
we leap into pursuit. That universal basic skill is
one of the things we guarantee our patrons. And the
drive to perform this function often justifies the budgeting
for abstracting-and-indexing and full-text services.
But what exactly are we panting pooches chasing these
days? All information starts or ends its life as digital
now, and most digital information seems to end up on
the Web in some form or another. The Web on which digital
information resides extends across a broad range of
accessibility from the wide open Web, complete with
full and frequent spidering by Google et al., to the
so-called invisible or deep Web to the proprietary "ching-ching,
money please!" controlled access venues and, finally,
to the covert but still connected intranet corridors.
Out of digital data, people still create the more traditional
forms, such as printed books and periodicals, which,
in turn, often go back into digital aggregations.
More boggling and more challenging for searchers,
however, are all the changes and alterations, enrichments
and deletions, that can take place as content moves
from point to point. For example, in the area of scholarly
publication, how do you define, much less produce, a
final document with the rise of open access? Many open
access advocates vigorously endorse self-archiving as
the primary approach to freeing scholarship from the
prison of traditional publishers' hands. Self-archiving
what? The original article submitted to the publisher?
The article before or after peer review, before or after
editing, before or after fact-checking? And what about
articles that go straight from the author to a peer-review
process (or not) and out to readers without any formal
publishing intermediation? Where do we find all these
self-archived articles? Most of the site names might
require that people have already identified the appropriate
authors. What tools do we have to find the authors based
on full-text searches of the content? When we get past
Yahoo! Search's plumbing of OAIster, are we on our own?
And who are authors these days? A recent article in
the Journal of the American Society for Information
Science and Technology (JASIST) bemoaned the blurring
of authorship as collaborative research allowed for
more "honorific" authors. (Not that this would be anything
new to doctoral candidates with renowned professors'
names on their work.) Proposed solutions to the problem
involved authors identifying contributions by percentages
or even paragraphs. Great! Now "Get me copies of X's
works" means hunting for paragraphs! And with an open
peer-review process, do the comments of reviewers become
part of the authorship chain? Should listserv threads
or forum discussions become part of the content corpus?
Back to self-archiving issues. How will they affect
searching by institutional authorship? We know we must
check through institutional repositories to find individual
author collections. But do we also have to hunt out
"off-campus" self-archiving sites to complete research
on work produced at an institution?
As the bottomless newspit of the Web continues to
encourage a Niagara of input, new content flows emerge.
Authors can now attach collections of research data
to support their articles. In some cases, they may have
no choice. New government regulations in force or under
consideration have begun mandating the filing on the
Web of clinical trial and pharmaceutical testing data.
So now "Get me that study on ... by ..." could mean
tracing down not only the final report, whatever
that is, but also the data connected to the study.
Are we having fun yet?
Where are the tools to help us through this morass?
If you expect traditional publishers or the traditional
database industry to rise to the challenge, call me
cynical or call me experienced, but I fear
you will have a long wait. In fact, some of those folks
will probably view this jeremiad as a justification
for returning to the ways of millenniums gone by. But
those days are gone forever. And rightly so. Rightly so.
What are we talking about here? Problems due to new
content and new readership and new distribution channels.
These are problems, absolutely, but these are good problems.
These are problems worth investing the sweat equity
it will take to solve them. These are problems for people
who want to make a wiser world, a world where scholarship
and knowledge move faster into the eyes and minds of
those who need it and make it. These are problems worthy
Now what was that you wanted again? Step right up!
Barbara Quint's e-mail
address is email@example.com.