Full Text in the Humanities and Social Sciences
& Jan Voogd, Littauer Library, Harvard College Library
Jan, however, whole-heartedly embraced the move to virtual text, bypassing physical preservation issues (having spent the past year sending 11,663 items to off-site storage for safekeeping). In her experience, while not perfect, the databases were ultimately superior to the alternatives — decomposing print or not owning the title.
The following text is a loose conversation we had over a week’s time.
Amy: I’ve been thinking about the discussion that we started the other day concerning full-text humanities and social sciences databases. The humanities have a strong, rich history on the Web. Full-text books, out of copyright and in the public domain, were among the earliest inhabitants of the Internet. Shakespeare, Milton, Dante, and hundreds of other classical texts found their way into digital concordances years ago. What is happening now is the commercialization of what once appeared to be a bibliophile’s business.
The infiltration of such tools has totally changed the way we approach librarianship. Reference, collection development, technical support, library instruction, technical services (acquisitions, cataloging, serials management) all have been affected by digital information. I cannot imagine working in a library that did not have access to the depth and breadth of information that we now have at every computer terminal. However, this digital improvement is not yet perfect. While I hope never to use a card catalog again, I cannot accept all digital resources without criticism.
As UMI, Chadwyck-Healey, and Bell & Howell all become one company, the “corporatization” of the holdings puts access to knowledge at risk. While I celebrate the richness of titles now available “24/7” and throughout the world, I can’t help but wonder if the superior delivery medium doesn’t limit our students’ research to the titles they can access immediately — via their modems. I see so many problems with the commercialization of information and the unevenness of distribution — not to mention second-rate proprietary software that promises easy printing and searching, but delivers a difficult retrieval experience. We’ve been working with companies eating other companies — amalgamating their holdings — until we, as librarians, have no leverage when comparing and contrasting contracts.
Remind me why I should be excited about access when I am beginning to feel powerless over my ability to master digital holdings and search engines, as well as over my institution’s ability to offer any sort of continuity of service.
Jan: My conviction is that the commercialization of these databases is a very good thing. Through commercialization, we achieve efficient ease of use, and ease of use allows the content to transcend the format. Once upon a time only a few people were privy to the written intellectual wealth of humanity. As time has gone on, that has continued to change, thanks first perhaps to Gutenberg, until today, when we see before us the real possibility that any idea someone has ever managed to express in written form might be available to anyone sitting in front of a computer terminal. The implications of this are endless, but at the very least it means that the greatest living minds can reach all thinking people, no matter what their economic reality (thanks to public libraries). We will all have access to all the greatest thinking so far, and this will benefit us all.
Full-text databases may easily follow this model. I could point to numerous examples. Your fear of big business is not unwarranted, but the fact remains that commercial interests must consider the needs and wants of the consumers themselves or they will not survive. I believe that we can trust the market. Sure, some behemoths, like BHIL, may sometimes overprice and behave like sticks in the mud when it comes to advancing access technology. But the beauty of technology today is that challenges to the Goliaths of digital information await only the adept and ingenious Davids armed with the slingshots of their revolutionary ideas. The whole picture can change in a moment, and this is something to celebrate, not something to fear.
Amy: I am not against the commercialization of information. God knows it is much too late in the game to be concerned with profit for knowledge. Also, I would be foolish to imagine that the great steps we have made in full-text information would have happened in the absence of the dollar sign.
What I worry about the most with commercial databases is poor design, or false barriers that are meant to protect intellectual property but make it more difficult for students to do their work. We often let full-text labors of love get away with a less polished look, but when I have to spend a good percentage of my budget on an online title (one which I may also have to continue paying for in print), I expect and demand a well-designed, better product.
such as the JSTOR project, began before the technology was ready to support the vision. This is not to say that JSTOR isn’t wildly successful; the content is too powerful for the programming to torpedo its market share.
[Note to our readers: JSTOR, according to its Web page, is “a not-for-profit organization initially funded by the Andrew W. Mellon Foundation, dedicated to the development of a digital library in support of the arts and sciences. JSTOR consists of more than a dozen journal titles in varied topics (from ethnology to economics) and will initially contain approximately 750,000 journal page images. JSTOR allows browsing and full-text searching of the journals.” JSTOR was built on the backs and stacks of libraries that offered their collections (at times sacrificing decades of journals that JSTOR staff ripped apart page-by-page) to assemble an amazing digital collection of core journals that are never lost, never checked out.]
So, what’s my problem? I’m nit-picking really, when I express concern about the loss of paper to an unstable computer medium. I have the battle scars from the first 3 years of JSTOR, helping students print from an application that seemed to fail more often than it worked. I helped students work around the fact that they cannot cut and paste text from the digital images on the screen (a common problem with many for-profit, full-text titles on the Web). Bottom line? I worked in a library that donated its journals to JSTOR and received, in trade, an unstable database.
These may be petty complaints against an amazingly successful enterprise, but it is an enterprise run by librarians closer to the distribution and use of knowledge than almost anyone. People have become much more Web savvy, better able to jump through all the hoops of the various Web sites, but at the beginning of JSTOR, the printing and the PDF format were less than friendly.
Another truly powerful tool with wonderful content but difficult interfaces and search tools is BHIL’s Dissertation Abstracts/Digital Dissertations. OK, my bias is such that I blanched when I saw a “shopping cart” feature morph into a research tool. In the past few months, however, I have seen the same tool used in a library catalog — so I guess I better get used to it.
BHIL, to its credit, has improved its product. Formerly, one had to request the full-text dissertations via the convoluted Web page. After transmitting a request, one would receive an e-mail message with a URL and password that would allow the user to see the full text. It used to take a few hours to get this info. Now the process has been streamlined, and the user gets the URL and password immediately, with only a few minutes needed to give the system time to set up the titles. I requested six dissertations and got them within 5 minutes. Unfortunately, when I returned to the Digital Dissertation site the next day, it was down for routine maintenance. This was at 10 A.M. on a weekday.
Jan: I’m so glad you mentioned JSTOR and the problems you see with it, because each of the “problems” you describe seems like a solution to me. Copyright holders worry whether digital forms of information will let their work be adopted, adapted, and passed off as the work of someone else. The fact that you and your students can’t cut and paste text, or send the text off in e-mail, is a good thing. This way the digital format is as safe as any other format, with the added benefit of being so much more easily accessed. And JSTOR has certainly fixed whatever printing bugs it once had, because what I see in my hands are crisp, clear printouts of articles, for which I otherwise would have had to trek across the campus or the country, or would have had to find in microform.
When I think of the eliminated footwork, I rejoice. The fruits of scholarship will no longer be dependent on a person’s penchant for dust or their affection for writing on index cards. Instead, everyone will have readily available in digital form at their desktop the intellectual harvest of the centuries, and we can simply build upon it, rather than wasting valuable time and energy mired in the muck of earthly print.
Amy: Jan, oh optimistic one! OK, OK, yes, you are right! I am not arguing that JSTOR and other full-text databases are not fabulous tools. It is simply the arbitrariness of the rules of use. After all, some of these virtual collections stretch back a century or more, but JSTOR charges the same price regardless of how long dead the authors are or when their copyrights ran out.
You and the businesses that “own” this information share the fiction that the author is God and should have ultimate control over all s/he has published. In actuality it is the content provider who is God and controls everything. Web publishing and post-modern authorship are challenging every aspect of the “text” as we know it.
For example, Bell & Howell has Early English Books Online, a rich and extremely useful tool for anybody in the humanities. This is a collection of many of BHIL’s microform collections moved onto the Web. Bell & Howell (having acquired UMI and Chadwyck-Healey) is in the unique position of bringing many of our standard collections to the digital world.
My issue is this: These are not newly acquired, had-to-be-paid-for titles. Every title has been owned for decades. The titles are all out of copyright; the only costs involved are making the text accessible and digital. Not that these start-up costs are negligible. They are high, but not as high as publishing new works and paying authors.
Granted, content providers need some sort of content protection; otherwise they would not invest in the technology that makes these offerings possible. But is it necessary to use yet another piece of proprietary software to read text, like DjVu, the format BHIL insists we use in the Early English Books Online service? It is a slow-loading, awkward tool disliked by most people who have used it. And why should our users have to learn another tool (thanks to AT&T) when PDF files have proven a safe standard, though still not my favorite?
It is as awkward as having to visit Digital Dissertations within a specified time period at a different URL with an assigned password! Reference librarians, and the researchers who love them, neither need nor appreciate these kinds of challenges.
Jan: So, if I understand you, it’s not the digital format or the commercial structure that bothers you, but the varied nature of the proprietary software that hosts the information, some software being better than other software. My question to you would then be: Were it all to be uniform, which would it be? One librarian’s awkward interface might be another librarian’s dream.
I believe we are lucky to have so many different choices. If I can get census information from Geolytics or from QueryLogic, you can bet I have a preference, and you can also bet that whichever I prefer, another colleague will prefer the other. Census information has no copyright, so sure, the information is actually free, but to make the data convenient to use, I need these companies with their proprietary software, and I’m glad there’s more than one to choose from. The more competition there is, the more innovation and improvement we will see. Just ask the Justice Department.
In this New World of digital information, it behooves us all to learn agility, the ability to jump from one proprietary software interface to another. We should strive to become like the sophisticated world traveler who can speak many languages and learn new ones with ease — success comes with learning how to learn, not in wasting time resisting.
Amy: OK, so we can’t have the same infrastructure for most of our databases. I agree — it won’t happen. But having said that, let us examine our own collection development policies. For the Harvard University Library Web Page, HOLLIS Plus [http://hplus.harvard.edu], we moved many databases to the OVID interface. Is it superior to other interfaces? No, not always. Is it the fastest? Seldom. What OVID does have is many, many titles available that our students use. We chose a common interface for our users’ ease over the absolute best searching interface. With a common interface, I can search ABI Inform, the MLA (Modern Language Association) database, or Medline and quickly understand exactly where I am in my search. This is considered a good thing in the world of interdisciplinary studies.
Let me go back to BHIL as an example. Their different titles search in vastly different ways. Digital Dissertations does not look anything like the Early English Books Online database. UMI also owns ProQuest Direct, a full-text newspaper and magazine database. ProQuest Direct is fairly sizable, over 2,000 titles. It allows for citation, full text, or full text with graphics. The text is immediately available and very simple to use. Why can’t UMI use the same interface for Digital Dissertations and Early English Books Online? Does the company not understand how it would strengthen its market position to offer continuity, simplicity, and quality in all of its databases?
That, Jan, is what I seek. I would like vendors to consider their users, all their users — the students, the scholars, and the librarians who teach the students and scholars how to use, access, and integrate electronic formats into their work. True, many users can move quickly among different interfaces, especially the younger students. Still, I can’t help but wonder if there is some saturation point where the ability to search deeply and with sophistication suffers with so many different designs.
you ask for is coming true, even as we speak. Many of the companies you’ve mentioned spend a great deal of time and money communicating with librarians and other users, hosting focus groups and sessions at conferences, asking for feedback, surveying researchers, and so on. They do try to give everyone what they want, because it’s just plain good for business. Of course companies aren’t perfect: they don’t always hear and understand, they sometimes cut corners to save costs, and they are ultimately operated by fallible human beings. But I don’t think you have anything to worry about!
A Sampler of Full-Text Humanities and Social Science Databases
Having argued my dissatisfaction with several costlier databases, I offer this selection to highlight a few of the full-text humanities titles available for free. These are titles that are put into the public realm by people, usually attached to universities, who have a strong passion for the subject matter. The searching interfaces are pretty darned sophisticated, considering the price to the user. All of these databases allow the user to search (even to use the database as a concordance), e-mail, download, print, and copy. In fact, these databases offer everything some would like our more sophisticated (i.e., costlier) tools to accomplish.
This is a sample gathering of full-text resources, some of my favorites. As we must acknowledge, it is all but impossible to put together a comprehensive list of anything on the Web.
Alex Catalog of Electronic Texts
Sponsored by the Berkeley Digital Library SunSITE project, this collection features American literature, English literature, and Western philosophy. Its strengths lie in the fact that one can search for specific texts from the main search page and search within the texts. Another bonus is that one can search the contents of multiple documents simultaneously. This makes for a deep-reaching tool that makes critical and comparative research easier. The technology of this site is very sophisticated and innovative. Alex deserves a look.
on the Net
The Online Medieval and Classical Library (OMACL)