Virtual Academy: Full Text in the Humanities and Social Sciences

Vol. 8, No. 8 • Sept. 2000

by Amy M. Kautzman, Harvard College Library, Harvard University
& Jan Voogd, Littauer Library, Harvard College Library

It is the end of the fiscal year. This used to mean that we librarians went into a frenzy, quickly spending all of our leftover funds on items tangible and finite in appearance and cost, items like books and CD-ROMs. This year, the two of us (Jan, the Collection Management Librarian, and Amy, the Head of Reference and Instruction) found ourselves agreeing, in theory, that full-text online resources would make wonderful additions to our collection. Yet Amy felt frustrated with the way many for-profit companies present their materials. As a teacher and researcher, she felt that the databases failed users in many ways, and that companies were putting profit ahead of common-sense design.

Jan, however, wholeheartedly embraced the move to virtual text as a way around physical preservation problems (she had spent the past year sending 11,663 items to off-site storage for safekeeping). In her experience, the databases, while not perfect, were ultimately superior to the alternatives: decomposing print, or not owning the title at all.

The following text is a loose conversation we had over a week’s time.

Amy: I’ve been thinking about the discussion that we started the other day concerning full-text humanities and social sciences databases. The humanities have a strong, rich history on the Web. Full-text books, out of copyright and in the public domain, were among the earliest inhabitants of the Internet. Shakespeare, Milton, Dante, and hundreds of other classic texts found their way into digital concordances years ago. What is happening now is the commercialization of what once appeared to be a bibliophile’s business.

The infiltration of such tools has totally changed the way we approach librarianship. Reference, collection development, technical support, library instruction, technical services (acquisitions, cataloging, serials management) all have been affected by digital information. I cannot imagine working in a library that did not have access to the depth and breadth of information that we now have at every computer terminal. However, this digital improvement is not yet perfect. While I hope never to use a card catalog again, I cannot accept all digital resources without criticism.

As UMI, Chadwyck-Healey, and Bell & Howell all become one company, the “corporatization” of the holdings puts access to knowledge at risk. While I celebrate the richness of titles now available “24/7” and throughout the world, I can’t help but wonder if the superior delivery medium doesn’t limit our students’ research to the titles they can access immediately, via their modems. I see so many problems with the commercialization of information and the unevenness of distribution, not to mention second-rate proprietary software that promises easy printing and searching but delivers a difficult retrieval experience. We’ve watched companies swallow other companies, amalgamating their holdings, until we, as librarians, have no leverage when comparing and contrasting contracts.

Remind me why I should be excited about access when I am beginning to feel powerless over my ability to master digital holdings and search engines, as well as over my institution’s ability to offer any sort of continuity of service.

Jan: My conviction is that the commercialization of these databases is a very good thing. Through commercialization, we achieve ease of use, and ease of use allows the content to transcend the format. Once upon a time, only a few people were privy to the written intellectual wealth of humanity. Over time that has changed, thanks first perhaps to Gutenberg, until today, when we see before us the real possibility that any idea someone has ever managed to express in written form might be available to anyone sitting in front of a computer terminal. The implications of this are endless, but at the very least it means that the greatest living minds can reach all thinking people, no matter what their economic reality (thanks to public libraries). We will all have access to all the greatest thinking so far, and this will benefit us all.

Full-text databases may easily follow this model. I could point to numerous examples. Your fear of big business is not unwarranted, but the fact remains that commercial interests must consider the needs and wants of the consumers themselves or they will not survive. I believe that we can trust the market. Sure, some behemoths, like BHIL (Bell & Howell Information and Learning), may sometimes behave like sticks in the mud when it comes to advancing access technology, and may overprice their products. But the beauty of technology today is that challenges to the Goliaths of digital information await only the adept and ingenious Davids armed with the slingshots of their revolutionary ideas. The whole picture can change in a moment, and this is something to celebrate, not something to fear.

Amy: I am not against the commercialization of information. God knows it is much too late in the game to object to making a profit from knowledge. Also, I would be foolish to imagine that the great steps we have made in full-text information would have happened in the absence of the dollar sign.

What worries me most about commercial databases is poor design, or the artificial barriers that are meant to protect intellectual property but make it harder for students to do their work. We often let full-text labors of love get away with a less polished look, but when I have to spend a good percentage of my budget on an online title (one which I may also have to keep paying for in print), I expect, and demand, a well-designed, better product.

Some programs, such as the JSTOR project, began before the technology was ready to support the vision. This is not to say that JSTOR isn’t wildly successful; the content is too powerful for the programming to torpedo its market share.

[Note to our readers: JSTOR, according to its Web page, is “a not-for-profit organization initially funded by the Andrew W. Mellon Foundation, dedicated to the development of a digital library in support of the arts and sciences. JSTOR consists of more than a dozen journal titles in varied topics (from ethnology to economics) and will initially contain approximately 750,000 journal page images. JSTOR allows browsing and full-text searching of the journals.” JSTOR was built on the backs and stacks of libraries that offered their collections (at times sacrificing decades of journals that JSTOR staff ripped apart page-by-page) to assemble an amazing digital collection of core journals that are never lost, never checked out.]

So, what’s my problem? I’m nit-picking, really, when I express concern about the loss of paper to an unstable computer medium. I have the battle scars from the first three years of JSTOR, helping students print from an application that seemed to fail more often than it worked. I helped students work around the fact that they could not cut and paste text from the digital images on the screen (a common problem with many for-profit, full-text titles on the Web). Bottom line? I worked in a library that donated its journals to JSTOR and received, in trade, an unstable database.

These may be petty complaints against an amazingly successful enterprise, but it is an enterprise run by librarians, people who are closer to the distribution and use of knowledge than almost anyone. People have become much more Web-savvy, better able to jump through all the hoops of the various Web sites, but at the beginning of JSTOR, the printing and the PDF format were less than friendly.

Another truly powerful tool with wonderful content but difficult interfaces and search tools is BHIL’s Dissertation Abstracts/Digital Dissertations. OK, my bias is such that I blanched when I saw a “shopping cart” feature morph into a research tool. In the past few months, however, I have seen the same tool used in a library catalog, so I guess I’d better get used to it.

BHIL, to its credit, has improved its product. Formerly, one had to request the full-text dissertations through a convoluted Web page. After transmitting a request, one would receive an e-mail message with a URL and password that would allow the user to see the full text. It used to take a few hours to get this info. Now the process has been streamlined: the user gets the URL and password immediately, and the system needs only a few minutes to set up the titles. I requested six dissertations and got them within 5 minutes. Unfortunately, when I returned to the Digital Dissertations site the next day, it was down for routine maintenance. This was at 10 A.M. on a weekday.

Jan: I’m so glad you mentioned JSTOR and the problems you see with it, because each of the “problems” you describe seems like a solution to me. Copyright holders worry that digital forms of information will let their work be adopted, adapted, and passed off as the work of someone else. The fact that you and your students can’t cut and paste text, or send the text off in e-mail, is a good thing. This way the digital format is as safe as any other format, with the added benefit of being so much more easily accessed. And JSTOR has certainly fixed whatever printing bugs it once had, because what I see in my hands are crisp, clear printouts of articles for which I otherwise would have had to trek across the campus or the country, or would have had to find in microform.

When I think of the eliminated footwork, I rejoice. The fruits of scholarship will no longer be dependent on a person’s penchant for dust or their affection for writing on index cards. Instead, everyone will have readily available in digital form at their desktop the intellectual harvest of the centuries, and we can simply build upon it, rather than wasting valuable time and energy mired in the muck of earthly print.

Amy: Jan, oh optimistic one! OK, OK, yes, you are right! I am not arguing that JSTOR and other full-text databases are not fabulous tools. It is simply the arbitrariness of the rules of use that bothers me. After all, some of these virtual collections stretch back a century or more, but JSTOR charges the same price regardless of how long dead the authors are or when their copyrights ran out.

You and the businesses who “own” this information share the fiction that the author is God and should have ultimate control over all s/he has published. In actuality, it is the content provider who is God and controls everything. Web publishing and post-modern authorship are challenging every aspect of the “text” as we know it.

For example, Bell & Howell has Early English Books Online, a rich and extremely useful tool for anybody in the humanities. It moves many of BHIL’s microform collections onto the Web. Bell & Howell (having acquired UMI and Chadwyck-Healey) is in the unique position of bringing many of our standard collections to the digital world.

My issue is this: These are not newly acquired, had-to-be-paid-for titles. Every title has been owned for decades. The titles are all out of copyright; the only costs involved are making the text digital and accessible. Not that these start-up costs are negligible; they are high, but not as high as the costs of publishing new works and paying authors.

Granted, content providers need some sort of content protection; otherwise they would not invest in the technology that makes these offerings possible. But is it necessary to use yet another proprietary program to read text, such as DjVu, the widely disliked format BHIL insists we use in the Early English Books Online service? It is slow to load and awkward to use. And why should our users have to learn another tool (thanks to AT&T) when PDF has proven a safe standard, though it is still not my favorite?

It is as awkward as having to visit Digital Dissertations within a specified time period at a different URL with an assigned password! Reference librarians, and the researchers who love them, neither need nor appreciate these kinds of challenges.

Jan: So, if I understand you, it’s not the digital format or the commercial structure that bothers you, but the varied nature of the proprietary software that hosts the information, some software being better than other software. My question to you, then, would be: Were it all to be uniform, which would it be? One librarian’s awkward interface might be another librarian’s dream.

I believe we are lucky to have so many different choices. If I can get census information from Geolytics or from QueryLogic, you can bet I have a preference, and you can also bet that whichever I prefer, another colleague will prefer the other. Census information has no copyright, so sure, the information is actually free, but to make the data convenient to use, I need these companies with their proprietary software, and I’m glad there’s more than one to choose from. The more competition there is, the more innovation and improvement we will see. Just ask the Justice Department.

In this New World of digital information, it behooves us all to learn agility, the ability to jump from one proprietary software interface to another. We should strive to become like the sophisticated world traveler who can speak many languages and learn new ones with ease. Success comes from learning how to learn, not from wasting time resisting.

Amy: OK, so we can’t have the same infrastructure for most of our databases. I agree; it won’t happen. But having said that, let us examine our own collection development policies. For the Harvard University Library Web page, HOLLIS Plus [http://hplus.harvard.edu], we moved many databases to the OVID interface. Was it superior to other interfaces? No, not always. Is it the fastest? Seldom. What OVID does have is many, many titles that our students use. We chose a common interface, for our users’ ease, over the absolute best search interface. With a common interface, I can search ABI Inform, the MLA (Modern Language Association) database, or Medline and quickly understand exactly where I am in my search. This is considered a good thing in the world of interdisciplinary studies.

Let me go back to BHIL as an example. Its different titles search in vastly different ways. Digital Dissertations looks nothing like the Early English Books Online database. UMI also owns ProQuest Direct, a full-text newspaper and magazine database. ProQuest Direct is fairly sizable, with over 2,000 titles. It offers citations, full text, or full text with graphics. The text is immediately available and very simple to use. Why can’t UMI use the same interface for Digital Dissertations and Early English Books Online? Does the company not understand how it would strengthen its market position to offer continuity, simplicity, and quality in all of its databases?

That, Jan, is what I seek. I would like vendors to consider their users, all their users: the students, the scholars, and the librarians who teach the students and scholars how to use, access, and integrate electronic formats into their work. True, many users can move quickly among different interfaces, especially the younger students. Still, I can’t help but wonder: is there some saturation point at which the ability to search deeply and with sophistication suffers under so many different designs?

Jan: What you ask for is coming true, even as we speak. Many of the companies you’ve mentioned spend a great deal of time and money communicating with librarians and other users, hosting focus groups and sessions at conferences, asking for feedback, surveying researchers, and so on. They do try to give everyone what they want, because it’s just plain good for business. Of course, companies aren’t perfect: they don’t always hear and understand, they sometimes cut corners to save costs, and they are ultimately operated by fallible human beings. But I don’t think you have anything to worry about!

A Sampler of Full-Text Humanities and Social Science Databases
Having voiced dissatisfaction with several costlier databases, I want to highlight a few of the full-text humanities titles available for free. These are titles put into the public realm by people, usually attached to universities, who have a strong passion for the subject matter. The search interfaces are pretty darned sophisticated, considering the price to the user. All of these databases allow the user to search (even to use the database as a concordance), e-mail, download, print, and copy. In fact, these databases offer everything some of us would like our more sophisticated (i.e., costlier) tools to accomplish.

This is a sample gathering of full-text resources, some of my favorites. As we must acknowledge, it is all but impossible to put together a comprehensive list of anything on the Web.

—Amy

Alex Catalog of Electronic Texts
http://sunsite.berkeley.edu/alex/
Sponsored by the Berkeley Digital Library SunSITE project, this collection features American literature, English literature, and Western philosophy. Its strength lies in the fact that one can search for specific texts from the main search page and also search within the texts. Another bonus is that one can search the contents of multiple documents simultaneously, which makes this a deep-reaching tool for critical and comparative research. The technology of this site is very sophisticated and innovative. Alex deserves a look.

Bartleby.com
http://www.bartleby.com/
Featuring a good selection of reference, verse, fiction and nonfiction, Bartleby.com offers free access to titles as well as the option to buy the books. I looked up the 1922 edition of Emily Post’s Etiquette. From one page I could choose chapters, photographic illustrations, selected quotations, and more. At the bottom of the page I found an option to purchase the most recent edition. A smooth, courteous transition to business. I like this site for what it is — a business ploy that offers real value to its users.

Bibliomania
http://www.bibliomania.com/
Bibliomania is a bare-bones, full-text site. It offers basic texts via a standard search engine. It does not provide a full citation of the print resource, which would help the user to properly cite a quote. I would not go here to do serious academic research. The strength of this database lies in its simplicity, low visuals, fast loading, and simple searching.

The Classics Archive
http://classics.mit.edu/index.html
The Classics Archive allows one to “Select from a list of 441 works of classical literature by 59 different authors, including user-driven commentary and ‘reader’s choice’ Web sites. Mainly Greco-Roman works (some Chinese and Persian), all in English translation.” This is a useful and fun page. It has a solid search interface, as well as challenging trivia questions. Links to Web sites, including bookstores, make for a rich interface.

Electronic Text Center
http://etext.lib.virginia.edu/uvaonline.html
The University of Virginia has been building the Electronic Text Center since 1992. Its intended purpose is to “build and maintain an Internet-accessible collection of SGML texts and images” and “to build and maintain a user community adept at the creation and use of these materials.” The site provides access to an archive of tens of thousands of SGML- and XML-encoded electronic texts and images. What truly impresses me is that the library service offers hardware and software suitable for the creation and analysis of text. The Electronic Text Center trains scholars to move text into electronic format, where it can then be indexed, collated, or turned into a concordance or word list. Virginia is building the database and has also established a model for training others to develop their own databases. A true library paradigm.

EServer
http://english-server.hss.cmu.edu/
With 29,136 items available, the EServer at Carnegie Mellon has been a force in full-text collections for 10 years. Beyond books, this site features drama, audio and visuals, journals, and much more. If you want to find full text and a community to support the subject via listservs, Web sites, and contacts, this is an excellent starting place.

Humanities Text Initiative
http://www.hti.umich.edu/
Where else could I compare the Koran, the Book of Mormon, and various versions of the Bible? The Humanities Text Initiative, a unit of the University of Michigan’s Digital Library Production Service, developed this site in conjunction with the Library of Congress, Sun Microsystems, and other institutions. The content and the computer have combined for a greater good.

Internet Medieval Sourcebook
http://www.fordham.edu/halsall/sbook.html
Saints’ Lives, Medieval Legal History, and Early Church Documents are available along with many other resources on this Fordham University Center for Medieval Studies site. Access is easy, and the look and feel place one in a different time. The addition of historical maps allows historians to place their readings in a visual context. Very fun, very useful site.

Literary Resources on the Net
http://andromeda.rutgers.edu/~jlynch/Lit/
A meta-site with a basic, simple format that is extremely rich in content and will take you to almost every full-text resource that exists. This is the place to go for broad subject coverage and for material by authors of many ethnicities and nationalities.

The Online Medieval and Classical Library (OMACL)
http://sunsite.berkeley.edu/OMACL/
Another strong resource from the Berkeley Digital Library SunSITE. Run by Douglas B. Killings, the Online Medieval and Classical Library offers a rich selection of important works from many traditions. These texts are searchable by title, author, genre, and language. An excellent place to find Icelandic sagas such as The Story of Burnt Njal (“Njal’s Saga”). Oh, they have lots of Chaucer, too.

Project Gutenberg
http://www.promo.net/pg/
One of the original full-text (free) databases, Project Gutenberg began in 1971. In this project, volunteers decide what others may access. Any title out of copyright can be placed on a site; it simply needs to be chosen and typed into the proper format. Many languages and authors are represented in this super-rich Web site. This site represents a pre-AOL, pre-“dot.com” vision of the Internet.

Women Writers Project
http://www.wwp.brown.edu
As this site says, “The Brown University Women Writers Project is a long-term research project devoted to early modern women’s writing and electronic text encoding. Our goal is to bring texts by pre-Victorian women writers out of the archive and make them accessible to a wide audience of teachers, students, scholars, and the general reader.” This site supports authors not found in many mainstream sites and is one of my all-time favorites.
