Report from the Field: Advances in Digital Libraries 1998

Volume 15, Number 6 • June 1998

Conference showcases the move from dream to reality
by Susan Feldman

The first national Digital Libraries Initiative is drawing to a close, and proposals for the second round have been submitted. What have we learned? If this year's Advances in Digital Libraries (ADL) conference, held April 22-24 in Santa Barbara, California, is any indication, the information world is in for a time of ferment and excitement. We have moved beyond neat tools and widgets. The national digital library projects have introduced new information systems and tools that will have a profound effect on how information professionals view themselves and their profession.

Libraries have always been repositories of existing information. They gather it, select and organize it, make it accessible, and preserve it. The vision of digital libraries that emerges from this conference is far beyond the scope of current libraries. It places the library squarely in the middle of the information-creation business. Users will be able to use new information tools to combine pieces of information and to create new ones.

Imagine taking maps and overlaying them one on another to create a 3-D virtual fly-through of a marshland. Walk inside a molecule. Sort through newscasts that are searched by an automatically created text database of the spoken text. Then choose a news segment to view on your screen, starting precisely at the spot that discusses your topic.

These are no longer fantasies. They are the results of 4 years of research, and they will change profoundly how libraries work and how librarians think of themselves. The goal of these projects was to produce both research and a usable collection and product. Based on the presentations I saw in Santa Barbara, they have succeeded.

In addition to creating information tools, however, these projects were instrumental in initiating debates on societal issues, such as intellectual property, physical vs. digital forms of materials, how people use information, what information packages should look like, and what tools should be developed to manipulate their contents. What will the impact be on existing institutions, and on society? How will scholars choose to communicate if they can either send out research results today or have them appear in a paper peer-reviewed journal 2 years from now? These are not easy choices—for libraries, for publishers, or for scholars.

The California Digital Library

Richard Lucier, founding director of the California Digital Library, spoke about the University of California's new Digital Library (CDL), and the questions it needs to settle. He pointed to the flux and confusion in the scholarly communication process—a theme that was repeated quite often during the conference—and the changes such as distance learning that are affecting higher education. How does a library serve people it never sees?

Other issues include the demand for better infrastructure, rather than for new buildings; professors' fears that their access to physical materials will dwindle; the increases in both cost and amount of information libraries must handle; how libraries can plan for change; and how they must budget for new technologies, as well as integrate them into the current process. Of particular concern is the chaos in the marketplace, with established publishers "throwing out trial balloons" for changes in price and subscription models. And finally, Lucier noted that academic institutions are inherently conservative, and not kind to innovators.

The California Digital Library will open its virtual doors by this fall. In the process of building this library, its creators have reached some interesting conclusions:

The current library practice of creating a comprehensive collection can't be sustained. (Comprehensive access to materials through agreements with publishers and consortia will have to substitute for comprehensive ownership.)
Changes in scholarly publishing have only begun, and the digital library will have a role in facilitating that change.
It is better to implement strategically while planning than to arrive at a comprehensive plan that is already outdated; the plan will unfold "organically" through the interaction of content, users, and technology.
The CDL will be a single library for all nine campuses, complementing the existing nine physical libraries on the University of California campuses.

The California Digital Library will have a collection of high-quality digital materials. It will offer integrated tools for information delivery, including tools to create, share, manipulate, store, and use knowledge, all sharing a consistent interface. It will include published literature and digital-only literature, including scientific data sets and special collections. One interesting sidelight: They must develop licensing agreements to their own collections, as well as negotiate agreements for access with publishers of digital information.

One of the threads running through this conference was the need for standards. For libraries conscious of the need for permanent archives, this is a knotty problem. How should materials be described so they can be retrieved? What happens if the hardware on which materials are viewed changes? Can we port documents to new platforms without loss of integrity? How do we develop robust, reliable tools that can be shared among organizations, and evaluate their effectiveness?

Projects Push the Edge of Possibilities

The current national digital library projects reported on their progress at this meeting, and a number of other digital library research projects were demonstrated as well. It was a virtual feast. Here are some that I found particularly intriguing.

The University of Michigan has created a digital library for science in the schools, grades six through nine. The project was a vehicle for investigating the use of ontologies and intelligent agents, as well as for creating economic incentives for use. They now have an operating model that includes age-appropriate materials, as well as tools for working with the information. Visit it at http://mydl.soe.umich.edu.

Carnegie Mellon's Informedia Digital Video Library entranced me with its possibilities. As far as I can see, if intellectual property problems don't derail it, this is the "killer application" for digital libraries. I say this not only because the application has such strong appeal, but because the Carnegie Mellon group has the right idea about how to approach creating an information application. Informedia is a system that takes video presentations, such as newscasts, and creates a searchable video collection from them automatically.

They do this by combining several kinds of information technologies in order to extract all levels of information from the original. They use speech recognition and natural language processing to create a text database from the narrative parts of the video. They capture words that appear on the screen, but aren't spoken, such as names of speakers. Image recognition identifies people or scenery. The user interface has many ways to identify and ask for what you need. Search on a typed-in query, plus the face of your subject. Get back what looks like a set of thumbnail images that represent each video clip, with a visual relevance ranking in the form of a colored column—the higher the column, the greater the relevance. Examine a timeline of the clip. The query words are each identified by a different color wherever they appear on the timeline, so that you can find where they cluster, and start viewing from that point.

None of these is a perfect technology, as the developers are the first to point out. But the combination of imperfect technologies, each yielding some searchable clues, results in surprisingly good retrieval. The tools that they have developed to go with it are fun to use, and make finding information quite easy. Try it at http://informedia.cs.cmu.edu.
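To see why several imperfect clues can add up to good retrieval, consider a minimal sketch. It is not Informedia's actual ranking method, which the presentation did not spell out; it assumes each technology produces an independent relevance score between 0 and 1 for a clip and merges them with a simple "noisy-OR" rule. The channel names and the scores are invented for illustration.

```python
from math import prod


def combined_score(channel_scores):
    """Noisy-OR fusion: the clip is missed only if every channel misses it.

    channel_scores maps a (hypothetical) channel name to a relevance
    score in [0, 1]; the channels are assumed to be independent.
    """
    return 1.0 - prod(1.0 - s for s in channel_scores.values())


# Invented scores for one video clip, one per technology.
clip = {"speech": 0.6, "screen_text": 0.5, "image": 0.3}
score = combined_score(clip)  # 1 - (0.4 * 0.5 * 0.7)
```

Under these assumptions the combined score is higher than any single channel's score, which is the effect the developers describe: each weak signal contributes a little evidence.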

While Stanford University's project, Infobus, may not have the glamour of Informedia, it may be the strong foundation that will make digital libraries work, and work together. Stanford's goal was to create a modular structure that would allow a digital library to plug in heterogeneous modules without entirely rewriting the system to accommodate them. Each module has a uniform "wrapper" that allows it to interact with the system as a whole. Thus, you could plug in modules, such as DIALOG, FOLIO, DigiCash, and an image database. The system takes a query from the user, decides which resource is the best place to find an answer, and sends the query, appropriately structured, to that module. Answers are translated back into the common interface and presented to the user. Imagine searching all online systems through a single interface.
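The wrapper idea can be sketched in a few lines. This is an illustration under stated assumptions, not Stanford's code: the Wrapper class, the route function, and the topic-overlap rule for choosing a resource are all hypothetical, and the two "services" are stand-in functions rather than real connections to any online system.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Wrapper:
    """Uniform adapter around one heterogeneous service (hypothetical)."""
    name: str
    topics: set                                 # subjects the service covers
    translate: Callable[[str], str]             # common query -> native syntax
    native_search: Callable[[str], List[str]]   # stand-in for the real service

    def search(self, query: str) -> List[Dict[str, str]]:
        native_query = self.translate(query)
        # Translate native answers back into one common record shape.
        return [{"source": self.name, "title": hit}
                for hit in self.native_search(native_query)]


def route(query: str, wrappers: List[Wrapper]) -> List[Dict[str, str]]:
    """Send the query to the wrapper whose topics overlap it most."""
    words = set(query.lower().split())
    best = max(wrappers, key=lambda w: len(words & w.topics))
    return best.search(query)


# Two made-up modules with different native query syntaxes.
maps = Wrapper("maps", {"map", "road"}, str.upper,
               lambda q: [f"map hit for {q}"])
articles = Wrapper("articles", {"journal", "article"}, lambda q: f'"{q}"',
                   lambda q: [f"article hit for {q}"])

results = route("road map of Ohio", [maps, articles])
```

Adding a new resource means writing one more wrapper; the routing code and the user's view of the results stay the same, which is the point of the modular design.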

The Alexandria Digital Library at the University of California at Santa Barbara was one of the first to create a usable product. Its geographic information system now has a new Java interface. It searches all kinds of "spatially referenced" information: maps, satellite images, and digital elevation models. It can find a map of a city of over 5,000 people in the Mississippi Valley, which has nearby Indian burial sites, that shows the road network. It can find associated 19th century photographs. The system is modular, so that as new technologies are developed, they can be added. Leave plenty of time to explore this at http://www.alexandria.ucsb.edu.

The University of California at Berkeley concentrated on developing tools for using very large collections of digital data. Its goal was to create a system and tools that would make the work cycle more efficient for the user. Research was required to understand how information finding and use fit within the work cycle of the user, so they analyzed how land use planners, environmentalists, or zoologists use information. With that starting point, they designed several interesting systems. CalFlora is a database of pictures and information about California's plants. Blobworld finds pictures based on areas of color and shape. Less glitzy, but extremely useful, are the document recognizers they have created for some types of documents, such as tables. They have used these to perform such feats as creating a database from a table, and attaching it to a map to create new documents. Multivalent documents construct layers on top of the original, so that all versions can exist simultaneously (http://elib.cs.berkeley.edu).

Rutgers University's datamining project, described by Nabil Adam, created a data warehouse and analysis tools for decision making. It consists of a state-of-the-art environmental monitoring system plus satellite images, maps, including geological survey maps, and visualization techniques. One of the most impressive tools combines several maps into one, and then creates a virtual fly-through of an area.

Workshop Looked at Trends and Implications

A workshop on the future of electronic publishing and its socioeconomic implications was held in conjunction with this conference. Speakers pointed out that social, political, and economic issues are frequently overlooked, but are the hardest to address in a time of technological change. This workshop focused on users. Carol Tenopir and Don King reported on their research analyzing trends and data in scientific/technical journal publishing and use. Their goal was to describe options, costs, and decision factors. At this time, they have more than 20 years of data. Some trends they described include the following:

Print and electronic journals will coexist.
Electronic delivery is replacing paper document delivery.
Site licenses are becoming necessary to allow unrestricted access within an institution.
Publishers will continue to incur large fixed costs. The intellectual costs remain, no matter what the distribution mechanism is.
Publishers have raised library subscription rates to make up for their loss of personal subscriptions, which has increased the time readers must devote to finding and reading a journal. Since journal prices have increased, librarians have been forced to cut the number of subscriptions. This in turn has led to greater losses for publishers.

While this presentation was a too-brief overview, it is worth examining in detail the analysis King and Tenopir have done to help libraries predict their break-even points for subscribing to a journal vs. getting copies on demand. They have written a number of articles on the subject with helpful guidelines for libraries contending with rising journal prices.
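The basic idea of a break-even point can be shown with a deliberately simplified sketch. It is not King and Tenopir's model, which accounts for far more cost factors; it assumes only a flat annual subscription price and a flat price per on-demand copy, and the dollar figures are invented for illustration.

```python
import math


def break_even_uses(subscription_cost, cost_per_copy):
    """Smallest number of readings per year at which subscribing costs
    no more than buying each copy on demand (simplified assumption:
    both prices are flat and no other costs apply)."""
    return math.ceil(subscription_cost / cost_per_copy)


# Invented prices: a $600 subscription vs. $12 per delivered article.
uses = break_even_uses(subscription_cost=600.0, cost_per_copy=12.0)
```

With these made-up numbers, a journal read fewer than 50 times a year would be cheaper to obtain article by article.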

Questions on Usability

A series of sessions on usability shows how far this field has come beyond just "cool technology." Questions now being considered concern how to design one system to fit the needs of many kinds of users. Can you serve both children and astrophysicists with the same collections and tools? Can we design digital libraries that provide universal access to all kinds of information sources in any format, and that fill all kinds of information needs?

Gary Marchionini maintained that digital library design depends on the users, on their tasks, and on the content and how they will use it. Just as special libraries and public libraries differ, so too must digital libraries serving disparate populations. The problem is even harder with digital libraries, since their user populations are unknown. He suggested creating alternative interfaces that depend on both user preferences and such easy-to-detect elements as the type of equipment and bandwidth available to the individual. In addition, systems must build in help, to teach users as they learn the system.

In the last 5 years, we have seen digital libraries grow from dream to reality. In the process, the field has attracted fine minds trying to solve nicely complex problems of information storage, presentation, access, and use. This ferment makes for a lively debate. Who could ever have thought libraries were stodgy?

Advances in Digital Libraries was sponsored by the IEEE Computer Society, NASA/Goddard Space Flight Center, the Library of Congress, the National Library of Medicine, the Alexandria Digital Library, CESDIS, Hughes Aircraft, and IBM.

Susan Feldman is owner and president of Datasearch, an information consulting firm specializing in digital libraries and search engines. She can be reached at sef2@cornell.edu.
