
Open Source and Libraries: An Interview with Dan Chudnov

Bill Mickey

ONLINE, January 2001
Copyright © 2001 Information Today, Inc.


Open source methods of software development have been around at least since Richard Stallman's pioneering efforts with his GNU Project in 1984–a project that developed the free GNU operating system (http://www.gnu.org). Lately, open source technology has gained mainstream notoriety and headlines by way of Napster and its cousins, but savvy systems administrators have been incorporating free software into work and home systems for years. Not surprisingly, open source methods are increasingly being used in the library environment. Free software is just that, free, so it obviously appeals to limited budgets. But there's much more. There's a passionate collaborative effort that's part of the open source methodology, and a strong community of developers always forms around a new project. Their efforts often yield niche applications that are incredibly flexible. A major voice behind advocating free software adoption for libraries is oss4lib, a Web site and mailing list devoted to getting the word out about new free projects and software uniquely tailored to library-specific functions. Dan Chudnov, Systems Architect at the Cushing/Whitney Medical Library at Yale University's School of Medicine, is one of the contributors to the oss4lib project (http://oss4lib.org) and a strong proponent of free software for libraries. I met with Dan at Yale, and he spoke at length about open source and how library methodologies and philosophies run parallel with the motivations behind the free software movement.

BM: Describe what you do here.

DC: I'm a librarian. I sit at the reference desk; I help put our online resources online. But what I primarily do is system development–my title is Systems Architect. I do system design, development, and programming. When I'm lucky, that means finding the right combination of tools to solve problems we're either starting to have or are going to have down the road. I spend a lot of time writing software from scratch because sometimes the right tools don't exist. But I spend more time keeping an eye on software that's already out there, so that when we need something to do a new job, I know what's best for it. I try a lot of new things out. I'm mostly here to make sure we have the right tools for the right place at the right time.

BM: It sounds like you have a mix of IT and reference responsibilities.

DC: Definitely. It's certainly more of an IT position than a reference position. But I'm a librarian.

BM: So the interest in open source software is sort of a natural function of your position.

DC: It is.

BM: How did you get into open source?

DC: I've always used free software. I've been actively doing software development since I started library school. I saw it could be a good thing with a little investment in time. When I arrived at Yale in 1997, there were some projects we needed to do. I started writing some applications in Visual Basic and writing some proprietary tools for the Netscape Web server. It reached a point [where] it became very obvious that what we were doing was something other people could use. We had seen others working on the same thing and thought they should have what we were working on. We started to think about selling it. Well, we're just a library here; we're not going to make a lot of money and we don't want to do professional support. Fortunately at the same time, late 1998, there was a lot of noise in the software press, sites like Slashdot, about open source as a methodology for software engineering. It struck us that it made sense to do it that way. So we took the code that I wrote, with the help of a number of people here, and we put it out there in early 1999. We announced that it was available and licensed it with a free software license. It was amazing! Less than a month passed, and we heard from Eric Schnell at Ohio State's Prior Health Sciences Library. He said, "Dan, this is great. We were just about to do the same thing. We couldn't figure out how to solve a couple of problems, but your architecture seems to solve those problems, so we're going to do it the way you did it."

Basically, one month passed and somebody took what we did here and made it infinitely better. They took out all the proprietary bits–the Visual Basic code–and they made it better. It was a remarkable thing. We realized here that this is what libraries do; this is how librarians work. We have software problems in our industry: there is not enough of it. And what there is, isn't as good as it could be when you compare it to other industries. What better way to attack that problem than by just doing things together like this. And it hit us that we needed to get the message out. So we started the oss4lib site and mailing list, put the word out, and tried to get as many people involved as we could. Since then, it's been the way we do things.

BM: How many subscribers do you have on the mailing list?

DC: A couple hundred. But there's something like 10,000 hits a month. The email list is bursty–sometimes a week will go by with nothing, but the next day [it] will have 20 messages. And the Web site is constantly updated with new software announcements. It's not heavily trafficked, but there's a small, loyal group of people.

BM: Are these software announcements unique to library environments?

DC: Yeah, generally. Some people are aware of oss4lib and actively post upgrade announcements. But a lot of the projects I learn about are things outside the library community. There's a fellow who wrote a bibliographic management tool called Pybliographer, which was very good. He wasn't a librarian, but he needed it, so he wrote it–and it's excellent. So when he announces a new version, I make sure it gets on oss4lib. I watch Freshmeat (http://freshmeat.net) religiously. Freshmeat, if you've never seen it, is one big Web log of all the new software announcements for the free software community. I check it every day because they announce dozens and dozens of them daily. I watch for things that are relevant and copy them to oss4lib.

BM: A programmer can't just simply alter open source code and start using it. You mentioned a licensing process?

DC: All software is inherently protected by copyright. If you write something it's yours; you have a copyright on that by law, whether you explicitly state that or not. The brilliant thing about the people who started this as a movement–mostly Richard Stallman with the GNU Project in the early 1980s–was recognizing that the commercial developers' interest in making money was dependent on their ability to keep you from doing anything with their software other than what they wanted you to do with it. The GNU folks realized these companies were gaining a great deal of success. They saw that as stealing people's freedom–the freedom to take something and do with it what you wished. So Stallman devised the GNU General Public License. It's a legal license that was crafted with the help of some sharp legal minds. It says essentially that this piece of software is mine, but I'm giving you express permission to use, copy, modify, and distribute this as you see fit–provided that you keep this license intact with any distributions you make and that you retain my original copyright.

Open source says it's good to license your software in a way that gives people the freedom to use it how they want. You don't have to use that exact license; there are certain tenets to that license that are common to other licenses, like the Berkeley-style license. In fact, there's a whole matrix of common licenses, but the people who coined the term open source said if something has an open source compliant license, it must have these specific tenets.

It's a long answer, but it's important to understand that it's all based on free software, the idea that these things should be free. So the open source folks said if your software is free according to these basic principles, then anyone who gets it should have the right to use it, to look at the source code, to make their own revisions.

BM: How do you think libraries are uniquely positioned to adopt open source technology, and what are the parallels in philosophy?

DC: Open source is not a technology, it's a methodology. That's how I think of it. Open source says you should make software free, but engineer it according to certain principles. The utmost principle is that you should put your software out there in a way that anyone who's interested in using it can get back to you very quickly about whether it does what they need it to do. If it does, you should make it easy for them to use it, and if it doesn't, you should also make it easy for them to tell you how to make it do what they want.

In libraries, we tend to know what people think of what we're doing. Sometimes we forget to worry about that. Sometimes we worry about that too much. You can see when someone comes to the reference desk and asks you a question. If we can't answer their question, it sticks with us. Or if we can answer it but they have to get there through some convoluted way, and they say, "Hey, isn't there a better way to do this?" Well, that's exactly the kind of feedback that works best with software development. Librarians are very keen on hearing these things, and the good ones, hopefully, take those messages to heart and act on them to change their services.

Any reference librarian will also tell you that if you want to know something, ask someone who knows. It's the fastest way to find out. So it's always better to have two people at a reference desk. Two people are faster and better than one. And any academic will tell you that peer review helps keep quality high. That's another part of open source. You want people to look at what you're doing. If you're not sure how to do something, you ask people and you get other opinions on it. And you ask as many people as you can, you don't isolate yourself. It's very much like what happens at a reference desk. You have three people: one person with a question and two people with parts of an answer, and all three people walk away with an answer. That's the way open source, when managed properly, works. You've got one person writing a little bit of code, a second noticing problems, and a third suggesting a fix. Everybody gets it at the end.

Additionally, some libraries have very limited resources. Our budgets don't grow dramatically; we can't raise venture capital. So when we have catalog records that we create from scratch, it's best that we share them because then we can get more records from other people that we don't have time to do ourselves.

And finally, a very important message that I try to get across is that a hundred years ago, there was a tremendous wave of public investment from wealthy people in this country in public libraries and other library facilities. It used to be that if you wanted a public library, you needed a building, you needed stacks, you needed shelves, you needed a card catalog. Well it's a hundred years later and you still need a building, but in many cases, more than anything else you've got to have a Web server, you've got to have databases, and you've got to have programming languages that go between them. What librarians can do to their betterment is to consider the growing pool of very high-quality, free software tools that are out there as fundamental building blocks of our libraries today. And the people who are writing these bits of software in the aggregate are giving the same wealth to our communities as the Carnegies and their peers did a hundred years ago. These people are creating amazing value that we can use to build our libraries from the ground up.

BM: So what are some of the more obvious applications for open source software in libraries?

DC: Well, there are the things that aren't specific to libraries but are nonetheless vital. The entire Internet infrastructure runs on free software tools–most of which were developed using open source techniques–Sendmail, Apache, BIND. We couldn't run our networks without them.

Specifically, there's a tool called Prospero that has really caught on. It's the document-delivery-to-Web piece that adds on to the Ariel tool. Ariel is fax over FTP. Most libraries that get .tiff files via Ariel print them out. Prospero takes that .tiff file and puts it on the Web as a PDF file, and automatically notifies people via email that it's available. So if I'm a Yale faculty member and I'm in California for the summer and I don't have access to the library, I can request an article from Yale. They'll get it for me and scan it, or get it from another library through ILL or document delivery, and the ILL office can just put it on the Web. And I'm sitting in California and click, click, click, and there's the document. That's what it does. It's a very small tool but it's very good at what it does.

Any library that does document delivery can use that–from the richest corporate libraries to the poorest public libraries. Because more and more people don't necessarily have time to go to the library and pick something up, but they can access the Web. I believe there are hundreds of libraries using Prospero now, and it's only been out for a little over a year.
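
[The Prospero workflow described above (take the scanned .tiff file that Ariel delivers, publish it on the Web as a PDF, and email the requester) can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not Prospero's actual code. Every name in it (deliver_document, stub_convert, Delivery, the example.edu addresses) is a hypothetical stand-in, and the real TIFF-to-PDF converter and mail sender are left to the caller.]

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class Delivery:
    """Result of publishing one scanned document (hypothetical type)."""
    pdf_path: Path
    url: str
    message: str


def deliver_document(
    tiff_path: Path,
    web_root: Path,
    base_url: str,
    recipient: str,
    convert: Callable[[Path, Path], None],
    notify: Callable[[str, str], None],
) -> Delivery:
    """Publish a scan under web_root as a PDF, then notify the requester."""
    web_root.mkdir(parents=True, exist_ok=True)
    pdf_path = web_root / (tiff_path.stem + ".pdf")
    # The caller supplies the converter; this sketch does not convert anything itself.
    convert(tiff_path, pdf_path)
    url = base_url.rstrip("/") + "/" + pdf_path.name
    message = f"Your requested document is available at {url}"
    # The caller supplies the notifier (for example, a function that sends email).
    notify(recipient, message)
    return Delivery(pdf_path, url, message)


def stub_convert(src: Path, dst: Path) -> None:
    # Placeholder only: copies bytes instead of doing a real TIFF-to-PDF conversion.
    dst.write_bytes(src.read_bytes())


if __name__ == "__main__":
    outbox = []  # collects (recipient, message) pairs instead of sending email
    with tempfile.TemporaryDirectory() as tmp:
        scan = Path(tmp) / "article.tiff"
        scan.write_bytes(b"fake scan data")
        result = deliver_document(
            scan,
            Path(tmp) / "www",
            "http://library.example.edu/docs",
            "patron@example.edu",
            stub_convert,
            lambda to, body: outbox.append((to, body)),
        )
        print(result.url)
        print(outbox[0][1])
```

[Passing the converter and notifier in as functions keeps the sketch runnable without any imaging or mail software installed; a library would substitute its own tools at those two points.]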

Another tool that is getting a lot of use and development both inside and out of the library community is course reserve and management software. There are dozens of tools that are free and there are probably dozens that are for sale. We have a proprietary one here; we also have a homegrown one. And there are librarians working on something called OSCR (pronounced "oscar"), which is the librarian's approach to make sure the syllabus is online and the documents are available.

BM: Where is OSCR being developed?

DC: George Mason University. There's an email list and there are librarians in about a half-dozen libraries working on it. It's running in at least four or five libraries as of last year.

There are also attempts to build library management systems. Some people are working from a searchable Web-interface approach; others are working on a similar project starting from a circulation point. There's another person who's working on the cataloging part of it. These people know each other, they found each other on mailing lists, and they're talking to each other about the prospects of joining together when they get further along.

I hear rumblings of universities interested in jumping in on the same kinds of projects. Nothing official yet, but more and more people are thinking about it.

I really believe we have to take these reference sources and realms of information that are highly interlinked–in a way that librarians understand very well already–and make sure there are free, flexible, accurate versions of these things [so] that people can build software with extra functionality.

BM: A lot of folks might be turned off by this because it seems so programming-centric. But what if they have some insightful ideas on how and where innovation can be made in the library using open source? Who do they talk to, how do they get involved if they don't know how to program?

DC: There's lots to do. You don't have to be a programmer to contribute. If you see a project that's out there, whether it's a library-specific project or not, download it. Try it. Get it running. See if you can get it to work. Join the mailing list. If there's something you want it to do, tell them, ask them to add it in. Become an active member in the communities that develop around successful, free software projects–particularly the ones that use the open source methodology. When you join those communities, you see that there is a whole realm of tasks to do. For example, there are Web sites that need to be kept up, or documentation that needs to be written. There's a huge need for documentation. And librarians are good at providing that kind of stuff, and it doesn't always exist with the products we buy already–training materials, tutorials. There's a lot of need for that.

The programming part of it is something that fortunately more people are doing because it's getting easier. Librarians are seeing more of a need for it. This gets back to the first part of your question. People might be turned off because it requires them to engage more actively. If you buy software off the shelf and plug it in and it works, great. But anybody who's done that probably has had an experience where something didn't work and they had to make a phone call. Sometimes those phone calls don't get answered. These open source communities are just like that. They're like the companies that want to sell you software. But in most cases, because these are all volunteers who do it because they want to see the thing work, they're much more responsive.

So if you are bothered by the idea that you need to engage yourself more actively, you should remind yourself that you're pretty engaged already–with your staff that you depend on, for example. You hate that darn Netscape bug that makes you restart Windows or whatever and you're not going to escape this kind of problem even if you buy stuff off the shelves. So you might as well do it in a way where your contributions are more likely to get heard and acted on.

BM: And what's the buy-in for administration?

DC: In terms of administrators saying to their staff, "There's software we're developing here, we should do it following this open source methodology", the single best buy-in for that is something we experienced here. We developed a piece of software here, and Ohio State got their hands on it and made it better. Now we're running Ohio State's version. We don't work on that anymore. The folks at Ohio State gave us back something that's infinitely better than what we originally did. Ask my boss if she thinks that was worth my initial time.

Another point is that, in general, you tend to get better software faster. Because so much of the software we all need in our libraries today is similar to our colleagues' in other libraries, if the software is designed from the beginning to be generically useful and someone on the other end implements their own details and settings, you'll often end up with a better design. It takes more work, but what you end up with is something that's much more flexible than you ever imagined.

BM: Anything else you'd like to add?

DC: Sure. This stuff is fun. Once you start doing it, you'll ask yourself why you haven't been doing it all along. Of course there are good reasons to buy software. But it's a lot of fun doing things the open source way. You can see your contributions take hold and benefit other people. It's just another thing that makes working in a library more fun.

Another reason why this is starting to catch on in libraries is that a lot of librarians are social and we like working together. And this gives a very clear, new way of doing that. It means that the one librarian who does a little bit of programming and all those other librarians out there who are working on similar projects who have never met before can land on the same message board and do something together and benefit from it, and it gives each of those individuals a whole new group of people they've got a lot in common with. You're going to get a lot of things done better, faster, and meet a lot of neat people along the way.


Open Source Resources

Oss4lib.org–A listing of free software engineered for libraries as well as news of new projects and related information. Sign up for the mailing list here.

Slashdot.org–"News for Nerds." Discussion boards and news for IT-related issues; this site often covers open source developments.

Freshmeat.net–Maintains the "largest index of Linux software on the Web." Also offers discussion forums and news for the open source community.

Advogato.org–An advocate site for free software developers, Advogato.org serves as a community resource for developers. Includes news and postings.

The Cathedral and the Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary–Written by Eric Raymond, published by O'Reilly and Associates. A collection of essays that explores the open source method of software development. A great introduction to open source.

The Unix Philosophy–Written by Mike Gancarz, published by Digital Equipment Corp. Describes the Unix philosophy behind software development in a nontechnical manner.


Bill Mickey (billm@infotoday.com) is Editor of EContent magazine.

Comments? Email letters to the Editor at marydee@infotoday.com.

