An Open Source Interface to Your Library System
Mike Beccaria and Dan Scott
Fac-Back-OPAC is a faceted backup OPAC. This advanced catalog offers features that compare favorably with the traditional catalogs for today’s library systems. Fac-Back-OPAC represents the convergence of two prominent trends in library tools: the decoupling of discovery tools from the traditional integrated library system and the use of readily available open source components to rapidly produce leading-edge technology for meeting patron and library needs. Built on code that was originally developed by Casey Durfee in February 2007, Fac-Back-OPAC is available at no cost under an open source license to any library that wants to offer an advanced search interface or a backup catalog for its patrons.
Dan Scott Traces the History of Fac-Back-OPAC
I joined Laurentian University as a systems librarian in early 2006. It quickly became apparent to me that the proprietary library system catalog I had assumed responsibility for suffered from a stability problem. We needed a backup catalog that was independent of our proprietary library system. Unfortunately, in mid-2006, a good alternative catalog that met my requirements—free, easy to deploy and customize, and able to run on a basic desktop machine—was hard to find. The best apparent option at the time, Koha, wasn’t up to the task of handling our 750,000 bibliographic records (a limitation that will reportedly be lifted with the integration of the Zebra indexing engine in Koha 3.0). So my search for a backup catalog continued.
Fortunately, the open source search server Solr was the dominant theme of the code4lib 2007 conference. An all-day Solr preconference workshop led by Erik Hatcher kicked off the event, and a demonstration of Andrew Nagy’s MyResource portal proved Solr’s potential. Casey Durfee’s presentation, “Open Source Endeca in 250 Lines or Less,” gently mocked search technology that requires license costs in the six-digit price range by offering a fully open source alternative built on Solr. Durfee’s presentation held my attention because I was familiar with all of the building blocks of his solution, and it seemed to meet Laurentian’s needs for a backup catalog.
As the conference drew to a close, I asked Durfee about his intentions for the source code in his technology demo. He considered his work to be throwaway code but gave me permission to release it under an open source license as Fac-Back-OPAC, the faceted backup OPAC. I invited others to join me in the project. Today, Fac-Back-OPAC is fulfilling Laurentian’s needs for a backup catalog, and it has been greatly enriched by the contributions of others.
Mike Beccaria Explains How He Got Involved
I started working at Paul Smith’s College in fall 2005 as a systems librarian. Paul Smith’s is a small school (about 850 students) with a small collection (about 40,000 items) and, consequently, a small budget. Our library runs a SirsiDynix ILS using its Webcat OPAC. And while SirsiDynix has several upgrades available for our OPAC, we couldn’t justify the expense of providing a new skin and features that didn’t address what we considered the core issues found in many traditional OPACs, namely the lack of customization, poor relevancy-ranking algorithms (if any), and the lack of findability and browsability of collections on par with user expectations.
The unveiling of North Carolina State University (NCSU) Libraries’ Endeca catalog in January 2006 set the standard for what I wanted for my library. Since then, I had been trying to find a way to offer something similar to our patrons. The solution couldn’t be difficult to create or implement because I’m not a professional programmer, and it had to be free because I didn’t have much money to spend. This was a tall order to fill at the time, which was why I was so excited to hear about the presentations being given at code4lib 2007.
At the conference, I too attended Hatcher’s Solr workshop and watched as Durfee, Nagy, and Hatcher showed examples of what was possible even for the nonprofessional programmer. Except for skimming a couple of books on Ruby on Rails and Lucene prior to the conference, I was not very familiar with the technologies presented there. The reality was that I had no idea whether a project like this was actually feasible for someone with my skill set. However, my concern was quickly alleviated. The conference was held in late February and within a week of getting back, I had a fully customizable OPAC with faceted navigation and advanced search features working on my desktop.
By mid-March, I had learned of Dan’s Fac-Back-OPAC Google Code project. I quickly signed up and integrated some enhancements I had made. After showing the rest of the library staff what was possible, I got approval to purchase a new server on which the library could host our new OPAC (among other things).
Fac-Back-OPAC, though developed recently, already boasts a rich feature set that’s absent in many proprietary ILS OPACs. It’s one of the canonical examples of the next generation of OPACs coming to the market, offering librarians and patrons advanced functionality that previously had only been available to institutions capable of footing a large bill and providing a team of programmers to implement it. Using a car as a metaphor, let’s take a look at some of the features that it offers, starting under the hood to examine some functionality that you may be looking for and ending with the patrons’ experience of taking the OPAC for a test-drive.
The Materials— Every piece of software is built on some sort of programming framework or architecture. Consider these the materials with which the OPAC was made. Under the hood, Fac-Back-OPAC was developed using enterprise-class open source software (Solr and Django). This combination delivers speed, reliability, and efficiency and is customizable by librarians and programmers who are willing to take the time to learn the technology from readily available documentation and examples. This degree of customization simply isn’t possible in the current field of proprietary ILS products. It also means that once a user understands the fewer than 2,000 lines of code that glue these components together, he or she can add more features with relative ease.
Under the Hood— Fac-Back-OPAC’s engine is made up of two components: the indexer and the search engine. At its most basic level, the catalog is designed to gather data from MARC records that are extracted from a library’s ILS and is thus completely independent from the ILS itself. The indexer is very powerful and is able to analyze and extract data at the most granular level of MARC fields and subfields using simple configuration files. More advanced users can create custom filters to tailor their data and index it in any way they like. For example, with the indexer, you can write a custom filter to change values in MARC records on-the-fly so the data can be displayed to the patron in a more user-friendly way.
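To make the idea of a custom filter concrete, here is a minimal Python sketch that turns a coded MARC value into a patron-friendly label before it is indexed. It is only an illustration: the `LANGUAGE_LABELS` table, the `friendly_language` and `apply_filters` functions, and the dictionary-shaped record are all hypothetical and are not taken from Fac-Back-OPAC’s own indexer or configuration files.

```python
# Hypothetical lookup table: MARC language codes to display labels.
LANGUAGE_LABELS = {
    "eng": "English",
    "fre": "French",
    "spa": "Spanish",
}


def friendly_language(field_008):
    """Return a readable language name from a MARC 008 control field."""
    code = field_008[35:38]  # character positions 35-37 hold the language code
    return LANGUAGE_LABELS.get(code, "Unknown")


def apply_filters(record, filters):
    """Build an index document by running each filter over its source field.

    `record` maps MARC tags to raw values and `filters` maps an index field
    name to a (tag, function) pair. Both shapes are assumptions made for
    this sketch.
    """
    doc = {}
    for index_field, (tag, func) in filters.items():
        if tag in record:
            doc[index_field] = func(record[tag])
    return doc


# A 40-character 008 field assembled piece by piece for illustration:
# date entered, date type, two dates, place, 17 format-specific
# positions, language, modified-record flag, cataloging source.
sample_008 = "070301" + "s" + "2007" + "    " + "nyu" + " " * 17 + "eng" + " " + "d"

record = {"008": sample_008}
filters = {"language": ("008", friendly_language)}
doc = apply_filters(record, filters)
```

With this arrangement, adding another display-friendly field would mean writing one more small function and adding one more entry to `filters`.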
The search engine complements the indexer. While we’ll discuss the technical details more in the next section, Fac-Back-OPAC uses Solr, a search engine platform that’s capable of advanced search functionality that rivals many proprietary systems. With the ability to customize fields and relevancy algorithms, Solr puts the power in the hands of the systems administrator and lets librarians decide which data is most important and, therefore, which search results float to the top. Additionally, from the end-user perspective, Solr’s query syntax includes many advanced features that are inherited when searching a Fac-Back-OPAC catalog. Specifically, these include field boosting, simple field searching (keyword, title, author, etc.), advanced search and Boolean operators (single- and multiple-character wildcards, stemming, grouping, phrase search, AND, “+”, OR, NOT, “-”), and sorting (relevance, publication date, descending, and ascending).
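The following Python sketch shows how a search that combines field boosting, grouping, phrase search, NOT, and sorting might be written in Solr’s query syntax and encoded as request parameters. The field names (`title`, `subject`, `author`, `pubdate`) and the server address are assumptions chosen for illustration, and the code is written for Python 3 rather than for the Jython used by the indexer. It does not escape quotation marks inside the phrase, which a real interface would need to do.

```python
from urllib.parse import urlencode

# Assumed location of the Solr select handler; adjust for a real deployment.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def boosted_query(phrase, boosts):
    """Search for a phrase in several fields, weighting each with ^boost."""
    clauses = ['%s:"%s"^%s' % (field, phrase, weight)
               for field, weight in boosts]
    return " OR ".join(clauses)


def solr_url(query, sort=None, rows=10):
    """Encode a query string and optional sort order as a Solr request URL."""
    params = [("q", query), ("rows", rows)]
    if sort:
        params.append(("sort", sort))
    return SOLR_SELECT_URL + "?" + urlencode(params)


# Title matches count three times as much as subject matches.
query = boosted_query("open source", [("title", 3), ("subject", 1)])
# Group the boosted clauses, then exclude one author with NOT.
query = "(%s) NOT author:smith" % query
# Sort by a (hypothetical) publication date field, newest first.
url = solr_url(query, sort="pubdate desc")
```

Changing the numbers passed to `boosted_query` is the kind of adjustment that decides which results float to the top.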
The Ride— When test-driving a car, you want it to be comfortable and easy to drive and have some cool extras. Fac-Back-OPAC delivers this. It gets patrons to their search destinations with powerful resource discovery and sharing features. The catalog leverages descriptive metadata and controlled vocabulary by providing patrons with customizable faceted browsing. Additionally, because the catalog performs searches by reading information from the URL, all of the links in the catalog are shareable.
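Because the whole search lives in the URL, narrowing a search by a facet amounts to adding one more parameter to the link. Here is a minimal Python 3 sketch of that idea. The parameter names `q` and `facet`, the `field:value` encoding, and the `catalog.example.org` address are hypothetical; Fac-Back-OPAC’s actual URL scheme may differ.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_facet(url, facet_field, value):
    """Return a new catalog URL that narrows the same search by one facet."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query)  # decoded (name, value) pairs, in order
    pair = ("facet", "%s:%s" % (facet_field, value))
    if pair not in params:  # applying the same facet twice changes nothing
        params.append(pair)
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(params), parts.fragment))


start = "http://catalog.example.org/search?q=forestry"
narrowed = add_facet(start, "format", "Book")
```

The resulting link carries the complete state of the search, so a patron can bookmark it or email it to a colleague and get the same results.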
Fac-Back-OPAC provides RSS and Atom feeds for every search, enabling patrons to keep track of new materials that are found for arbitrary searches. Combined with the option of a customizable item-level display or linking to your existing catalog, as well as full internationalization (currently offering French and English interfaces), Fac-Back-OPAC demonstrates what’s possible with the next generation of OPAC software. The dream of integrating your catalog into your Web site without having to hack the ILS while offering next-generation search functionality is now a real possibility.
Technology Building Blocks
Most of Fac-Back-OPAC’s success can be attributed directly to the excellent open source components on which it’s built: Solr, marc4j, and Django. Let’s follow the life cycle of a MARC record as it goes through the process of becoming a searchable document in Fac-Back-OPAC.
From the ILS to MARC— Following the same approach used by NCSU and other institutions to build a catalog outside of the bounds of the traditional integrated library system, we begin by exporting all of the MARC records and holdings on a nightly basis. After the initial export, you can use whatever tools your ILS offers to determine which records are new or changed and export only those changed records to update your Fac-Back-OPAC instance. (We use the word “instance” here because multiple copies, or instances, of Fac-Back-OPAC can run simultaneously on the same server.)
From MARC to MARCXML: marc4j and Jython— We begin our journey with traditional MARC records. MARC, although a wonderful transmission and encoding format, is not well-suited to indexing by traditional search engines. So we use the excellent marc4j library to convert each MARC record to the more easily handled MARCXML format. As Solr only understands UTF-8 (an 8-bit Unicode character encoding format), we also convert any records encoded in MARC-8 to UTF-8 if required. In addition, we use the Jython programming language (a Python interpreter running in a Java Virtual Machine) to control marc4j and we use the simpler Python syntax to control the rest of the indexing process.
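To show what the MARC-to-MARCXML step involves, here is a sketch that uses only the Python 3 standard library. It is not marc4j and does not reproduce its API; it simply walks the leader, directory, and fields of one binary MARC record and writes the equivalent MARCXML. It assumes the record is already encoded in UTF-8, so the MARC-8 conversion that marc4j handles is left out, and the `build_sample_record` helper exists only to give the example something to convert.

```python
import xml.etree.ElementTree as ET

FIELD_END = b"\x1e"       # terminates the directory and every field
SUBFIELD_START = b"\x1f"  # precedes each subfield code
RECORD_END = b"\x1d"      # terminates the whole record
MARCXML_NS = "http://www.loc.gov/MARC21/slim"


def marc_to_marcxml(raw):
    """Convert one binary MARC record (already UTF-8) to a MARCXML string."""
    leader = raw[:24].decode("ascii")
    base = int(leader[12:17])     # offset at which the field data begins
    directory = raw[24:base - 1]  # leave out the directory terminator
    # Writing xmlns as a plain attribute is a shortcut that serializes correctly.
    record = ET.Element("record", {"xmlns": MARCXML_NS})
    ET.SubElement(record, "leader").text = leader
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        data = raw[base + start:base + start + length - 1]  # drop FIELD_END
        if tag < "010":  # control fields carry no indicators or subfields
            control = ET.SubElement(record, "controlfield", {"tag": tag})
            control.text = data.decode("utf-8")
            continue
        chunks = data.split(SUBFIELD_START)
        indicators = chunks[0].decode("ascii")
        field = ET.SubElement(record, "datafield", {
            "tag": tag, "ind1": indicators[0], "ind2": indicators[1]})
        for chunk in chunks[1:]:
            text = chunk.decode("utf-8")
            ET.SubElement(field, "subfield", {"code": text[0]}).text = text[1:]
    return ET.tostring(record, encoding="unicode")


def build_sample_record():
    """Assemble a tiny two-field MARC record for demonstration only."""
    fields = [("001", b"12345"),
              ("245", b"10" + SUBFIELD_START + b"aOpen source catalogs")]
    directory, body = b"", b""
    for tag, data in fields:
        data += FIELD_END
        entry = "%s%04d%05d" % (tag, len(data), len(body))
        directory += entry.encode("ascii")
        body += data
    base = 24 + len(directory) + 1
    total = base + len(body) + 1
    leader = ("%05dnam a22%05d a 4500" % (total, base)).encode("ascii")
    return leader + directory + FIELD_END + body + RECORD_END


marcxml = marc_to_marcxml(build_sample_record())
```

A production indexer would also need to cope with malformed directories and with legacy character encodings, which is a good reason to rely on a mature library such as marc4j.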
Indexing Records: Solr— The Solr search solution builds on the enterprise-quality Lucene search engine by offering Web services that simplify the indexing of documents and add extra features such as faceted search results. Each Solr instance can be configured with a schema that describes the fields of a document and the characteristics of those fields (such as text, date, or integer types). Fac-Back-OPAC uses an instance of Solr with a bibliographic schema that runs inside the Jetty application server. We then take each MARCXML record, extract the fields of interest for faceting and indexing purposes, and generate an XML document as a string to send to the Solr instance for indexing via an HTTP POST method.
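The extraction-and-post step can be sketched in a few lines of Python 3. The `FIELD_MAP` table, the Solr field names, the sample record, and the update URL below are assumptions for illustration and do not reflect Fac-Back-OPAC’s actual bibliographic schema. The sketch builds the XML message as a string and prepares the HTTP POST, but does not send it.

```python
import xml.etree.ElementTree as ET
from urllib.request import Request

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical mapping of Solr field names to (MARC tag, subfield code).
FIELD_MAP = {
    "title": ("245", "a"),
    "author": ("100", "a"),
    "subject": ("650", "a"),
}


def solr_add_document(marcxml, record_id):
    """Turn one MARCXML record into a Solr <add> message string."""
    record = ET.fromstring(marcxml)
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", {"name": "id"}).text = record_id
    for solr_name, (tag, code) in FIELD_MAP.items():
        path = "marc:datafield[@tag='%s']/marc:subfield[@code='%s']" % (tag, code)
        # Repeated MARC fields (such as 650) become repeated Solr fields.
        for subfield in record.findall(path, NS):
            ET.SubElement(doc, "field", {"name": solr_name}).text = subfield.text
    return ET.tostring(add, encoding="unicode")


def solr_update_request(add_xml, url="http://localhost:8983/solr/update"):
    """Prepare (but do not send) the HTTP POST that delivers the document."""
    return Request(url, data=add_xml.encode("utf-8"),
                   headers={"Content-Type": "text/xml; charset=utf-8"},
                   method="POST")


SAMPLE = (
    '<record xmlns="http://www.loc.gov/MARC21/slim">'
    '<controlfield tag="001">12345</controlfield>'
    '<datafield tag="245" ind1="1" ind2="0">'
    '<subfield code="a">Open source catalogs</subfield></datafield>'
    '<datafield tag="650" ind1=" " ind2="0">'
    '<subfield code="a">Libraries</subfield></datafield>'
    '<datafield tag="650" ind1=" " ind2="0">'
    '<subfield code="a">Open source software</subfield></datafield>'
    '</record>'
)

add_xml = solr_add_document(SAMPLE, "12345")
request = solr_update_request(add_xml)
```

Passing `request` to `urllib.request.urlopen` would deliver the document to a running Solr instance. Solr also expects a commit before newly added documents show up in search results.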
Search Interface: Django— Django is a popular Web application framework, written in Python, that implements the model-view-controller (MVC) pattern. It also offers support for features such as caching search results for performance, translating the interface into different languages, and setting up RSS feeds. Adrian Holovaty, one of Django’s original developers, uses it at The Washington Post, which amply demonstrates the framework’s robustness and performance. To customize the three Django templates that constitute the initial search screen, the search results page, and the item detail page, you simply edit the HTML-based templates found in the catalog/catalog/templates/ subdirectory of Fac-Back-OPAC.
A Free Ride, or a Sports Car at Public Transit Prices?
Fac-Back-OPAC’s code is freely available under an open source license, but as with any technology, there are other costs that you must consider. These include obtaining the technical expertise to provide implementation and support, hardware costs, and some “must-have” functionality that some proprietary ILS vendors offer but that Fac-Back-OPAC doesn’t currently support.
First, you should ensure that you have personnel with experience in these technologies or someone who has the time and willingness to learn. Unfortunately, implementing Fac-Back-OPAC is not yet as simple as installing iTunes in Windows or filling out a form on a Web site. Installation may call for someone who is comfortable with performing the tasks done by a systems administrator. Fac-Back-OPAC requires the sysadmin to be familiar with the following components: Python, Django, Solr, Java JDK 1.5, and a Subversion client. The good news is that help is available. You can find installation details on the Google Code project wiki. Documentation for these components is online for free. Another great resource for us has been the code4lib community and online forums. The folks there are very responsive to anything you might need help with.
In addition to personnel expertise, Fac-Back-OPAC has computer hardware requirements. Surprisingly, for such a powerful tool, it isn’t a resource hog. So far, it has been tested on Windows and Linux, although theoretically it can be run on any operating system that’s capable of running a recent version of the core components. The requirements for a production site will depend largely on several factors, including the size of your collection, how much traffic your Web server will get, and what other applications your server will be running. For comparison purposes, Mike has a test install of 40,000 records running on his Windows XP workstation with 1 GB of RAM and a Pentium 4 2.8-GHz processor. Dan has a test install of 400,000 records running on a Linux workstation with 1.5 GB of RAM and a Pentium 4 2.8-GHz processor, but less than half of that RAM is allocated to Fac-Back-OPAC. The caching combination of Solr and Django consistently delivers subsecond results even for large collections hosted on midlevel hardware.
One of Fac-Back-OPAC’s strengths is that it’s completely independent of the ILS. While this allows for greater flexibility, it comes with a price. Because the catalog only imports item records, it has no link to user information. This means that any related functionality your ILS may have provided, such as holds, payments, account status, and in some cases, item status, will have to be written into the application separately. For many, including large consortia libraries that rely on such functionality, this may be a deal breaker.
The Future of Fac-Back-OPAC
With Fac-Back-OPAC, the sky is the limit. While currently in its infancy, it has a lot of room to grow. At Paul Smith’s College, the intention is to monitor user behavior this fall and continue to develop the catalog based on patron needs and search behavior. Additional features such as patron book lists, Dewey/LC subject browsing, and full-MARC display or Web 2.0 tools like tagging and patron reviews would be great enhancements. If you’re interested in working to make Fac-Back-OPAC better, sign up at the Google Code project site.
The Buzz About Endeca
In January 2006, North Carolina State University (NCSU) Libraries officially unveiled its new Endeca catalog (www.lib.ncsu.edu/endeca). It offered the following departures from standard library practice:
- It replaced the catalog that was bundled as part of the ILS with one that was developed from the ground up at NCSU.
- It used search technology licensed from Endeca, a software developer and consulting firm with experience in the commercial sector but no previous experience in the library sector.
- It introduced facets to let users refine search results by narrowing search by specific subject headings, genres, authors, call number ranges, etc.
The Endeca-based catalog at NCSU reinvigorated debate about the features that library discovery tools should offer. It also legitimized Endeca as a purveyor of search technology for libraries and spurred open source enthusiasts to develop alternatives to the costly proprietary options.
code4lib Spells Relief
code4lib is an informal community of library technology specialists who cling together for moral and technical support. While most of its interactions are held over IRC, email, or blogs, the code4lib community has gathered at an eponymous conference the last two springs to exchange experiences and knowledge in person. If you’re reading this article, you should definitely check out http://planet.code4lib.org, http://irc.code4lib.org, and http://conf.code4lib.org.
[Editor’s Note: See a full report on the code4lib conference in Daniel Chudnov’s column in the May 2007 issue of CIL.]
More About the Software and Sites Mentioned in This Article
Fac-Back-OPAC at Paul Smith’s College
NCSU’s Endeca-Based OPAC
Casey Durfee on Open Source Catalogs
Helios and other open source OPACs represent an effort to bring the core values of librarianship to the library software world. To many patrons, our Web presence is the library. They spend far more time with our Web sites and search interfaces than they do with librarians. So it is imperative that the catalog be much better than most systems in place today if we want to save the readers’ time and to make our libraries easier to use.
Open source software also embodies the simple but revolutionary idea behind libraries: that information should be open and free for anyone to use. The power of open source makes it possible to create free systems that are as good as or better than the commercial products out there now—systems that can be easily modified and extended by anyone—and allows all libraries, not just the ones with enough money, to have the best possible software.
— C. D.
Mike Beccaria is the systems librarian for Paul Smith’s College in northern New York. He received a B.A. in history from SUNY–Geneseo and an M.L.S. from Southern Connecticut State University. His email address is firstname.lastname@example.org.
Dan Scott is the systems librarian for Laurentian University in Sudbury, Ontario, Canada. He created and maintains the File_MARC PHP library and is a regular contributor to the Evergreen Open-ILS project. He has also contributed patches and documentation to the LibX browser extension. Scott co-authored a book (Apache Derby: Off to the Races) and occasionally writes about coffee, code, and other good things at http://coffeecode.net. He holds a B.A. in English and philosophy from Laurentian University and an M.I.S. from the University of Toronto. His email address is email@example.com.