Implementing Federated Search at the University of Wyoming
Michael L. Nelson, Mary Ann Harlow, and Cassandra Kvenild
The allure of federated searching is potent in academic libraries. Students and faculty want to streamline their searching across web-based search engines, library collections, and the bibliographic databases to which the library subscribes. Librarians want to see maximum usage of relevant resources. Federated search potentially answers these wishes.
The University of Wyoming Libraries began the process of implementing Central Search, a federated search tool marketed by Serials Solutions, a ProQuest company, in the fall of 2006. Since renamed 360 Search, the tool complements the company’s suite of serials and e-resource management tools. We formed a task force of three reference librarians to see the project through from beginning to end, working with other library faculty and staff members as appropriate. We rolled out the final product in January 2007. Thus far, 360 Search has required only minor refinements. Although we have not done a systematic user study, we have gained some impressions of how patrons are using the tool.
The University of Wyoming (UW), a land-grant institution, has an enrollment of approximately 10,000 on the main Laramie campus and more than 2,000 additional students primarily located off campus. Instructional delivery methods for the latter group range from on-site classes in various locations around the state to audio-only classes to compressed video to online courses. As the only 4-year university in the state, UW offers a wide range of disciplines and programs. Accordingly, the UW Libraries are expected to offer an equally wide variety of general and specialized resources to support the curriculum.
Selecting the federated search system itself was straightforward. Since the library had already chosen Serials Solutions for our full-text link resolver and e-resources management functions, we decided to purchase 360 Search at the same time. We had previously run trials of two versions of another federated tool, but we opted not to acquire it.
Selecting databases to federate
The first step was to select the databases (and potentially other resource types) we should include in 360 Search. Serials Solutions recommended we begin with resource selection and address other decision points later since customized “connections” must be built for every database in order for it to function as part of federated searching.
Serials Solutions had already created an Excel spreadsheet of more than 2,300 resource titles, covering not only databases but also full-text resources, including collections such as JSTOR, individual newspapers, and statistical aggregations such as FEDSTATS. The primary sort was by provider. Resources offered by multiple providers were listed separately, so we could specify the provider as well as the resource. This was a necessary step, since different implementations of a given resource often have different field structures and therefore require different mapping algorithms. The bottom of the spreadsheet contained space to add any additional resources the library wished to federate. These resources would take longer to show up on the resource list because they had to go into a development queue for connection building.
The top of the spreadsheet solicited required data such as the institutional IP range, the proxy server used, and the library’s OPAC information. Besides the cells containing provider names and resource titles, each resource had a selection box to be marked if the library wanted it. Additional cells describing each resource were optional: URL (needed only for resources not already on the list), username/password (if required for access), subject headings (discussed in a following section), and a brief description. The descriptions, which task force members wrote, were displayed beside each entry in the database (resource) list within the public interface and hence needed to be concise, typically one or two sentences.
STARTING WITH GENERAL DATABASES
With nearly 200 databases available at the University of Wyoming, deciding which ones to federate was a challenge. Since federated searching is intended not primarily for in-depth, specialized research but rather for the typical undergraduate who needs to find a few useful sources, we began with general databases: Academic Search Premier, MasterFILE Premier, and Wilson OmniFile. We also opted to include our library catalog and Prospector, a union catalog of about two dozen Colorado libraries plus UW Libraries that supports user-initiated interlibrary borrowing. That way, our target users, who look for information in multiple formats, could discover books, documents, and other types of monographs as well as articles and databases.
To ensure that 360 Search would support as wide a variety of user needs as possible while keeping in mind our primary target audience, we wanted to include specialized resources in as many core disciplines and subject areas as feasible. UW Libraries employ a decentralized collection development model with subject bibliographers who are responsible for assigned departments and programs. We asked all bibliographers to suggest resources in their areas.
Although most of the bibliographers submitted relatively short lists, we could accommodate only resources without limits on simultaneous users. Serials Solutions strongly advises against federating resources with few seats, since they generate “source failed to respond” error messages. A list of databases with limited user seats, generated by our library’s e-resources technician, had a side benefit: all subject bibliographers could review it and suggest increasing user limits or eliminating them altogether, pending an expected budget increase. Although our biennial budget was indeed increased in spring 2008, we are still considering those changes as part of an overall project to set priorities for the added funding.
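To see why seat-limited resources produce “source failed to respond” messages, it may help to picture how a federated tool fans a query out. The following Python sketch is a simplified illustration under our own assumptions, not 360 Search’s actual code: each database connector is a hypothetical function, all connectors are queried in parallel, and any source that raises an error (standing in for a seat-limit refusal) or misses the time limit is reported as failed instead of holding up the others.

```python
import concurrent.futures as cf
import time


def federated_search(query, sources, timeout=2.0):
    """Query every source in parallel and collect results or failures.

    `sources` maps a database name to a (hypothetical) connector function
    that takes a query string and returns a list of records.
    """
    results, failed = {}, []
    pool = cf.ThreadPoolExecutor(max_workers=max(1, len(sources)))
    try:
        futures = {pool.submit(connector, query): name
                   for name, connector in sources.items()}
        done, not_done = cf.wait(futures, timeout=timeout)
        for future in done:
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception:
                # The connector raised, e.g. the provider refused the session.
                failed.append(name)
        # Sources still running when the time limit passes also count as failed.
        failed.extend(futures[future] for future in not_done)
    finally:
        # Return without waiting for slow sources to finish.
        pool.shutdown(wait=False, cancel_futures=True)
    return results, sorted(failed)


# Hypothetical stand-ins for real database connectors.
def open_database(query):
    return [f"Article about {query}"]


def seat_limited_database(query):
    raise RuntimeError("all licensed seats are in use")


def slow_database(query):
    time.sleep(0.5)
    return [f"Late article about {query}"]


if __name__ == "__main__":
    results, failed = federated_search(
        "polar bears",
        {"Open DB": open_database,
         "Seat-limited DB": seat_limited_database,
         "Slow DB": slow_database},
        timeout=0.2,
    )
    print(results)
    print("Source failed to respond:", failed)
```

The sketch returns as soon as the time limit passes rather than waiting for stragglers, which is one way a federated tool can keep a single unresponsive source from delaying the whole result list.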
One other factor came into play: several individuals suggested publisher- or aggregator-based full-text packages, for example, JSTOR, Project Muse, and ScienceDirect. After some discussion, we determined that, at least initially, we should stick with “traditional” abstracting-and-indexing databases for article access. In essence, we preferred subject-based databases to publisher packages, even when the latter were searchable, because this seemed to make more sense for our intended audience of undergraduates and general users. Since we were also implementing Serials Solutions’ link resolver, access to available full text did not seem to be an issue: any article from a full-text package retrieved by a search of an abstracting-and-indexing database would be linked from there.
INTO THE WYLD
Another small but unanticipated hurdle in selecting databases involved consortial databases. The UW Libraries belong to the statewide WYLD consortium, and we wanted to include some of the WYLD-subscribed databases in our federated search. Determining the details needed to establish connections took some digging on our part. Once the connections were established, we didn’t experience any difficulties with the consortial resources in 360 Search.
Our initial list of resources sent to Serials Solutions totaled 46 entries. Over the next few months, the list shrank by about 10 as technical and other glitches were identified. In one or two instances, we discovered that a resource had recently been dropped by either the library or the vendor. Most of the other issues involved difficulties in building connections. For instance, an online philosophy encyclopedia, which we had selected to cover a discipline for which we offered few other e-resources, returned numerous results for virtually every search we tried, regardless of subject. In these instances, we could not locate valid matches within the retrieved records, indicating a major flaw in the translation protocols. Such problems are to be expected, given the many different ways providers handle metadata and map fields.
We reported such issues to Serials Solutions staff, who then pulled the problematic resources from the suite for further investigation. In other cases, such as two locally developed UW Libraries databases with Wyoming-related content, the requested items were placed in the development queue with a caution that adding them to our list would take some time. Two years later, they still have not been added. This is understandable given their unique nature, but libraries considering a federated tool should not expect locally developed resources to receive high development priority.
Finally, note that delays can occur due to slow responses from some providers to inquiries from the federated tool vendor. In our experience, at least two providers would not work directly with Serials Solutions in gathering needed technical specifications for given resources. Instead, the providers required that library staff members communicate with them to receive the necessary data and then pass the data on to Serials Solutions.
Customizing the user interface
A hallmark of commercial federated search systems is customizability. Everything from branding to search and display functions to the overall interface design (the so-called look and feel) can be customized to various degrees by each library. Such flexibility, while it allows libraries to shape the product to suit their own circumstances, also poses challenges: it is not always obvious which settings make the most sense for a given installation. On the other hand, because the development process encourages library staff members to think through how the software can best serve users, it can result in enhanced user services.
Customization work also points up the critical importance of having knowledgeable and helpful staff members from the system vendor available to answer questions and offer advice. Some technical issues are not obvious to the library staff members customizing the product yet may strongly influence the choice of options; the vendor’s client support staff need to explain these.
As a key step in our implementation, Serials Solutions sent us a 20-page customization form covering the general interface appearance, search interfaces, and displaying/manipulating results. Accompanying instructions generally explained the required decision points well and indicated recommended default settings along with other available settings. If the explanatory material did not completely answer our questions, help was always available from Serials Solutions.
Serials Solutions can create a webpage design for you, either emulating your existing library homepage or creating something new. Since UW was already using Serials Solutions’ e-journal management function, we opted for a custom interface with a page design nearly identical to that service’s but featuring our own branding for 360 Search. Our library systems staff members, particularly the web technician, were a great help in completing this step, communicating directly with Serials Solutions as needed.
FIND IT FAST
After considerable deliberation, we opted to brand the tool as “Find it fast.” The name is brief and to the point while still giving users some sense of the tool’s function. All of our primary webpages contain a “Library A–Z” button for quick alphabetical browsing of selected library webpages; the A–Z listing was an obvious place to add a link to Find it fast.
Our key portal page for locating and selecting article indexes and other database types, such as electronic reference sources, is titled Article databases; it was a logical place from which to link to Find it fast. Even with these links, however, reaching the tool required a click from the homepage, and burying it one level below the homepage did not give it maximum visibility. Our able web designer nicely solved this problem by adding a search box immediately above the “Find information” channel, which serves as our primary access point for searching library content. The Find it fast label appears inside this box, with the parenthetical text “Search Library Databases” just below it. Clicking in the box erases the brand name, leaving a blank box into which a search can be typed; the system then runs that search against all databases in the suite.
Customizing search interfaces and functions
The primary users of Find it fast would be undergraduates, and this fact informed our choices for customizing both the search interfaces and the results displays. Our duty was to make the service as clear and intuitive as possible. Serials Solutions’ recommended default for the search page is Basic Search, featuring the single box very familiar to general search engine users. Those wishing to use Advanced Search simply need to click a tab with that label.
The Advanced input screen features four input boxes with the option to add more. Available Boolean operators are set to AND as the default, with OR and NOT options. Field searching can be selected in the boxes to the right. All the boxes have title as the default; the other options are author, full text, keyword, subject, abstract, ISBN, ISSN, and “any” as the option for searching all fields. A year search limiter is also turned on by default.
Given that Basic Search, in order to preserve simplicity, does not include field selection boxes, a default search field had to be designated. Our initial inclination was to specify keyword, which of course is frequently the default field of choice given its comprehensive coverage of subject-rich fields such as subject headings, title, and abstract. Serials Solutions recommended title as the default option, which puzzled us. We knew “subject” would be too limiting since not all target resources contain subject-indexing metadata. But we thought “keyword” would include subject indexing where available, plus title and abstract. Serials Solutions, however, pointed out that “keyword” is defined differently in different databases. In some databases, it searches full text. Given the large number of irrelevant results such searches would return, we opted for the title default. We had to remind ourselves that given our target audience, high recall was not as important as precision.
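The problem with “keyword” can be illustrated with a field-mapping table of the kind a federated tool must maintain for each connection. In this sketch the database names and native field codes are hypothetical; the point is only that one generic field can translate into quite different searches, and that some targets may have no equivalent at all.

```python
# Hypothetical mappings from the generic fields offered in the federated
# interface to each target database's native field codes.
FIELD_MAP = {
    "Database A": {"title": "TI", "keyword": "KW", "subject": "SU"},
    "Database B": {"title": "title", "keyword": "fulltext"},  # no subject indexing
    "Database C": {"title": "ti", "keyword": "kw", "subject": "de"},
}

# Native fields that (in this hypothetical setup) search the full text.
FULL_TEXT_FIELDS = {"fulltext"}


def translate(field, database):
    """Return the database's native field for a generic field, or None."""
    return FIELD_MAP[database].get(field)


def targets_searching_full_text(field):
    """List the databases where the generic field actually searches full text."""
    return sorted(db for db, fields in FIELD_MAP.items()
                  if fields.get(field) in FULL_TEXT_FIELDS)


if __name__ == "__main__":
    print(translate("subject", "Database B"))      # None: no subject field
    print(targets_searching_full_text("keyword"))  # ['Database B']
    print(targets_searching_full_text("title"))    # []
```

Under these assumptions, a “keyword” default would quietly become a full-text search in Database B, while a “title” default maps to a comparable field everywhere.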
The other major choice was whether to arrange resources on the default search page by database or by subject. Whichever was chosen, the alternate display was one click away. For the database display, given the relatively small number of databases we would be federating, we chose to list all of them alphabetically rather than use the default setting, which grouped them by provider.
USER-FRIENDLY ENTRY POINT
Since a typical freshman would not find a list of several dozen databases (most of them probably unfamiliar) to be a user-friendly entry point, we opted for a subject arrangement as the default. This was a prime example of 360 Search’s customizability, as it was entirely up to us what subject arrangement and terminology to use. Several implementations at other libraries had quite elaborate subject approaches with multiple levels: broad subject categories broken down into second- or even third-level subcategories. Although these generally allowed users to search entire categories and/or selected narrower tiers, we found them a bit too complicated for novice library users.
Considerable discussion and reflection yielded a scheme that seems to have worked quite well: 11 subject categories, six of which correspond to UW’s colleges (agriculture, business, education, engineering and physical sciences, health sciences, and law). Since our largest college, arts and sciences, encompasses so many disciplines, we broke it down to correspond with the college’s major divisions: biological and life sciences, humanities and fine arts, mathematics and statistics, and social and behavioral sciences. Although the physical sciences are part of arts and sciences, they seemed to fit more logically with engineering. Finally, we added a “general” category for multidisciplinary and general resources.
To implement the public interface, all we had to do after creating the subject list was to go back to the database selection spreadsheet and enter one or more subject categories under which we wanted each resource to appear. All subject entries have check boxes for users to select the subject(s) they wish to include. If they want to search all 11, they simply check a “select all” box, which then checks all the boxes; we placed that box at the top of the list. In addition, each subject is hyperlinked, leading to a list of databases included in that category, each with a check box for inclusion in the search.
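The subject scheme amounts to a simple many-to-many mapping between resources and categories. The Python sketch below shows how the checked subject boxes could be turned into the set of databases to search. The category names come from our scheme, but the resource assignments (including the made-up “Hypothetical Education Index”) are illustrative only, not our actual spreadsheet.

```python
SUBJECTS = [
    "general", "agriculture", "biological and life sciences", "business",
    "education", "engineering and physical sciences", "health sciences",
    "humanities and fine arts", "law", "mathematics and statistics",
    "social and behavioral sciences",
]

# Illustrative subject tags, as they might be entered in the database
# selection spreadsheet (each resource gets one or more categories).
RESOURCE_SUBJECTS = {
    "Academic Search Premier": {"general"},
    "AGRICOLA": {"agriculture"},
    "Hypothetical Education Index": {"education", "social and behavioral sciences"},
}


def databases_for(selected, select_all=False):
    """Return the databases tagged with any of the selected subjects."""
    chosen = set(SUBJECTS) if select_all else set(selected)
    unknown = chosen - set(SUBJECTS)
    if unknown:
        raise ValueError(f"unknown subject(s): {sorted(unknown)}")
    return sorted(db for db, tags in RESOURCE_SUBJECTS.items() if tags & chosen)


if __name__ == "__main__":
    print(databases_for(["agriculture"]))      # ['AGRICOLA']
    print(databases_for([], select_all=True))  # every tagged database
```

Because a resource may carry several tags, checking two overlapping subjects never searches the same database twice.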
Customizing results displays
After the user conducts a search, the “dynamic resources” display screen appears as the results are being returned, showing the status of each resource included in the search. The default sort order of the final results lists is by date in descending order. The user can choose to sort by author, title, or source (providers and databases). For this last option, we wanted to display the results divided into books and articles, but we learned that is not possible. The results are always returned in order by database provider when the results are sorted by source.
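The sort options can be thought of as different keys applied to one merged result list. In this sketch the `Record` fields are our own assumption about what each citation carries. Sorting by source simply orders records by provider and then database, consistent with what we observed: the display cannot regroup results by material type, such as books versus articles.

```python
from dataclasses import dataclass


@dataclass
class Record:
    title: str
    author: str
    year: int
    provider: str
    database: str


# Each option maps to (sort key, descending?). Date, newest first, is the default.
SORT_OPTIONS = {
    "date": (lambda r: r.year, True),
    "author": (lambda r: r.author.lower(), False),
    "title": (lambda r: r.title.lower(), False),
    "source": (lambda r: (r.provider.lower(), r.database.lower()), False),
}


def sort_results(records, option="date"):
    """Return a new list of records sorted by the chosen option."""
    key, descending = SORT_OPTIONS[option]
    return sorted(records, key=key, reverse=descending)
```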
The results page citations display our “Find It @ UW” article link button. This option takes the student to our article link resolver page and—along with linking to full text when available—offers ways to locate the item through our Request It service, which combines interlibrary loan, desktop delivery, and branch retrieval services. For the article link resolver button to work correctly within 360 Search, complete metadata must be sent by the database provider for the citation. If a piece is missing, the link to the resolver will not display. In that case, the user must click on the article title, which is hyperlinked, to open a new window displaying the corresponding full record in the target database. While not difficult to do, it is a bit of an annoyance.
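The link resolver button depends on an OpenURL built from the citation’s metadata, which is why one missing element can suppress it. The sketch below builds an OpenURL 1.0 (Z39.88-2004) key/encoded-value query for a journal article. The resolver address is a placeholder, and the list of required fields is our assumption for illustration; we do not know exactly which elements 360 Search checks.

```python
from urllib.parse import urlencode

# Placeholder address; a real library would use its own resolver base URL.
RESOLVER_BASE = "https://resolver.example.edu/openurl"

# Assumed minimum metadata for showing the button (illustrative only).
REQUIRED_FIELDS = ("atitle", "jtitle", "date", "volume", "spage")


def find_it_link(citation):
    """Return an OpenURL for the citation, or None if metadata is missing."""
    if any(not citation.get(field) for field in REQUIRED_FIELDS):
        return None  # no button: the user must open the full record instead
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
    }
    # OpenURL KEV keys describing the article use the "rft." prefix.
    params.update({f"rft.{key}": value
                   for key, value in citation.items() if value})
    return f"{RESOLVER_BASE}?{urlencode(params)}"


if __name__ == "__main__":
    citation = {"atitle": "Polar bears and sea ice",
                "jtitle": "Hypothetical Journal of Polar Research",
                "date": "2007", "volume": "60", "spage": "1"}
    print(find_it_link(citation))
    print(find_it_link({**citation, "spage": ""}))  # None
```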
We discussed how to word the results header sentence, as the options are somewhat confusing: regardless of how many records are retrieved from any given database, only the first 50 are initially displayed. We chose to display the number of results returned as well as the total hits, for example: “Results 1–25 of 152 returned for ‘title contains polar and bears’ (1,093 total with 4 duplicates).” We hope that students will realize that only a subset of the total results was returned and that they can modify their search or continue through the list of all results. We also chose to hide the database summary, reasoning that more sophisticated students could see it by clicking the “display summary” button and that the amount of information on the screen would then remain manageable for less sophisticated students. The tradeoff is that users do not see which databases may be best for their subject.
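The three numbers in the header can be reproduced with a small merge step. This is a sketch under several assumptions of ours: “total” is the sum of the hit counts each database reports, “returned” is the number of unique records actually fetched (treating the 50-record limit as a per-database cap), and duplicates are detected with a crude normalized-title-plus-year rule. The vendor’s real deduplication logic is not documented here.

```python
import re

PER_DATABASE_CAP = 50  # assumed: records fetched from each database at first
PAGE_SIZE = 25


def dedupe_key(record):
    """Crude duplicate test: normalized title plus year (an assumption)."""
    title = re.sub(r"[^a-z0-9 ]", "", record["title"].lower())
    return " ".join(title.split()), record.get("year")


def merge(source_results):
    """source_results maps database -> (total hits reported, fetched records)."""
    seen, merged, duplicates, total_hits = set(), [], 0, 0
    for hits, records in source_results.values():
        total_hits += hits
        for record in records[:PER_DATABASE_CAP]:
            key = dedupe_key(record)
            if key in seen:
                duplicates += 1
            else:
                seen.add(key)
                merged.append(record)
    return merged, total_hits, duplicates


def results_header(merged, total_hits, duplicates, query, page=1):
    """Format a header in the style of the example above."""
    first = (page - 1) * PAGE_SIZE + 1
    last = min(page * PAGE_SIZE, len(merged))
    return (f"Results {first}–{last} of {len(merged)} returned for '{query}' "
            f"({total_hits:,} total with {duplicates} duplicates)")
```

For instance, if one database reported 900 hits and supplied 30 records while another reported 193 hits and supplied 60 records, four of which repeated the first database’s titles, the header would read “Results 1–25 of 76 returned … (1,093 total with 4 duplicates)”.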
We were delighted to turn on the Vivisimo results clustering tool, a notable enhancement of the 360 Search package. Because undergraduate students often do broad searches, the clustered results enable students to narrow the results quickly. We chose to show just the clustered results on the left of the screen, as we thought it was the most intuitive option. Students interested in the result sources are likely to find the source button. We particularly like the “show in clusters” feature, which displays the cluster containing the particular article so students can see which clusters are worth exploring for similar items. In addition to the default “topic” cluster display, the user can also cluster results sets by date, author, or journal titles.
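Vivisimo’s topic clustering is proprietary and far more sophisticated than anything shown here, but the simpler date, author, and journal views behave much like grouping results on one metadata field. This sketch groups records by a single field and lists the largest clusters first; it does not attempt topic clustering, and the sample data in the usage lines are made up.

```python
from collections import defaultdict


def cluster_by(records, field):
    """Group records on one metadata field; largest clusters first."""
    clusters = defaultdict(list)
    for record in records:
        clusters[record.get(field) or "Unknown"].append(record)
    # Sort by descending size, then alphabetically by label to break ties.
    return sorted(clusters.items(), key=lambda item: (-len(item[1]), str(item[0])))


if __name__ == "__main__":
    sample = [
        {"title": "a", "journal": "Arctic Review", "year": 2007},
        {"title": "b", "journal": "Polar Notes", "year": 2007},
        {"title": "c", "journal": "Polar Notes", "year": 2006},
        {"title": "d", "year": 2005},  # no journal recorded
    ]
    for label, items in cluster_by(sample, "journal"):
        print(label, len(items))
```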
Students using 360 Search can directly print, email, or import records to bibliographic software such as EndNote or RefWorks.
A particularly impressive design feature of the 360 Search interface, as related to both search and results customization, is that for most option choices, the user can change from the default option selected by the library to any other available option using only one click. This applies to, among other options, basic/advanced search, subject/database list, and show/hide database summaries. The flexibility and resulting simplicity are a real advantage.
Postimplementation experience: technical issues
In the 2 years since we rolled out Find it fast, we have not made any fundamental changes to the customization setup. We did discover, not surprisingly, that occasional monitoring was warranted. Within days after adding Find it fast to our site, a task force member discovered that five new resources had spontaneously been added to our federated list, although we had understood that resources would be added only at our request. Further, a new subject category, “other databases,” had been added, and the new resources had been dumped into it.
Serials Solutions staff members explained that some libraries had asked that all new resources be displayed in the 360 Search database list as they are added to the electronic resources management module; this way, all their databases would be listed in one place. Apparently, the default setting was to have those added resources display in the federated list, so those libraries that do not want new resources to automatically show up must check new resources as they are added to the management module and uncheck the “display in 360 Search” box. Once our e-resources technician was alerted to this issue, he was able to take care of deselecting such items.
Other than this, we haven’t experienced any serious technical problems beyond frequently slow response times, which are a common issue with federated searching and hence not easily remedied. One problem cropped up with the Prospector union catalog, which switched to OR logic when a search retrieved zero results; this sometimes generated huge numbers of irrelevant matches. This problem has now been corrected, but it points up that a federated search system can fall victim to a “least common denominator” syndrome, in which a single problematic target resource affects the whole system.
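The Prospector behavior is easy to reproduce in miniature. In the sketch below, a toy inverted index with hypothetical record IDs answers an AND query and, if nothing matches, silently retries with OR, which is roughly the behavior we saw. This illustrates the effect only; it is not Prospector’s code.

```python
# Toy inverted index: term -> IDs of records containing it (hypothetical data).
INDEX = {
    "polar": {1, 2},
    "bears": {3, 4, 5},
    "ice": {1, 6},
}


def search(terms, operator="AND"):
    """Combine the posting sets for each term with AND or OR."""
    postings = [INDEX.get(term.lower(), set()) for term in terms]
    if not postings:
        return set()
    if operator == "AND":
        return set.intersection(*postings)
    return set.union(*postings)


def search_with_or_fallback(terms):
    """AND first; on zero hits, quietly retry with OR."""
    return search(terms, "AND") or search(terms, "OR")


if __name__ == "__main__":
    print(search(["polar", "bears"]))                   # set(): no record has both
    print(search_with_or_fallback(["polar", "bears"]))  # every record with either term
```

Since the federated layer merges whatever each target sends back, and may have no way of knowing that one set came from relaxed logic, the OR results from a single source end up mixed into the combined list.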
Postimplementation experience: public service issues
Prior to rolling out 360 Search, we mounted a publicity campaign alerting the campus community to this innovation. We sent emails to campus faculty and staff and created a brochure, which we posted in various library and other campus locations. In terms of bibliographic instruction, beginning with the fall 2007 semester, we featured Find it fast in our instruction sessions for multiple sections of the basic freshman English composition course. Previously, we had focused on the library catalog and a general article database, Academic Search Premier. The reference librarians occasionally field questions at the desk from students using the tool, but as far as we can tell, such questions have not been numerous.
We have not done a systematic survey or any other type of data-gathering project to measure usage. We have asked the library’s e-resources staff to look into obtaining usage data for the 360 Search module, but current staffing shortages have so far prevented any progress in determining whether this is feasible. If we can obtain such data, it should prove helpful. We do know that students access library resources through the university’s general online portal as well as through course management systems. We are currently investigating the possibility of including Find it fast boxes within the portal and our CMS shells, but we have not yet implemented this option.
To date, we really have only anecdotal impressions of the overall user experience with this tool. The occasional patron has been confused by the inclusion of local or union catalog records, mostly for monographs, when expecting only full-text articles, even though integrating resources from disparate “silos” is the foundation of federated searching. In one case, a staff member assisted a student who needed sources for a term paper in an agriculture course. The student had selected mostly pamphlets and similar hard-to-access sources from the AGRICOLA database. Since he needed only a few sources on a relatively general topic, the staff member recommended leaving federated search and using a general database instead.
In the future, we may add more databases to 360 Search, reflecting any changes to simultaneous user limits as well as newly acquired or existing resources. So far, however, we have made only one significant change to our suite. In summer 2007, the reference department held a planning retreat. Based on information gathered at a conference, several colleagues reported that some academic libraries with federated searching were including Google, Google Scholar, or both. At first glance, one might think that incorporating a general search engine would dilute the advantage of providing users with edited, published content, an advantage that many of us try to highlight when explaining why using library databases generally yields higher-quality results than searching the open web alone.
The argument for adding Google was that if the link to the federated search tool mentions Google, more students would be apt to try it. In addition to finding websites through Google, a process they may already be familiar with, they would also see results from library-provided databases and, thereby, gain exposure to these databases as well. Moreover, since we had already added our link resolver to Google Scholar, patrons would be linking back to Google content (some of it fee-based if the user does not go the OpenURL route) through link resolving, giving the library added visibility even within Google results.
As a result of the discussion, we added both Google Scholar and Google, reasoning that each search engine would contribute unique content and that any overlap would be addressed by 360 Search’s deduping feature. To telegraph the inclusion of Google in Find it fast, we added “& Google” to the existing “Search Library Databases” text under the search box on the homepage. Unfortunately, Google’s September 2008 decision to prohibit inclusion of Google results in federated searches put an end to that experiment. We still include Google Scholar in the database list but only as a “link-to” resource.
Recommendations and conclusions
• Having a good vendor-customer relationship is key to the success of a federated search project. We consistently received timely, accurate, and helpful information and advice from Serials Solutions whenever we had a question or other issue. Productive relationships with any consortia a library participates in also help the entire process go much more smoothly, as do good working relationships with e-resources and Systems/IT staff within the library or parent institution. (We occasionally had to ask our consortium exactly which products we had through it; the answer sometimes turned out to be multiple variations of a given database or package.)
• Our team agreed that learning the particulars of our own suite of databases was a great benefit of the project. Reviewing exactly which resources we do and do not have and the specifics of those resources, from what content they cover to the user limits, was invaluable and informed our work on the public service side as well.
• Identifying the target audience for the system, and keeping it in mind, should guide the entire process; all decisions benefit from filtering through this lens. For example, in most cases, advanced users and scholars doing literature searches in their own or closely related disciplines will not be best served by the current generation of federated tools. They often need specialized, focused databases, and the content covered by federated tools is usually too broad for them, even with our subject categories. Also, valuable features such as limiters, controlled vocabularies and thesauri, and citation indexing are not available because of the complicated translation protocols that federated searching requires.
• Assigning implementation to a small group with a deadline seemed to work well for us. Although we are all in reference, an implementation team would not necessarily have to be made up of exclusively reference or other public service people. But the team definitely should include these professionals, as they are familiar with user needs, the ways users approach and use databases, etc.
• When deciding on default settings, don’t agonize—they are important, but alternative options should be only a click away. If they are not, you may consider alternative products, because today’s users will not spend much time navigating sites—they either find what they need quickly or move on to something else.
• We need to encourage vendors to provide straightforward, intuitive usage data so that we can measure usage of the tool as a whole and of individual resources within it. Such data would make it much easier to make adjustments that would best serve the user community.
Was implementing federated search at the University of Wyoming Libraries a complete panacea for all search dilemmas? No. But it has allowed us to provide access to multiple resources in a way that our student population appreciates.
[Serials Solutions introduced Summon unified discovery service at ALA Midwinter in January 2009. It claims Summon “goes beyond federated search.” —Ed.]