THE SYSTEMS LIBRARIAN
The Advance of Computing From the Ground to the Cloud
Director for Innovative Technologies and Research, Vanderbilt University Libraries
A trend toward the abstraction of computing platforms that has been developing in the broader IT arena over the last few years is just beginning to make inroads into the library technology scene. Cloud computing, already well established in other commercial sectors, offers libraries many interesting possibilities that may help reduce technology costs and increase capacity, reliability, and performance for some types of automation activities.
The path to cloud computing pushes hardware to more abstract levels. Most of us are used to computing power being delivered from systems that we can see and touch. I can feel the keys on this laptop; I just returned from our server room that houses the racks of computers that power many of the library’s applications. Across campus, a large data center accommodates hundreds, if not thousands, of computing devices that comprise the university’s technical infrastructure.
Imagine instead a model in which all the heavy lifting of computing no longer takes place on these tangible on-site systems but in some nebulous data center. While we’ll continue to need physical devices to interact with computer-based services, the infrastructure that provides these services evolves toward more concentrated, large-scale facilities. The concentration of computing resources into centralized data centers has largely run its course within most large organizations, such as universities, local government entities, and corporations. The next stage of evolution concentrates computing across organizations, delivering software applications through models such as software as a service and cloud computing.
A Continuum of Computing Abstraction
In order to better understand cloud computing, it might help to step through the different options in the way that an organization can manage its computer applications. We’ll work our way from the most tangible to the most abstract. We’ll use a library as an example, but the options apply to any kind of organization.
Locally managed. The most basic, and possibly the most inefficient, configuration involves housing a server in the local facility. If your library manages its own ILS and houses it within one of its own buildings, it falls into this category. This model involves purchasing the server hardware and managing the operating system, application, network connections, and security. It gives the most control at the highest effort and the highest cost.
Co-location. Rather than house their servers in the library building, many libraries place them in an external data center. That data center may be associated with the library’s own organization. In our library at Vanderbilt, for example, we have co-location arrangements with the university’s data center to house many of the library’s servers. Public libraries might have similar arrangements with their municipal or county government data center; libraries of all kinds contract with external commercial hosting companies.
In most co-location arrangements, the library owns the server hardware. It essentially rents space and some level of service from the data center. If the data center is part of the library’s parent organization, these services may be absorbed without direct payment. Expect to pay startup and monthly fees for commercial co-location arrangements. The provider of the co-location services takes responsibility for the physical housing of the equipment and the cost of power and cooling. The library may or may not have physical access to the equipment, requiring all management to be performed remotely. Some co-location arrangements also include systems administration of the operating system but probably not the applications that run on the servers.
Virtualization. Today’s advanced computer hardware far outpaces the needs of many software applications. A dedicated server running a typical application load may operate at less than 10% of its processing and memory capacity. Virtualization, a technique that has gained extremely wide acceptance, allows multiple instances of operating systems to share a single physical server. These may be multiple instances of the same operating system or of different ones. On the desktop, the same approach lets end users simultaneously run multiple operating systems such as Microsoft Windows, Linux, and Mac OS X. In the data center, virtualization allows each physical server to operate near its capacity, reducing the number of devices needed overall as well as the devices’ physical footprint, energy consumption, and technical management. Since each instance of a virtual server functions independently, as if it were on dedicated hardware, each can serve different clients and their complement of applications. Unfortunately, not all applications run well in a virtualized environment. The technical programming of some applications may monopolize resources in ways that disrupt virtual environments, so organizations need to test their critical applications in a virtual machine prior to production deployment.
Virtualization can be implemented in locally managed, co-located, or remote-hosting scenarios. It requires careful administration to ensure a reasonable balance of virtual machines per physical device and to monitor the resource use of each instance.
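The consolidation arithmetic that motivates virtualization can be sketched in a few lines of Python. The utilization figures below are illustrative assumptions, not measurements from any real deployment:

```python
# Illustrative sketch: how many physical servers a set of lightly loaded
# applications might need once virtualized. All figures are hypothetical.

def servers_needed(app_loads, target_utilization=0.80):
    """Greedily pack application loads (expressed as fractions of one
    physical server's capacity) onto servers without exceeding the
    target utilization headroom."""
    servers = []  # each entry is the current load on one physical server
    for load in sorted(app_loads, reverse=True):
        for i, used in enumerate(servers):
            if used + load <= target_utilization:
                servers[i] += load
                break
        else:
            servers.append(load)  # no existing server has room; add one
    return len(servers)

# Ten applications, each using roughly 10% of a server's capacity,
# would otherwise occupy ten dedicated machines:
print(servers_needed([0.10] * 10))  # prints 2
```

The same calculation explains the reductions in physical footprint and energy consumption the column describes: fewer, busier machines replace many idle ones.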
Dedicated hosting services. Similar to co-location, a library can opt for hosting services through its data center or a commercial provider. Most hosting services involve leasing equipment from the provider. This saves the library the costs of acquisition in exchange for monthly or annual subscription costs. When starting up a hosting arrangement, the library will detail the specifications of the server required, including processor type, amount of memory, disk storage, and the desired operating system. The provider will then allocate a server that meets these specifications and turn it over to the library to install the software.
This kind of hosting arrangement finds the most use for websites and web-based applications. It could also be used for hosting an ILS or other library software. The key point to keep in mind is that it’s just the server that’s being hosted. The operation of the software used on the hardware remains the library’s responsibility.
Shared hosting service. Another very common arrangement involves leasing a virtual server rather than a dedicated physical one. The costs of a virtual machine can be much lower than those of a dedicated physical server, but the library should verify that its application will function in a virtualized environment. Applications that run near peak server utilization would obviously not be good candidates for virtual hosting.
Web hosting. A very common arrangement involves simply hosting a website. Simple web-hosting arrangements allow the provider to aggregate a number of customer sites onto server hardware. Web hosting allows an organization to avoid server management and internet connectivity issues and to focus on the content of the site. Technically complex websites that involve scripting with PHP or Perl, content management systems, and other plug-ins may require other arrangements beyond simple web-hosting services.
Software as a service. Software as a service, or SaaS, has emerged as a major model for the deployment of business and consumer software. Many library automation vendors favor this approach and market it aggressively. This model delivers access to a software application independently of hardware considerations. The SaaS provider may take advantage of virtualization, server clustering, and other efficiencies in order to deliver an instance of its software in the most efficient way, yet it can deliver the software in a way that functions as if the library operated it locally. In a SaaS arrangement, the user can configure the software as needed but cannot customize it at the level of changing functionality. SaaS usually involves the provider taking responsibility for the implementation of all software updates and the myriad other behind-the-scenes technical details. While SaaS works especially well with entirely web-based applications, it also supports applications that involve desktop clients.
The business model associated with SaaS generally involves a startup fee and then a standard monthly or annual subscription fee. This differs substantially from the traditional software deployment model that involves local purchase, installation, and maintenance of local hardware, upfront software licensing payments, and ongoing software and support fees. SaaS involves higher subscription fees but with offsetting savings in other areas.
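The cost trade-off described above can be made concrete with a simple comparison of cumulative spending under each model. Every dollar figure here is an invented illustration, not a quote from any vendor:

```python
# Hypothetical comparison of cumulative costs: traditional local
# deployment versus SaaS. All dollar amounts are illustrative assumptions.

def traditional_cost(years, hardware=12000, license_fee=25000, annual_support=5000):
    # Up-front hardware purchase and software license, plus ongoing support.
    return hardware + license_fee + annual_support * years

def saas_cost(years, startup=3000, annual_subscription=10000):
    # Modest startup fee, but a higher recurring subscription.
    return startup + annual_subscription * years

for years in (1, 3, 7):
    print(years, traditional_cost(years), saas_cost(years))
```

With these assumed figures, SaaS is far cheaper in the early years, while the traditional model eventually pulls ahead on paper; the column's point is that the subscription also absorbs staffing, space, and maintenance costs that this toy comparison omits.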
Cloud computing. From the perspective of an organization using business applications, SaaS takes hardware concerns out of the picture and offers software abstractly. Cloud computing takes the abstraction even further by making hardware abstract for the developer of a software application.
Cloud computing involves a diffuse, distributed computing infrastructure provided through the internet in an abstract way, where neither the programmers who create the service nor the end users of that service need to be involved with the specific supporting hardware components. This model offers both data storage and computer processing capabilities. Programmers who develop cloud-based applications make calls to services for computational processing or data storage through an application programming interface (API) that addresses resources provided by a cloud provider.
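The programming model might look roughly like the sketch below: application code addresses storage through an API and never sees the underlying hardware. `CloudStore` and its methods are hypothetical stand-ins for illustration, not any real provider's interface, and the in-memory dictionary stands in for remote infrastructure:

```python
# Hypothetical sketch of the cloud programming model. CloudStore is an
# invented stand-in for a provider's storage API; a dict plays the role
# of the remote, distributed infrastructure the application never sees.

class CloudStore:
    def __init__(self):
        self._objects = {}  # in a real cloud, this lives in the provider's data centers

    def put(self, key, data):
        """Store an object under a key, wherever the provider keeps it."""
        self._objects[key] = data

    def get(self, key):
        """Retrieve an object by key, with no knowledge of its location."""
        return self._objects[key]

    def usage_bytes(self):
        # Providers meter stored data like this for consumption-based billing.
        return sum(len(v) for v in self._objects.values())

store = CloudStore()
store.put("catalog/record-001", b"MARC data for one bibliographic record")
print(store.usage_bytes())  # the metered storage the provider would bill for
```

The essential point is that the application's calls would look the same whether the objects sat on one disk or were spread across thousands of machines.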
Cloud computing has become especially popular in the ecommerce arena. An organization can create a fully customized business application based on services delivered in a cloud.
The business model for cloud computing pegs costs of consumption at a very granular level. The creator of an application would pay fees to its cloud provider based on the accumulated number of transactions executed by the users of the system, cumulative data stored, and bandwidth consumed. Cloud computing offers organizations that create software the same cost trade-offs as SaaS does for end users. It trades off ongoing subscription fees for investments in hardware components. One can think of cloud computing as infrastructure as a service.
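A bill under this granular, consumption-based model might be computed along these lines. The rates are invented for the example; real providers publish their own:

```python
# Illustrative consumption-based bill under the cloud model described
# above. All rates are hypothetical assumptions, not a provider's prices.

def monthly_bill(transactions, gb_stored, gb_transferred,
                 per_million_tx=0.50, per_gb_month=0.15, per_gb_out=0.10):
    """Sum the three metered dimensions: transactions executed,
    cumulative data stored, and bandwidth consumed."""
    return (transactions / 1_000_000 * per_million_tx
            + gb_stored * per_gb_month
            + gb_transferred * per_gb_out)

# A small application: 2 million requests, 40 GB stored, 15 GB transferred.
print(round(monthly_bill(2_000_000, 40, 15), 2))  # prints 8.5
```

Costs scale smoothly with actual use, which is precisely the public-utility character the column goes on to describe.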
Cloud computing, an approach gaining wider adoption, comes in many flavors. In the most abstract form, a public cloud extends its services to a diverse range of application developers. While the resulting products remain functionally distinct, intermingling of data across organizations occurs within the cloud infrastructure. A private cloud follows the same technical approach, though it’s segregated to avoid sharing infrastructure between organizations. An organization might create its own cloud infrastructure to support a diverse set of internal business applications rather than rely on an external cloud provider.
When you plug your phone charger into the wall, you expect electricity to flow. You don’t need to know the details about how it was produced. Was it from a coal-burning plant, a nuclear facility, or a hydroelectric dam? You might care abstractly that it comes from some renewable source such as wind or solar power. But as far as charging your phone goes, all that matters practically is drawing electricity at the proper voltage. You expect each kilowatt-hour you consume to raise your monthly bill by only a few pennies. Cloud computing strives toward this public utility model, including consumption-based pricing of commodity services.
One of the best-known providers of cloud services is Amazon.com with its Elastic Compute Cloud (EC2) service for computational transactions and Simple Storage Service (S3) for cloud-based data storage. Google offers its App Engine as a platform for cloud-based services. Dozens of other companies offer public, private, or hybrid cloud services, each with different characteristics and benefits. Once an organization is ready to deploy applications in the cloud, it will be able to select from a multitude of providers.
A Greener Approach
In libraries we don’t tend to think much about the energy costs associated with our servers. Most libraries operate few enough servers that energy does not amount to a large portion of overall expenses. But for those organizations that do operate large data centers, power is a huge concern: it takes a lot of energy to run the equipment and almost as much to keep it cool. Avoiding those power costs is no small consideration and represents one more incentive to push as much of the computing infrastructure into the cloud as possible. SaaS and cloud computing can reduce a library’s energy consumption, and the impact can be more than simply shifting consumption from one organization to another: to the extent that these models deliver computing services more efficiently, they can reduce energy consumption overall.
Library Computing in the Cloud
While the various models of platform hosting and software as a service have become firmly established in the library automation arena, we’re in the early stages of the adoption of cloud computing. We can point to a couple of high-profile examples.
OCLC probably ranks as the most prominent example of cloud computing in the library arena. The WorldCat platform runs on a globally distributed infrastructure that represents the largest-scale library-specific implementation of this approach. Its recent plan to offer library automation functions such as circulation, licensing, and acquisitions to complement existing cataloging, resource sharing, and end-user search capabilities will be a major test of its ability to deliver core business services to libraries through a cloud-computing model. Given our continuum of abstraction, we might think of OCLC’s new WorldCat Local library system in terms of a private cloud implementation.
In the institutional repository arena, DuraCloud has recently been launched as an interesting example of cloud computing. This service will be hosted by the new DuraSpace organization created by the merging of the DSpace Foundation and the Fedora Commons, the two dominant open-source institutional repository projects. (See www.digitalpreservation.gov/partners/duracloud/duracloud.html for information.)
The higher-profile projects in which cloud computing is most apparent include the infrastructure surrounding OCLC’s WorldCat and the DuraCloud repository platform. As cloud infrastructure becomes a more standard approach for building applications, we can expect many other projects, from commercial organizations and individual libraries alike, to follow.
Especially in times when libraries must work with constrained financial resources, it becomes more important to take advantage of the most efficient technology models available. While the most abstract approach of cloud computing may not fit all library technology scenarios, some level of hardware consolidation or abstraction will help gain efficiencies and performance while reducing costs. The days of each library operating its own local servers have largely passed; this approach rarely represents the best use of library space and personnel. As libraries develop the next phase of their technology strategies, it’s important to think beyond locally maintained computing infrastructure, which increasingly represents an outdated and inefficient model. Co-location, remote hosting, virtualization, SaaS, and cloud computing each offer opportunities for libraries to expend fewer resources on maintaining infrastructure and to focus more on activities with direct benefit to library services.