Are Authors and Publishers Getting Scroogled?
Viewpoint: A Copyright Analysis of the Google Print Library Project
by Keith Kupferschmid
On Oct. 19, five publishers sued Google claiming that the Google Print Library Project violated their exclusive rights provided by U.S. copyright law. The suit—along with the suit filed by The Authors Guild on Sept. 20—is the culmination of months of debate, pitting publishers and authors against Google. The point of contention is whether Google violates copyright law by digitizing millions of books without the permission of the books’ authors and publishers and putting them on its servers to allow them to be searched online.
Over the years, other digitization projects have met with success. For example, the recently announced Open Content Alliance (OCA) is a global collaborative effort of cultural, technology, nonprofit, and governmental organizations that are working to build a permanent archive of multilingual digitized text and multimedia content. Content in the OCA archive will be accessible soon through major Web sites such as Yahoo! and through other search engines.
The OCA will encourage the greatest possible access to and reuse of collections in the archive, while respecting the content owners and contributors. Similarly, Microsoft recently struck a deal with The British Library (BL) to scan 100,000 books from the BL’s collection and make them available sometime next year. Unlike Google, however, Microsoft plans to scan copyrighted books only if it first receives permission from the book publishers. Other projects include Project Gutenberg (http://www.gutenberg.org), the U.S. Library of Congress Digital Preservation Program (http://www.digitalpreservation.gov), and Carnegie Mellon’s Million Book Project (http://www.library.cmu.edu). These efforts all have one thing in common: In each case, the aggregators responsible for digitizing, selecting, organizing, and compiling the content took steps to reach agreement with the copyright owners. Without such agreement, these projects would not have succeeded.
Background on the Google Print Library Project
Google manages two projects intended to make the text of books searchable online. One of the projects is referred to as the Google Print Publisher Program, which is a collaborative effort that enables Google to digitize and make books available for search when Google has received permission from the books’ publisher or author. This program—because it operates with the consent of copyright owners whose books are copied by Google—is noncontroversial.
The other project, referred to as the Google Print Library Project, is the focus of lawsuits initiated by The Authors Guild and several publishers. In the Google Print Library Project, which was not disclosed to authors and publishers until earlier this year, Google is working with the libraries of the University of Michigan, Harvard University, Stanford University, and Oxford University as well as the New York Public Library to digitally scan the books in their collections and make the text of the books searchable online. All of this is done without the copyright owners’ permission and is in stark contrast to the approach taken by Google in its Print Publisher Program.
Google has not disclosed much information about the internal operations of the Print Library Project. It appears that Google employees will digitally scan the collection of books and then index them using keywords so that they can be searched. When users search for these words, they will be provided with search results that show the title of the book, the number of times the keyword appears in the book, and as many as three “snippets” displaying text from the book that includes those keywords. It is not clear how much text the snippets will display.
But many other questions about the program remain unanswered. For instance, it is not clear how many copies of the books Google will be making and retaining for itself and whether its long-term plans involve uses of the books in addition to those that have been publicly disclosed so far. Google may be making and retaining as many as three or more copies of the book for itself (the scanned copy, the digital copy, and a backup copy). Of course, the libraries will also have a copy. Based on our experiences with other information aggregators, Google is likely to make additional copies while maintaining and operating its database.
And how does Google plan to protect against people abusing its search tool, which could destroy the value of the books? For instance, an individual or a computer program could bombard the Print Library search tool with enough keyword requests to download the heart of a book or substantial portions of it. This so-called gaming of the system occurs frequently with publicly accessible online information. One case was brought against the Internet Archive in which an organization made more than 700 attempts to access Web pages on the Internet Archive Web site. Ninety-two of the attempts were successful at obtaining content.
Similarly, it is not clear what security precautions Google is taking to ensure that its Print Library Project search tool is not hacked in a way that allows the digitized books to be freely downloadable. This past summer, Google shut down its video search tool after it was hacked into and entire movies, such as The Matrix, were downloaded.
Once the project became public, numerous groups and publishers cried foul. The first to protest publicly was the Association of American University Presses (AAUP), which issued a public letter to Google containing a list of 16 questions. The questions included how long a “snippet” of text the search engine will return in the results, how many digital copies Google will make and store, and how Google plans to use the copies in the future. AAUP’s letter was followed by a similar letter from the Association of American Publishers (AAP) and a position statement by the Association of Learned and Professional Society Publishers (ALPSP), both demanding that Google terminate the project.
In early August, Google announced that it would be suspending the Google Print Library Project until Nov. 1 due to this criticism. Google requested that publishers provide lists of copyrighted books they do not want included in the Print Library Project. Not surprisingly, the book publishers were pleased about the moratorium, but they weren’t happy about Google’s attempt to shift the burden of identifying what titles Google would not be allowed to copy onto the shoulders of the authors and publishers.
In September, The Authors Guild initiated a class action suit against Google. In the following month, five publishers filed their own suit against Google, charging Google with large-scale, systematic copyright infringement. Then in November, Google’s moratorium on scanning books ended, and Google once again began scanning books. The controversy over the legality of the Google Print Library Project is not going to go away soon, and it represents a significant challenge to the future of copyright in the online world.
Why the Google Print Library Project Violates Copyright Law
Under copyright law, the copyright owner of a book is granted the exclusive right to control whether others make copies of the book, distribute it, or display it. These rights extend equally to portions of the book. Basically, if a person other than the copyright owner wants to copy, distribute, or display a book or excerpts from it, permission must be granted from the copyright owner.
There are several exceptions to this general rule. The best known is the fair use exception. This exception permits a person who wants to copy, distribute, or display excerpts from a book to do so without first obtaining the copyright owner’s permission if that person can prove two things. First, the person must establish that the use is for purposes of criticism, comment, news reporting, teaching, scholarship or research. Second, the person must show that the use qualifies as a “fair use” after considering the following four factors: 1) the purpose and character of the use, including whether such use is of commercial nature or is for nonprofit educational purposes; 2) the nature of the copyrighted work (in other words, whether the book is fiction or nonfiction); 3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and 4) the effect of the use upon the potential market for or value of the copyrighted work.
Defining Fair Use
Google defends its right to manage the Google Print Library Project by asserting that its activities are covered by the fair use exception. Before engaging in any fair use analysis, it should be noted that any such analysis in this case will be extremely difficult because of the sheer volume and variety of books and authors at issue. Fair use claims usually involve a work or a handful of works all owned by one or a few authors. This case, however, involves potentially millions of books owned by millions of different authors and publishers. There does not appear to have been any case in which fair use was applied on such a broad scale, to so many works by so many authors and publishers being copied by one entity. That fact alone may be sufficient cause to deny Google’s fair use claim.
Google claims that, although it is copying entire books, such copies are allowed as a fair use because it is making only small excerpts of the books available online and the copies made are only intermediate copies. It cites Kelly v. Arriba Soft Corp., an anomalous case decided in 2003 by the Ninth Circuit, for the proposition that such intermediate copying is permissible under fair use. In the Kelly case, defendant Arriba Soft Corp. operated a visual search engine that retrieved thumbnail images of photos that were already posted on the Internet. By clicking on a thumbnail image, a user was presented with a page containing a full-size image that was imported directly from plaintiff Kelly’s Web site. The court concluded that the use of the images constituted a fair use because, among other things, Kelly’s use of the images was an artistic one, while Arriba’s use was as part of a tool that indexes images on the Web, which was unrelated to any artistic purpose.
As the debate ensues, the Software & Information Industry Association (SIIA) is actively monitoring the implementation of the Google Print Library Project and the responses of publishers and authors. SIIA represents a wide variety of authors, publishers, users, and, most significantly, aggregators of content that engage in digitization activities not unlike those undertaken by Google. Consequently, SIIA and its members are extremely interested in the copyright and technological issues relating to the Google Print Library Project.
SIIA is fully supportive of the goals of the Google Print Library Project—putting more content and information into the hands of the public. We commend Google for conceiving this project.
What we do not commend, however, is how Google has chosen to implement the project. Unlike the digitization programs noted in the “Scroogled” article, Google has chosen to implement its Print Library Project in the face of strong opposition by authors and publishers. It is also acting in direct contravention of well-established principles of copyright law. In short, Google’s blatant disregard for copyright owners and copyright law makes the Google Print Library Project a large-scale commercial infringement of copyright, the likes of which have not been seen since Napster.
The main article provides a brief history of the Google Print Library Project, followed by a discussion of why it runs afoul of the law. It is our hope that Google will re-evaluate its stance on the project so that it may continue operating in a manner that is consistent with copyright law and, therefore, assures the maximum participation of authors and publishers.
SIIA represents many large and small information aggregators. These companies make business, financial, health, educational, technological, and other informational materials available to all types of users. They frequently negotiate agreements with owners of all types of publications—including books that are likely to be contained in the collections of the libraries participating in the Google Print Library Project—to digitize and/or make these publications available for searching. If Google and others are allowed to dispense with negotiating with copyright owners to digitize their works, Google will destroy the market for these works.
SIIA has offered and continues to offer to work with Google to rework the project so that it operates within the confines of the law—not above it. —K.K.
“Yesterday’s Google Print announcement combines an unusually careful step by the company, opening access to 10,000 but selecting only items no longer under copyright. This is a surprising bit of playing by the rules for a company that routinely makes up the rules! It also follows by one day the not-so-careful resumption of scanning both in and out of copyright books from member libraries. With legal challenges pending, this is a bit of an olive leaf. Google had been scanning in and out of copyright books before their self-imposed moratorium earlier this year, so they could have opened access through Google Print to both types. Is this goodwill, or part of a pre-trial strategy to demonstrate good behavior in the eyes of the court to support the resumption of its original objective?
“We’ll know when the next tranche of 10,000 or so is released down the road now that In and Out scanning has resumed. One sure bet, this is NOT a signal that in the long run they’re willing to make detours and leave pockets of info behind in their move toward making the world’s information universally available. It is pure positioning.”
—Chuck Richard, vice president and lead analyst, Outsell, Inc.
Here is a list of factors for fair use.
1. Purpose and Character of the Use. The first fair use factor requires an analysis of “the purpose and character of the use, including whether such use is of commercial nature or is for nonprofit educational purposes.” If the use is educational in nature, it is more likely to be a fair use; if it is more commercial in nature, it is less likely to be.
While Google will argue that its motives are wholly altruistic and educational, that simply is not the case. Google is a commercial for-profit enterprise. It initiated this project because it believes it will increase traffic to its Web site, which will eventually increase Google’s advertising revenue. Some 98 percent of Google’s revenue is generated through advertising. The nature of Google’s use here is commercial, and the first factor should fall in favor of the copyright owners.
In Kelly, the court concluded that the first fair use factor favored Arriba because Arriba’s use was neither to “directly promote its [Web site] nor … to profit by selling Kelly’s images. Instead Kelly’s images were among thousands of images in Arriba’s search engine database.”
Unlike in Kelly, Google’s use of the books is directly for the purpose of promoting its own site. If that were not the case, Google would have made the books entirely searchable on the libraries’ sites (potentially raising different issues) and would not have to retain copies of the digitized books themselves. Instead, Google is requiring that searches take place on its site, which ultimately results in more advertising revenue for Google. The other noticeable difference from the court’s holding in Kelly is that, in this instance, the court will not be considering use of just one copyright owner’s works “among thousands” of others, but considering all owners of all books digitized by Google.
While Google will likely claim that the purpose of its Print Library Project search tool is educational in nature (because it helps people locate books on topics of interest), that argument could be made in most copyright infringement cases. Certainly, Napster and Grokster (both of which recently lost well-publicized copyright infringement cases) could have argued that they were merely making content more available to the masses.
Google may also attempt to convince the courts that its copying is “transformative,” another consideration under the first fair use factor. The courts consider a transformative use to be a use that is for a different purpose or of a different character than the use of the copyrighted work, and the use does not supersede the need for the copyrighted work. Courts have uniformly held that merely transferring a work from one medium or format to another is not enough to qualify as a transformative use.
In Kelly, the use was found to be transformative because the thumbnail images made by Arriba Soft were for “improving access to images on the [I]nternet” and not for the artistic purposes of the original. Unlike in Kelly, however, Google is not improving access to material already on the Internet; it is creating access to material that is not on the Internet. By creating access and not simply improving it, Google has merely transferred the books from one medium (print) to another (online) and far exceeded what was considered a transformative use even in Kelly.
2. The Nature of the Work. The second factor—“the nature of the copyrighted work”—likely favors the authors and publishers. This factor looks at whether the books are factual or more creative in nature. The more creative and expressive the book, the less likely the book can be subject to fair use. Application of this factor is not entirely clear since Google will be copying both fiction and nonfiction books. However, the sheer volume of fictional books designated for copying likely leads to the result that this factor will favor the authors and publishers.
3. The Amount of the Works Used. The third factor—“the amount and substantiality of the portion used in relation to the copyrighted work as a whole”—favors the copyright owner more than any other factor. Google is copying entire books—lots of them—for the project. The court in Kelly acknowledged that “copying an entire work militates against a finding of fair use” but ultimately found that this factor did not favor either party because “if the secondary user only copies as much as is necessary for his or her intended use, then the factor will not weigh against him or her.” Unlike in Kelly, Google is making more copies than necessary. All that is necessary is to make and provide the library with a copy, but Google has kept a copy (and likely numerous copies) for itself.
Google also contends that since users will see only small snippets of the book text and not the complete text of the digitized books, these complete text copies are “intermediate copies,” which are allowed under fair use. This argument ignores several facts. First, in cases where the courts have allowed the making of so-called intermediate copies, the copies were deleted immediately after they were used. For example, in Kelly, after Arriba Soft created the thumbnail images from the full-resolution images found online, it immediately deleted any copies of the full-resolution images. Here, Google is retaining permanent copies of the digitized books, so, in fact, they are not intermediate copies at all.
The basic premise of copyright protection is that publishers and authors have the right to control the copying, distribution, and display of their books. The display of the snippet through the Google search engine implicates the reproduction, distribution, and display rights because the snippet is a reproduction of a small portion of the text that is being displayed on users’ screens and also is being distributed to them via such displays. Because Google copies the entire book as a precursor to displaying a snippet, Google’s copying of the book gives rise to an additional claim of infringement of the reproduction right. This claim applies regardless of whether a snippet is ever displayed. Google would have us focus on the display of the “snippet.” The display is important, but, fundamentally, it’s the copying of entire books without the explicit permission of authors or publishers that is the first step in the analysis (one which Google is sidestepping).
If, as Google insists, the court may consider only whether a snippet is infringing and not whether the full-text copy of the book is infringing—because the full-text copy from which the snippet is created is what Google terms a “non-infringing intermediate copy”—then the reproduction right will be effectively eviscerated. Under this reasoning, an infringement of the reproduction right could only be possible when there is a corresponding distribution or display. In effect, Google’s argument here represents a radical new interpretation of U.S. and international copyright law that undermines the basic premise of copyright law.
4. The Effect on the Market for the Work. The fourth factor, which is often the most influential on a court’s decision, asks whether the use will adversely affect the actual or potential market for the books. This factor looks not only at the user’s conduct but, more significantly, at the effect on the market if the use should become widespread. The analysis in Kelly is wholly inapplicable to this case, because in Kelly, there was no actual or potential market for the thumbnail images that competed with Kelly’s images. In Google’s case, however, there are both actual and potential markets for digitizing these books.
The market for licensing such works to aggregators is on the upswing. As Google has no doubt recognized, the marketplace for information is growing exponentially as users desire access to information faster and easier than ever before. Competition in this marketplace is significant, as aggregators hustle to reach agreement with content providers to put their works online.
Google is trying to become a leader in the information industry by changing the rules, rather than playing by them. While other aggregators generally take great care to first reach agreement with copyright owners to make their content searchable online and then compensate them accordingly, Google is doing neither. If Google succeeds on its fair use claim, it will no longer be necessary for aggregators and others that make nondigital content searchable online to get permission from the owners of that content or to compensate those owners. If upheld, Google’s claim will have succeeded in destroying the burgeoning market for information content.
Not only will Google’s actions destroy the existing and potential marketplace for information content, they will also succeed in destroying Google’s own market. As we know, Google has a counterpart project to its Google Print Library Project, called the Google Print Publisher Program (PPP). Under the PPP, Google reaches agreement with publishers to digitize books and make them searchable and accessible through Google’s search engine. If Google’s fair use claim is allowed, the PPP will become obsolete. Why would Google spend the time, money, and resources to get permission from copyright owners to digitize their content and make it searchable and accessible online if it is allowed to do so legally without making that effort?
From a business standpoint, it would be impossible to justify continuation of the PPP under these conditions. The result would be to destroy the PPP and the value it provides to copyright owners in controlling how their works are made accessible through that program. Even if Google were to continue to operate the PPP, it would have little value to book publishers and authors because any negotiating leverage that they would have with Google over how to make their books available would evaporate. If they cannot agree to terms, Google will simply make their books available through the Google Print Library Project.
Holding fair use in favor of Google would turn copyright on its head. It would allow not only Google but countless other less reputable entities to engage in wholesale copying of copyrighted works for the purposes of making those works—or portions of them—accessible online. In essence, the rights of writers and publishers would likely cease to exist in the online world.
Google’s other claimed legal justification for the Print Library Project is that it has permission—more accurately, implied permission (i.e., an implied license)—to copy any content posted on the Internet for the purposes of allowing people to conduct searches using the Google search engine. Google claims this implied permission emanates from the fact that a Web site operator would not have posted content unless he or she wanted it to be found by users. Google will not copy content located behind a firewall or content located on a Web page that includes an exclusion header telling Google not to copy the Web site. Google is of the opinion that any implied authorization that might exist for Web sites extends to its Google Print Library Project.
If Google’s implied license theory holds up in court, the ramifications for publishers and authors, as well as others who create copyrighted works, could be devastating. Google and others could copy any copyrighted content, whether print or digital. Books might be first, but Google (or others) might eventually migrate to copying personal letters and e-mails, print newspapers and magazines, or photographs and video that the authors never intended to be copied or searchable online.
Thankfully, Google’s implied license argument seems certain to fail. Even if you assume that Google’s implied license theory is correct as applied to Web sites, it does not apply to the books being copied as part of the Google Print Library Project, because (unlike Web site content) the collection of books being copied are not at present generally available at no cost on the Internet. For Google to make these books available, it first has to scan and copy the books and save these digital copies on its servers.
There is no justification for an implied license by the copyright owners of these books that would allow Google to digitize them, save them to its servers, and then make them available for searching.
The Next Step
There appears to be no legal basis justifying Google’s massive copying of books to populate its Print Library Project. Nevertheless, digital searching of content—if done correctly—could be of great value to authors, publishers, libraries, users, and Google.
A ruling in favor of Google that allows it to continue to operate the Print Library Project would be a devastating blow to authors and publishers and creators of all kinds and would undermine the purpose and goals of U.S. and international copyright law. As a result, no doubt the interested parties will be watching very closely as the cases filed by The Authors Guild and the publishers proceed toward rulings by the courts.
Keith Kupferschmid is vice president for intellectual property policy and enforcement for the Software & Information Industry Association (SIIA) and does not represent any of the parties involved in The Authors Guild litigation filed in September. The views expressed in this article do not necessarily represent the views of the parties involved in The Authors Guild’s or the publishers’ litigation. Send your comments about this article to firstname.lastname@example.org. See the January issue for Google’s side of the issues.