Magazines > Searcher > January 2009

Vol. 17 No. 1 — January 2009
U-Content: Project Gutenberg, Me, and You
by Nicholas Tomaiuolo
Instruction Librarian, Central Connecticut State University

Perhaps you’ve heard this one.

What’s the difference between the gang answering questions over at Wondir and the stranger sitting on the bar stool next to you?

Answer: The stranger may buy you a drink.


Here’s another.

What do the members of the Village People have in common with Wikipedia team member Essjay (faux college professor/actual high school graduate)?

Answer: They all like to pretend they are someone they aren’t.

In 1993 The New Yorker, applying its sharp wit and insight (as it did when it exposed Essjay [1]), ran the now-famous cartoon caption, “On the Internet, nobody knows you’re a dog.” The veracity of this statement is patently evident, and, while the implications of the who-creates-what’s-out-there-on-the-net question rankle a great many people, information professionals rightfully find it not merely an irksome reality but one bordering on the menacing.

Individuals interested in the controversy over the validity and reliability of user-generated content (UGC) fall into two groups. Proponents have read the comparison of Britannica and Wikipedia that appeared in Nature [2], which, as I’m sure you know, concluded that the user-generated content in Wikipedia is about as accurate as the gold standard resource Britannica. Incidentally, these people aren’t just computer geeks (though they do seem extraordinarily facile with their machines). They include the youthful and wizened, but all are committed to the concept that collective, shared experience creates an optimal information consortium. On the other side of this matter stand the Old Guard (where’s my smiley face shortcut key?), whose outrage over deceptions such as the one Essjay attempted to perpetrate through The New Yorker is only the tip of the titanium foundation buttressing their conviction that the proliferation of user-generated content signals an irrevocable departure from the authoritative.

It doesn’t matter in which camp you’ve pitched your tent, because you’ve undoubtedly realized by now that user-generated content isn’t going to go away. Although I once held fast to the belief that the bloggers, the self-published authors, the noncredentialed experts, and the dilettante videographers populating the pages of WordPress, Wikipedia, Yedda, and YouTube would, sooner or later, become bored and lose interest or get day jobs, it has become apparent that no amount of rhetoric or evidence will persuade the masses that they shouldn’t use the aforementioned sites or their brethren. Despite the writings of Andrew Keen, Nicholas Carr, and Lee Siegel [3] arguing that the seduction of self-publication leads to cultural and intellectual ruin, according to Alexa [4], four of the 10 most popular sites on the web contain exclusively user-generated content (i.e., Facebook, YouTube, MySpace, and Wikipedia). TIME magazine writes that five of the 10 websites “We Can’t Live Without” are “all about you” (i.e., Flickr, Wikipedia, Facebook, Digg, Yelp, and craigslist) [5].

The issue is no longer whether we can make a convincing case proving that most UGC is bad, but rather what can we contribute to make this content better. Once you’ve decided that, even in the absence of tangible payback (such as publication in a traditional printed magazine), there is a compelling rationale for populating the web with credible information, you’re ready to join a newly created third “camp.” Be forewarned, however, that although most vocations and avocations require our time, energy, attention, and dedication, becoming a contributing citizen of the UGC society demands activity and diligence beyond what many consider “normal.” Individuals in the thick of Web 2.0 are prepared to continually monitor their favorite sites, always have their RSS readers open, watch tag clouds in Delicious, and stay constantly at the ready to compose, edit, and post.

UGC: Where Do You Start?

Upon joining this new group, you must first determine where you’d be most valuable in the UGC environment. If you have some sterling presentations you’d like to publish on the web, investigate uploading them to Slideshare []. If you want to share your subject expertise, explore Google’s new project called Knol []. Actually, that option might even make you some money. Do you want to get in on a groundbreaking new health sciences site? Participate in the Personal Genome Project [].

For most librarians, an interest in books led to our decision to enter the field; we enjoy reading. When thinking about online books, the first site that comes to mind is probably Project Gutenberg (PG). Founded in 1971, it’s not just another resource involving books and built on user-generated content; it is the oldest resource on the internet engaged in accepting and distributing UGC. PG founder Michael Hart put his finger directly on the major UGC issue when he said, “Many people are concerned that the stuff they find on the ’net isn’t very good. So let’s make sure there’s good stuff by creating it ourselves.”

Project Gutenberg’s Version of the Steps Required for Contributing a Text

On its FAQ page, Project Gutenberg boils the process of adding an item down to four steps:

• Borrow or buy an eligible book.

• Send us a copy of the front and back of the title page, and wait for an OK.

• Turn the book into electronic text.

• Send it to us.

As my article shows, climbing these four steps may involve breaking a sweat.

Contributing my time, energy, and two books to PG was not my first excursion into UGC, but it is the first time I have allied myself with a high-profile international project. Adding content to PG requires patience, good social skills (for interacting with your proofreader), and the ability to intuit what needs to be done to get your contribution online. Here’s a journal of my recent experience. (See the sidebar Project Gutenberg’s Version of the Steps on the right for the concise step-by-step directions for getting material into Project Gutenberg.)

Do You Have What It Takes?

Day 1

Background: The original Project Gutenberg [] is a repository of more than 28,000 works. Although PG occasionally accepts copyrighted works, it mainly deals with public domain materials. For books published in the U.S., any title copyrighted before 1923 is a candidate — but if you decide to become a contributor, choose a book you think you will enjoy. You’ll be living with it close up and personal for some time!

Step 1

First, you must ask yourself, “What public domain material do I have access to that is not already available in Project Gutenberg?” Forget Shakespeare, Chaucer, Disraeli, Galsworthy, and Swift. Ditto for Zola, Molière, Voltaire, and Hugo. PG doesn’t need your Dostoyevsky, Chekhov, or Gogol, nor does it lack Douglass, Hawthorne, or Conrad. Not surprisingly, I discovered it doesn’t want another copy of Don Quixote or an alternate version of the Divine Comedy.

You can, of course, enter your favorite deceased authors’ names into the basic search, but, if you think there is a good chance that PG has already got your favorite covered, consider accessing PG’s “Online Book Catalog” [] and browsing to see what the project may lack that your library owns. If you recall a favorite reading that no one else seems to have read, you can match your interests with the holdings. I had success using this strategy. My affinity for theatre made me recall an interesting title I’d encountered in my library’s Special Collections Department: The Treason and Death of Benedict Arnold: A Play for a Greek Theatre (1910) by John Jay Chapman. Having executed the requisite search, I found that it was not in PG’s virtual holdings. I’m a guy who doesn’t do well at slot machines and would never attempt to “count cards.” Quickly discovering that I had an item Gutenberg might want was as close to “winner, winner, chicken dinner!” as I will ever get.

Make your first submission to PG something manageable (i.e., something short). The brevity of my selected work helped maintain my enthusiasm. Still, it’s always a good idea to have a backup plan. My “Plan B” work was a government document that I’d found at my library called Memorial Address on the Life and Character of Abraham Lincoln by George Bancroft. I put it aside in case I had time for an additional submission.

Time to identify that my title was not held at Project Gutenberg: 30 minutes.

Step 2

PG wants content, but it has policies. After you have preliminarily identified an item, you must proceed to “copyright clearance” []. Here you determine if the item is, indeed, in the public domain. The process took another half-hour. I needed to create a username and password and then begin the submission of a “new copyright clearance request.”

Be ready with the item in hand. You need to know basic bibliographic information in order to complete the clearance form. PG wants the author, title, and place and date of publication. Contributors, known as “producers” at PG, must also upload a scan of the title page and the verso. The preferred file types for scans are JPG, GIF, or PNG. (I discovered this only after I had saved my scans as PDFs, but the usually persnickety PG associates were willing to overlook this.)

Having sent the bibliographic information and the required scans, producers may check the status of their potential ebook submissions by visiting and logging in with their credentials. In the course of submitting The Treason and Death of Benedict Arnold, I was pleasantly surprised when I checked the link an hour later to find that my initial scans and information had cleared the first hurdle. A later submission of the required information for an older title called Some Passages on the Life and Death of John, Earl of Rochester (1680) took much longer for acceptance: in fact, 4 weeks. This may have been because the title was printed in Great Britain and the U.S. copyright rule of thumb (1923) did not necessarily apply; the experience shows, however, that it may take a while to get rolling.

Call Me ‘Al’

Day 2

With copyright clearance obtained, it was time to get the play into a format that PG could use and repost. Skipping over the fine print, I thought that a high-quality PDF of the item would be perfect, so using generic scanning hardware, I scanned the 75-page play, which took all of 30 minutes, and saved it as a PDF.

Next, I returned to Project Gutenberg’s submission page []. The form requires basic bibliographic information; at the end of the page you can upload your actual submission. Uploading the PDF scan was simple, but this is where this particular project became much trickier.

Everyone who contributes a book believes they’ve done a good job, but the submissions need to be proofread. Once PG has your file, you can expect an email from a “Whitewasher” (PG’s designation for proofreaders, and a shout out to Mark Twain’s Tom Sawyer). Mine came from “Al,” a PG volunteer in Canada, who introduced himself as the proofreader who would be working on my submission. His missive’s tone was amiable but firm: PDFs are an acceptable supplementary format, but it is mandatory that all submissions be uploaded as plain text (.txt files). He referred to the site’s FAQ, which I had neglected to view. Plain text is the “lowest common denominator” format, readable by all devices. Of course, materials submitted in Japanese or Chinese, etc., are exceptions.

The Whitewasher and I exchanged quite a few emails while I attempted to persuade him that PDFs were fine. “Everyone with a computer has free Adobe Reader software, right, Al?” During the exchange Al sent these bon mots: “There are now over 3 billion cell phones in service, as compared to the over 1 billion computers; it will probably pay dividends to format for them.” PG wasn’t going to bend the rules and, frankly, its reasoning was valid.

Lead Into Gold?

Day 3

It was time to investigate how to transform a PDF into a plain text file. Note that because Adobe Acrobat Professional cannot interpret graphic files, it can’t resave a scan in PDF format as plain text. I was halfway through the seven stages of grief when Al the Whitewasher mentioned some options. The first daunting option was to manually render the play with a text editing program, which, despite its excruciating agony, I attempted. Al, a true slave to duty, was, shall I say, somewhat underwhelmed by my production. A significant amount of reformatting needs to be done when adapting the hard copy of a published work to a plain text file. Here’s just one example: italics disappear and italicized words need to be bracketed with underscores (unfortunately, there are innumerable italicized words in a play’s script). In my zeal to submit the item, I had failed to do this as I typed it in.
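
The underscore convention described above can be sketched in a few lines of Python. This is only an illustration, not a Project Gutenberg tool: it assumes the working text still carries its italics as HTML-style `<i>` or `<em>` tags (for example, from an OCR export), and the function name `italics_to_underscores` is hypothetical.

```python
import re

# Assumption: italic runs are marked with <i>...</i> or <em>...</em> tags.
# The backreference \1 makes the closing tag match the opening one.
ITALIC_TAG = re.compile(r"<(i|em)>(.*?)</\1>", re.IGNORECASE | re.DOTALL)


def _wrap(match):
    """Wrap the tagged text in underscores, keeping edge whitespace outside."""
    inner = match.group(2)
    core = inner.strip()
    if not core:
        return inner  # nothing to emphasize, just drop the tags
    lead = inner[: len(inner) - len(inner.lstrip())]
    trail = inner[len(inner.rstrip()):]
    return f"{lead}_{core}_{trail}"


def italics_to_underscores(text):
    """Return text with every tagged italic run rewritten as _run_."""
    return ITALIC_TAG.sub(_wrap, text)


if __name__ == "__main__":
    print(italics_to_underscores("ARNOLD. <i>(Rising.)</i> Who calls?"))
```

A pass like this only helps when the italics survive in some machine-readable form. Text keyed in by hand, as in my first attempt, still has to be marked up line by line against the printed page.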

Contacting the Information Technology department on my campus was the second option. Oh, dear! You don’t have access to an IT department? Do you have a really “with-it” son/daughter or niece/nephew who is attending college? It’s essentially the same thing. An associate at Central Connecticut State University’s Instructional Design and Technology Resource Center took the PDF of the play (which I created on Day 2) and, using Acrobat Professional, converted it to a TIFF (Tagged Image File Format) file and then imported the file to OCR (Optical Character Recognition) software. The entire process required about an hour. Because the TIFF file was also a graphic file, I wondered why the OCR software managed to handle that format but not the PDF. The answer is that the TIFF has an entirely different file structure, one readable by OCR programs.

The IT department further converted the TIFF file into a Notepad (.txt) version of the play ready for editing.

Day 4

Feeling rather confident that all I needed to do now was to correct poorly converted characters, add a few underscores to set off italicized words, and look for instances of alphabetic characters dropped during scanning, I worked through the play. Proofreading and editing your item requires concentration. You need to have both the plain text version and the scanned version on your computer’s desktop, because you have to compare the two line by line. Getting the text version of the play into a shape I thought Al would accept took several more hours. I emailed it back to him the next day.

Day 5

All in all, choosing a play was probably not a wise way to start my life as a PG producer. Although Al claimed no experience or acumen in dramaturgy, he was dissatisfied with the formatting of my submission. It became apparent that the play was going to take a great deal of editing. (To prove the point, see Figure 1 at right for a look at the proofreading interface used by Al at PG.)

When people’s communication is limited to email exchanges, which require a good deal of time to compose, instructions and comments are not always as clear as we might desire. Getting on the “same page” as my Gutenberg proofreader was proving difficult. I rarely leave projects unfinished, but I perceived this particular effort would be unsuccessful.

An Alternative for Your PDFs

In 2004, Information Today, Inc. published a book I wrote entitled The Web Library. In the course of writing that book, I interviewed Michael Hart, founder and head of Project Gutenberg. I emailed Michael and we conversed about my predicament. This exchange also gave me a chance to catch up on developments at Project Gutenberg. (As you can see in the interview that I did via email with him, which begins on page 32, Mr. Hart is not only informative but speaks with a great deal of verve.)

Since I now had in hand this nice PDF of an interesting play, I asked, “Was there any way PG would simply use my play in its PDF format?” Michael replied, “Project Gutenberg does have a library that contains items that only have a PDF version, and you can send me the file for reposting to that site.” He was talking about the Project Gutenberg Consortia Center []. The Consortia Center is a portal that manages electronic books: it brings together two dozen ebook collections from beyond PG and, in so doing, permits a number of different formats including PDF submissions. I quickly forwarded The Treason and Death of Benedict Arnold to the Consortia Center.

Only 1 day after sending my beloved play to the Consortia Center, I received an email confirming my submission was available for readers to view at

Day 6

My mission to contribute to PG was not yet realized. I wanted to “get it right.” I wanted to send a .txt file and have it posted on the original PG site. Remember my Plan B item called Memorial Address on the Life and Character of Abraham Lincoln? The item that my library owned was an original — the printing and publication date was 1866. I had already obtained copyright clearance for it, but there was no way I could subject this item to a scanner.

Somewhere in the back of my mind, I always knew that eventually I would find a use for Google Book Search []! An advanced title search limited to “Full View Only” retrieved several digitized copies of the Memorial Address. I downloaded a PDF, which originated at the New York Public Library’s collection, and then downloaded the accompanying plain text file. This procedure saved time and labor. It’s a maneuver I recommend that you use; many books at PG bear “from the Google Print project” in their production notes. [Google Print was the original name for Google Book Search. —Ed.]

Actually, I felt some qualms at using a Google Book Search digitized copy, even though PG clearly was willing to accept such texts. So I checked with one of PG’s legal advisers. He responded that the public domain status of the book gave me freedom to use it as I pleased. However, attributing the book to Google might be seen as using their trademark. So it was up to me. Project Gutenberg, as a noncommercial user of the product, falls in a category of user that Google’s own service commends. PG producer’s choice, I guess.

It was still necessary to go through the plain text file and compare it with the PDF. Although the converted file I used was relatively error-free, some text required correction. For example, the names of several of the senators listed in the appendix of the Memorial Address ran together; an innocent word such as “one” can appear as “oue” after its conversion from PDF to text. I knew Al would frown on these.
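
Misreadings such as “oue” for “one” follow recognizable letter-shape patterns, so a script can flag likely suspects before the line-by-line comparison. The Python sketch below is an assumption-laden illustration: the confusion pairs are hypothetical examples, the vocabulary must be supplied by the caller (any word list will do), and the function names are invented here. It does not replace reading the text against the scan.

```python
import re

# Hypothetical (misread, intended) pairs; real OCR errors depend on the
# typeface and the quality of the scan.
CONFUSIONS = [("u", "n"), ("n", "u"), ("rn", "m"), ("m", "rn"),
              ("1", "l"), ("0", "o"), ("c", "e"), ("e", "c")]


def suggest(word, vocabulary):
    """Return vocabulary words reachable from word by one confusion swap."""
    candidates = set()
    for wrong, right in CONFUSIONS:
        start = word.find(wrong)
        while start != -1:
            candidate = word[:start] + right + word[start + len(wrong):]
            if candidate in vocabulary:
                candidates.add(candidate)
            start = word.find(wrong, start + 1)
    return sorted(candidates)


def flag_suspects(text, vocabulary):
    """List (line_number, word, suggestions) for words missing from vocabulary."""
    report = []
    for number, line in enumerate(text.splitlines(), start=1):
        for word in re.findall(r"[A-Za-z0-9']+", line):
            lower = word.lower()
            if lower not in vocabulary:
                report.append((number, word, suggest(lower, vocabulary)))
    return report


if __name__ == "__main__":
    words = {"not", "one", "of", "them", "was", "modern"}
    for item in flag_suspects("not oue of them\nwas rnodern", words):
        print(item)
```

Proper names, such as the senators in the appendix, will be flagged too, since no word list contains them all; the report is a reading aid, not a verdict.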

Total time to prepare the file and email it to the Whitewasher = 7 hours.

Day 7

I was happy when I read Al’s email, which began, “This looks fairly good.” He had attached an error report generated by an excellent software program called “Gutcheck.” Gutcheck is a plain-text proofreading computer program specifically tuned to report the problems that spellcheckers don’t — errors such as mismatched quotes, misplaced punctuation, and unintended blank lines. It noted numerous “unbalanced quotation marks” and a few “duplicate punctuation marks” within my submission. Gutcheck refers to errors by line number, making correction easier.
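
To make the kinds of checks concrete, here is a much-simplified Python sketch of two of them: unbalanced straight double quotes within a paragraph and doubled punctuation marks, each reported by line number. It is not Gutcheck and makes no claim about how Gutcheck works internally; the function name `check_text` and the exact rules are assumptions for illustration.

```python
import re

# Two identical marks in a row from , ; : ! ?
# Periods are left out so that ellipses are not reported.
DUPLICATE_PUNCT = re.compile(r"([,;:!?])\1")


def check_text(text):
    """Return a sorted list of (line_number, message) tuples."""
    problems = []
    quote_count = 0         # straight double quotes in the current paragraph
    paragraph_start = None  # line number where the current paragraph began
    # A blank line is appended so the final paragraph is closed as well.
    for number, line in enumerate(text.splitlines() + [""], start=1):
        if line.strip():
            if paragraph_start is None:
                paragraph_start = number
            quote_count += line.count('"')
            if DUPLICATE_PUNCT.search(line):
                problems.append((number, "duplicate punctuation"))
        else:
            if paragraph_start is not None and quote_count % 2:
                problems.append(
                    (paragraph_start, "unbalanced quotation marks in paragraph"))
            quote_count = 0
            paragraph_start = None
    return sorted(problems)


if __name__ == "__main__":
    sample = 'He said, "Stay.\n\nShe left,, quickly.\n\n"Fine," he said.'
    for line_number, message in check_text(sample):
        print(f"line {line_number}: {message}")
```

A quotation that legitimately runs across several paragraphs would trip the quote check, which is one reason a report like this still needs a human to review each flagged line.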

Easy or not, corrections take time, but I finished them within an hour and sent the revised file to my Whitewasher. (Perhaps you’re feeling like a ping-pong ball at this point considering all the back and forth. I know I certainly did. But as Blake wrote: “If the fool would persist in his folly, he would become wise.”) It wasn’t long before Al wrote back waving the lantern that gave the light at the end of the tunnel, and when you get this kind of email from a perfectionist Whitewasher at PG like Al, it seems more like staring into the Bat Signal!


If you’re satisfied that the book has been *thoroughly* proof-read (and I do mean *thoroughly*—there’s no skimping on this step), and the assorted Gutcheck items dealt with satisfactorily, I’d say to create a new copy of the file, and remove all those page splits from it. Watch out for places where the page following a split starts with a new paragraph, and leave a blank line between that paragraph and the one preceding it. At the Appendix, leave four blank lines before “Appendix” and two after it, to set it off as a section title.

Give the file a new version number, and send it back to me, and I’ll re-run Gutcheck on it.

I looked the file over, and sent it right back.

Day 8

I checked my email with great anticipation and found a missive from Al asking, “Do you know how to handle zip files?” When I “Rogered” him, he sent me a zipped folder with three files: the JPG image file of a portrait of Abe Lincoln that was part of the original PDF, a text file with the Memorial Address that we had worked on together, and a big bonus: a beautifully formatted HTML file of the item. The image of Lincoln and the hypertext file were all Al’s doing. He provided explicit instructions on how to upload the content to Project Gutenberg and how to complete the PG form to indicate my responsibility for the item. I was off to (which I had last visited on Day 2) to follow through with the submission.

Two days later, Al, who I had frequently suspected was out to thwart me at every turn, befriended my efforts completely. He was a tough taskmaster and a terse correspondent. In his final email he simply provided a link to the item [] and thanked me.

After all my kvetching about Al and my two successful submissions (one to the original PG and one to the PG Consortia), Michael Hart wrote: “Just to give you an idea of how much Al does, he just posted 26 eBooks, mostly in French, from one of our volunteers in France, who does not understand our systems of doing things and thus needs a certain, shall we say, level of hand holding. It was hard work, just like yours, and ended up being greatly appreciated on both sides.”

I was pleased when I finally saw my contribution. After all, I’d run the gauntlet! But when it was over, I began to appreciate the exacting personalities at Project Gutenberg. It’s people like Al the Whitewasher and Michael Hart and, in the final analysis, any persnickety pro with high standards and a book to contribute that make Project Gutenberg’s collection one of the highest quality on the web.

On Reflection

This was definitely a learning experience in UGC, but when contributing to Project Gutenberg, the user is not always in control of the content. The user selects the content, but, by definition, the content is not authored by the user. We’re simply polishing up old shoes and making them available for millions of people worldwide. At PG, the content is rigorously scrutinized. This results in editing, correcting, reformatting, and, eventually, posting for access by the online community. In addition to the satisfaction of seeing your contribution in the holdings of PG, you get to read (and reread) a book that you may never have experienced so thoroughly. The procedure that PG requires necessitates that the producer become quite intimate with their contribution, leading to a greater appreciation of whatever you submit. Life teaches us that anything worthwhile requires time, energy, and work. Preparing materials for PG is definitely worthwhile.

Even if you don’t have the time to become a book producer, you could consider becoming a distributed proofreader (where you can help by reading just one page per day). If you don’t have the time or inclination for that, you might consider encouraging a friend (or a friend of the library) to become involved with PG in some capacity.

Interview With Michael Hart

Michael Hart is one of the original internet pioneers, sharing credit for starting the Open Source Movement in 1971 with Richard Stallman at MIT. The two started totally independently of each other but at the same time.

Michael’s (and perhaps destiny’s) choice for the first item to appear in Project Gutenberg was The Declaration of Independence; the selection is still perhaps his choice as the best. Michael began Project Gutenberg, the oldest site on the internet, at the University of Illinois.

He is also co-founder of The World eBook Fair in which he begs and borrows all the ebooks he can manage, many from John Guagliardo of The World Public Library, and equal numbers from Brewster Kahle, at the Internet Archive, both of whom have been Michael’s friends, and co-world changers, for more than a decade. In 2008, they managed well over a million free ebooks and also 160,000 commercial ebooks at a discount.

Michael sees his greatest achievement — Project Gutenberg — as a place that has changed the world without ever having had enough income to reach the average “poverty line.” Quoting from the great master, Victor Hugo, Michael says this is due to the fact that “There is no greater power on earth than an idea whose time has come.”

Here is his email response to my online interview.

Michael, it has been 5 years since we last talked, and I note your enthusiasm and dedication to adding to one of the world’s largest free online collections hasn’t waned. Has anything changed at Project Gutenberg?

About five years ago on October 15, 2003, as I recall, we had just passed the 10,000 eBook mark for the original Project Gutenberg teams and those same kinds of teams were well past 30,000 at the end of the year 2008. In addition to that site the original Project Gutenberg site we have: with over 50 languages represented, and we also have with over 75,000 .pdf eBooks in over 100 languages. So our grand total is now around 100,000. Project Gutenberg also co-sponsored The World eBook Fair [] from July 4 through August 4, 2008, which presented over a million free eBooks, plus 160,000 commercial eBooks at a discount. The grand total there was over 1,250,000 by August 4.

And of course there’s Project Gutenberg Australia, which has links to Australian history resources and has some different content because the books there generally enter the public domain if the author died in 1954 or earlier. Project Gutenberg Europe has books in 59 languages, and there is also Project Gutenberg Canada, where the collection reflects the fact that copyright generally lasts until 50 years after the end of the year of the author’s death.

After a couple of tries I managed to contribute two items to Project Gutenberg. I did quite a bit of keyboarding, but not an entire text. Do people still key in the texts?

A lot of our volunteers still prefer to type them in; it’s more fun and gives you a different “read” of one of your favorite books. I once did a 1,000 page book in a year, then I scanned and proofread the “prequel” in just three weeks, but the first one was sooo much more fun. It’s a trade-off between what is fun/educational and mass production.

After all, The Gutenberg Press was the very first example of mass production and Project Gutenberg is the first example of what I have called “Neo-Mass Production” which I HOPE will start up what I have called “The Neo-Industrial Revolution.”

I found out while preparing texts for PG, scanners aren’t always the solution.

Of course, someday scanning and optical character recognition will be 99.99% accurate, or more. . . .

After copyright has been cleared and a book has been uploaded, is there any chance that Project Gutenberg will reject it?

We haven’t rejected anything in a long time, and even the one I am thinking of was eventually accepted.

I found creating an ebook for PG a satisfying experience. To your knowledge are there any systematic projects in place where teams of people outside of PG (for example, academic department or students at library schools) are creating many ebooks and presenting them to PG?

We have Professor Mao in Taiwan, whose classes, in September 2008, moved eBooks in Chinese into the #3 slot for books in languages other than English, in PrePrints, and I have now moved on to Spanish as my next personal project, since Spanish is the third most popular Internet language, but is just barely in our “Top 10.” As a result my next major presentation is in Argentina to do some promotion of “The Both Americas Project,” and also to do the honors in opening “The Hart Library.”

What “positions” comprise the PG community? I know you engage copyright attorneys, but do opportunities exist behind the scenes at PG for the run-of-the-mill bibliophile?

We don’t have real “positions.” However, Distributed Proofreaders is more position oriented. 50,000 volunteers proofreading a page a day is quite an accomplishment!

I encountered a proofreading innovation while I was working on my contribution — something called Gutcheck.

We collect up lots of errors and write programs to find them, which takes a load of work off human shoulders.

The vast majority of texts are available as “common denominator” ASCII text files. A random search for 20 books among the 28,000 at PG came up with eight that were available in PDF and 14 available in HTML as well. Do you maintain statistics of the composition of PG by file type (i.e., what percentage is also HTML or PDF, etc.)?

I think if you redo that search in the “advanced search” menu, that you will get many more PDF and HTM and HTML files. Actually there is at least one book, in French, that is not available as plain text, simply because the volunteer asked us not to do it without the accents. I’m pretty sure it is not the only one. Don’t forget that most of the PDFs end up at the Project Gutenberg Consortia Center []. (And we also have plenty of French at PG of Canada.)

I noticed there is a small percentage of contemporary titles (not in the public domain) that are available at PG. What contemporary titles do you accept? Any memorable items?

Bruce Sterling’s Hacker Crackdown comes to mind, as I spent three hard weeks getting it to look good on the screen.

Can individuals use PG to self-publish their novels or poetry if they haven’t found a traditional publisher?

We try not to do much in the way of “Vanity Press” stuff, but we can always toss things we do not know how to do in the “PrePrints” section.

Have you any idea of who your essential benefactors are — that is, who is actually contributing texts?

In any group of thousands there is always at least one who totally stands out, some workaholic insomniac bibliophile, or the like, and we always look so much better when we see one or two of those adding huge amounts to production. Most of them prefer to remain anonymous, and even I do big amounts of my work just as an anonymous volunteer.

Any words of advice or encouragement for potential contributors?

If you ever wanted to grab hold of Archimedes’ famous lever and move the world, this is your chance! You could spend a week, or a month, or a year, or whatever length of time you wanted on a book a billion people might read in the future. . . . Now that is leverage!

While we’re on the subject of user-generated content, would you be willing to share your thoughts on Keen’s Cult of the Amateur?

Personally, I think Keen mostly has a cute title, but is really just another olde guarde wannabee in the olde boye networke who is so totally jealous that the avant garde now can publish book titles by the millions that billions of people can download free. If you have read his book, it should be quite obvious to you or any of his readers, that he resents anyone other than elitists, everyone other than elitists, from having any voice at all.

Keen would be happier in feudal Dark Ages times, when and where only 1% of the entire population knew how to read, but then his book could never have been a million seller. Still I’ll bet in some dark corner of his mind he would give up his million seller to be the Sheriff of Nottingham, with orders to bring in people like me and hang them in the public square.

People like Keen feel deeply threatened by The Internet and the voice it gives to people such as myself, who have never seen the order of his million dollar income in their whole lives but who can, and will, change the world as much as Johann Gutenberg did in his day, also without having any money.

I see Project Gutenberg’s use of The Internet as a primitive, first-steps sort of Star Trek Communicator, Transporter, and, most importantly, a very primitive “Replicator.” You put a book in over here, and anywhere on The Internet every person can have a copy, free of charge.

Today “The Personal Computer” Becomes “The Personal Library.” The average computer today goes for under $500. For another $500 you can add your first few terabytes, and hold enough books [in plain text .zip files] to make it into current lists of the top 100 largest libraries IN THE WORLD!!!

Now that should really scare the heck out of Mr. Keen.

After all: There are 250 languages with over a million speakers. If we only cover 40% of those, that’s 100 languages. There are 25 million books in the public domain, not to mention newspapers, magazines, etc. If we do only 40% of those books, that’s 10 million books. 100 languages. . . .10 million books. . . . If we just translate those 10 million books into those 100 languages, we have a ONE BILLION BOOK LIBRARY! After all, people such as Keen, Google, Oxford, etc., have never mentioned building a one billion book library, but I see it as a foregone conclusion … it’s going to happen … the only real question is when?
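The arithmetic behind that one-billion figure can be checked in a few lines. This is only a sketch of the estimate as stated above: the input figures (250 languages, 25 million public domain books, 40% coverage) are the ones quoted in the answer, while the function and variable names are hypothetical and introduced purely for illustration.

```python
from fractions import Fraction  # exact arithmetic, so 40% of 250 is exactly 100

# Figures as quoted in the interview; not independently verified here.
LANGUAGES_OVER_1M_SPEAKERS = 250
PUBLIC_DOMAIN_BOOKS = 25_000_000
COVERAGE = Fraction(40, 100)  # "if we only cover 40%"


def billion_book_estimate(languages, books, coverage):
    """Return (languages covered, books covered, translated editions).

    Hypothetical helper: it assumes every covered book is made
    available in every covered language, as the estimate does.
    """
    covered_languages = int(languages * coverage)
    covered_books = int(books * coverage)
    editions = covered_languages * covered_books
    return covered_languages, covered_books, editions


if __name__ == "__main__":
    langs, books, editions = billion_book_estimate(
        LANGUAGES_OVER_1M_SPEAKERS, PUBLIC_DOMAIN_BOOKS, COVERAGE
    )
    print(f"{langs} languages x {books:,} books = {editions:,} editions")
```

Under those assumptions the product comes to 100 languages times 10 million books, or one billion editions, which matches the figure given.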

The next question, once it has been created, is who gets to have access. Obviously most of the books Google is scanning are NOT for the general public’s consumption, but just for the elitists. [This situation may change dramatically, at least for U.S. users, if a court approves the settlement between Google and publishers and authors and the promised access to in-copyright, out-of-print books goes through. For details, read the NewsBreak “The Google Book Search Settlement: ‘The Devil’s in the Details,’” Nov. 3, 2008. -Ed.]

Project Gutenberg was designed to lift the world from the bottom upwards, rather than in the traditional “Trickle Down Theory.” Somehow those “Trickle Down” projects rarely get to the public masses, but always lift only the top portion of the pyramid to an even higher level, thus increasing the distance between THE HAVES and THE HAVE NOTS.

My own personal goal is to see that BILLION BOOK LIBRARY being read on the average cell phone around the world when everyone, literally everyone who wants one, has a cell phone that reads, even one that reads out loud. That way EVERYONE CAN LEARN TO READ WITHOUT ANY HELP FROM ANYONE ELSE!!! And every single one of those books could be owned for a lifetime by anyone with a petabyte of drive space. By 2020 petabyte drives will be affordable. Then where does Keen’s world of Haves vs. Have Nots go??? Eh???


1. Schiff, S. “Know it all: Can Wikipedia conquer expertise?” The New Yorker, July 31, 2006. Accessed Oct. 10, 2008, from

2. Giles, J. “Internet encyclopedias go head to head,” Nature, vol. 438, Dec. 15, 2005, pp. 900–901.

3. See Andrew Keen, Cult of the Amateur, New York: Doubleday, 2007; Nicholas Carr, “Is Google making us stupid?” Atlantic Monthly, vol. 302, no. 1, July/August 2008, pp. 56–63; Lee Siegel, Against the Machine: Being Human in the Age of the Electronic Mob, New York: Spiegel and Grau, 2008.

4. Alexa. Top sites United States. Accessed Oct. 10, 2008, from

5. Hamilton, A. “Sites we can’t live without,” TIME magazine, 2007. Accessed Oct. 10, 2008, from,28804,1812202_1812206,00.html.
