

Information Today

Vol. 21 No. 8 — September 2004

Feed(ing) Frenzy
By Bill Spence

RSS feeds are the wave of the future. I've been hearing this from fellow information professionals and librarians, speakers at Information Today, Inc. (ITI) conferences, and many others for the last year or so. And I'm convinced that RSS is a good thing. I get it™ ... I really do. (RSS aficionados will get the trademark reference here. For everyone else, there's Google.)

RSS (meaning either Rich Site Summary or Really Simple Syndication) is one of the good things to have emerged from the blog phenomenon of the last couple years. However, I'm still not convinced that blogs, on the whole, are the sliced bread of the new millennium. Frankly, I find most of them self-indulgent, unnecessary, and reminiscent of the Web circa 1996, when whole hordes of people discovered they could have a presence on this newfangled World Wide Web thing. (See for some examples of those early adopters.)

Blogs don't really do anything that anyone with a little Web site authoring experience (and perhaps a little programming help) couldn't already do; they just make it easier for more of the horde to join the game. As someone who has spent years learning how to develop and maintain quality Web sites, it peeves me a little that this new blogging thing is allowing more and more people into the "club," regardless of the value of what they have to say!

Now don't get me wrong. Not all blogs are as superfluous as that first wave of Web sites that foreshadowed them. Blogs like Gary Price's ResourceShelf, Jenny Levine's TheShiftedLibrarian, and Steven Cohen's Library Stuff provide valuable and timely information on developments in the library and information fields that readers of Information Today and other ITI publications will (and do) certainly find worthwhile.

The RSS feeds provided on these blog sites further enable their visitors to keep up-to-date on what is being posted to these sites. RSS feeds require the use of a newsreader (or news aggregator), which comes in many flavors. You can use a Web-based newsreader like the Rocket RSS Reader, or you can download and install a desktop newsreader like Newz Crawler. You can even use a product like NewsGator, a plug-in for Microsoft Outlook that delivers RSS feeds directly into your Outlook folders.

The RSS feed itself is an XML file offered on a blog/Web site that queries the database that drives the blog/Web site. Users plug a reference to that file into their newsreaders, and whenever something new is posted, the newsreader picks it up and delivers it to the user. Simple, right?
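To give a concrete sense of what the newsreader is actually fetching, here is roughly what a minimal RSS 2.0 file looks like. The titles and URLs below are placeholders for illustration, not an actual feed:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example News Site</title>
    <link>http://www.example.com/</link>
    <description>Latest postings from Example News Site</description>
    <item>
      <title>New article posted</title>
      <link>http://www.example.com/articles/123.html</link>
      <description>A short summary of the new article.</description>
      <pubDate>Wed, 01 Sep 2004 09:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>
```

The newsreader simply re-requests this file on a schedule and shows the user any `<item>` entries it hasn't seen before.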

Well, RSS is not that simple. If it were, we probably would have offered it on all our dozen-plus Web sites long ago. There are many issues to consider, though. Who's going to do the programming to create the underlying XML? Is the RSS feed going to work with our various and sundry content management systems? Is providing RSS feeds going to cut into our print sales? If we provide RSS feeds, are they going to be so wildly popular that the influx of users will bring our Web servers to a crawl? And how do we deal with Web sites that don't have a database back end?
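The "who's going to do the programming" question is less daunting than it sounds. As a sketch of what that underlying XML generation involves, here is a small Python example that builds an RSS 2.0 document from a list of article records using only the standard library. The function name and field layout are hypothetical, standing in for whatever a site's content management system would supply:

```python
# A minimal sketch of generating an RSS 2.0 feed from article records,
# using only the Python standard library.
from xml.etree import ElementTree as ET


def build_rss(channel_title, channel_link, items):
    """Build an RSS 2.0 document from (title, link, description) tuples."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = channel_title
    for title, link, description in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
        ET.SubElement(item, "description").text = description
    return ET.tostring(rss, encoding="unicode")


# Example usage with placeholder data, as a CMS export script might do.
feed = build_rss(
    "Example News Site",
    "http://www.example.com/",
    [("New article posted",
      "http://www.example.com/articles/123.html",
      "A short summary of the new article.")],
)
```

In practice the real work is the plumbing around a snippet like this: wiring it to each content management system, caching the output so feed polling doesn't hammer the servers, and deciding which content to syndicate.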

This summer, while we struggled with the whys, whens, and whethers of providing RSS feeds, it seems our hand was forced prematurely when one of our Web sites was scraped. And believe me, getting scraped was a lot more painful than the term implies. In this context, scraping is defined as taking content from a Web site and repackaging it—in this case, creating an RSS feed from that scraped content. Google's news portal was scraped earlier this year, and they were so "not amused" that they issued a cease-and-desist order against the scraper.

Our experience at being scraped came at the hands of an overzealous technology provider (a company I will decline to name here) that provides aggregation and transformation tools to build competitive intelligence systems. They used their technology to scrape and repackage content from the site and create a number of RSS feeds for it. They did this on behalf of a third party who "find(s) our content extremely valuable." They then took it upon themselves to let the world know that these RSS feeds were available by posting this information on their (grrr!) blog (under the title "Poof! You have RSS!").

At no time during this process was I contacted, either by the technology company or by the third party for whom it did this scraping. Only several days after these RSS feeds were a fait accompli, and after one of the library bloggers had picked up on this information (and thankfully opined on his blog that we could certainly provide our own RSS feeds), did the company's salesperson decide it might be a good idea to see whether we might be interested in their product.

Maybe it's just me, but I found this to be, what ... cheeky? Presumptuous? Galling? Unscrupulous? Of questionable legality? I'm no copyright expert, but, really, shouldn't the decision to provide RSS feeds for our content be ours to make? Google didn't appreciate being scraped; neither do we.

Now I'm sure that this company's product is a fine one, and for those folks who want to provide RSS feeds but don't have the wherewithal to do so, it is probably a good solution. But I made it clear to this salesperson, in no uncertain terms, that I had no interest in doing business with someone who would attempt to embarrass us into doing so.

To be fair, the salesperson did apologize, but countered with the assertion that all our Web sites are being scraped all the time—by Google and other search engines. Technically, this may be true, since search engine spiders essentially take our content and repackage it, but there's a difference. We want them to do this. However, if we didn't want them to do this, there are ways to request that they don't. The robots.txt file on our Web server (a totally different technology tutorial that I don't have the space for here) could request that they don't spider our site. They don't necessarily have to comply with that request, but at least we have the option. This recent brush with scraping offered us no other option than to provide our own RSS feeds—ready or not.
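For readers unfamiliar with that file, here is what such a request looks like. A robots.txt placed at the root of a Web server can ask all crawlers to stay out entirely, or steer a particular one away from specific areas (the paths shown are hypothetical examples):

```
# Ask all crawlers to skip the entire site
User-agent: *
Disallow: /

# Or: ask one named crawler to skip just one directory
User-agent: ExampleBot
Disallow: /private/
```

As noted above, compliance is voluntary—well-behaved spiders honor it, but nothing forces a scraper to.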

Having our content accessible via search engines drives more traffic to our Web sites, which is a good thing. Providing RSS feeds drives even more traffic to our Web sites. Also a good thing. But the decision to provide RSS feeds should be ours to make. So please, let us be the ones to make it. We get it ... we really do.


Bill Spence is CTO at Information Today, Inc. His e-mail address is