By Bill Spence
RSS feeds are the wave of the future. I've been hearing
this from fellow information professionals and librarians,
speakers at Information Today, Inc. (ITI) conferences,
and many others for the last year or so. And I'm convinced
that RSS is a good thing. I get it™ ... I really
do. (RSS aficionados will get the trademark reference
here. For everyone else, there's Google.)
RSS (meaning either Rich Site Summary or Really Simple Syndication) is one
of the good things to have emerged from the blog phenomenon of the last couple
years. However, I'm still not convinced that blogs, on the whole, are the sliced
bread of the new millennium. Frankly, I find most of them self-indulgent, unnecessary,
and reminiscent of the Web circa 1996, when whole hordes of people discovered
they could have a presence on this newfangled World Wide Web thing. (See http://www.webpagesthatsuck.com for some examples of those early adopters.)
Blogs don't really do anything that anyone with a little Web site authoring
experience (and perhaps a little programming help) couldn't already do; they
just make it easier for more of the horde to join the game. As someone who
has spent years learning how to develop and maintain quality Web sites, it
just peeves me a little bit that this new blogging thing is allowing more and
more people into the "club," regardless of the value of what they have to say!
Now don't get me wrong. Not all blogs are as superfluous as that first wave
of Web sites that foreshadowed them. Blogs like Gary Price's ResourceShelf,
Jenny Levine's TheShiftedLibrarian, and Steven Cohen's Library Stuff provide
valuable and timely information on developments in the library and information
fields that readers of Information Today and other ITI publications
will (and do) certainly find worthwhile.
The RSS feeds provided on these blog sites further enable their visitors
to keep up-to-date on what is being posted to these sites. RSS feeds require
the use of a newsreader (or news aggregator), which comes in many flavors.
You can use a Web-based newsreader like the Rocket RSS Reader (http://reader.rocketinfo.com),
or you can download and install a desktop newsreader like Newz Crawler. You
can even use a product like NewsGator, which is a plug-in for Microsoft Outlook
that delivers RSS feeds directly into your Outlook folders.
The RSS feed itself is an XML file offered on a blog/Web site that queries
the database that drives the blog/Web site. Users plug a reference to that
file into their newsreaders, and whenever something new is posted, the newsreader
picks it up and delivers it to the user. Simple, right?
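For the curious, here is a minimal sketch of what a newsreader does with that XML file, using only Python's standard library. The sample feed, the example.com URLs, and the new_items helper are all made up for illustration; a real newsreader would also fetch the file over HTTP on a schedule and cope with the several RSS dialects in circulation.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, inlined so the sketch runs without a network.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Site</title>
    <link>http://www.example.com/</link>
    <description>Hypothetical feed for illustration</description>
    <item>
      <title>First post</title>
      <link>http://www.example.com/posts/1</link>
      <guid>http://www.example.com/posts/1</guid>
    </item>
    <item>
      <title>Second post</title>
      <link>http://www.example.com/posts/2</link>
      <guid>http://www.example.com/posts/2</guid>
    </item>
  </channel>
</rss>"""


def new_items(feed_xml, seen):
    """Return (title, link) pairs for items not yet in `seen`, recording them."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        link = item.findtext("link", default="")
        # Use the guid as the item's identity, falling back to its link.
        key = item.findtext("guid", default=link)
        if key in seen:
            continue
        seen.add(key)
        fresh.append((item.findtext("title", default=""), link))
    return fresh


if __name__ == "__main__":
    seen = set()
    for title, link in new_items(SAMPLE_FEED, seen):
        print(title, link)
```

Calling new_items a second time with the same set of seen identifiers returns nothing, which is roughly how a newsreader knows that only freshly posted items need delivering.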
Well, RSS is not that simple. If it were, we probably would have offered
it on all our dozen-plus Web sites long ago. There are many issues to consider,
though. Who's going to do the programming to create the underlying XML? Is
the RSS feed going to work with our various and sundry content management systems?
Is providing RSS feeds going to cut into our print sales? If we provide RSS
feeds, are they going to be so wildly popular that the influx of users will
bring our Web servers to a crawl? And how do we deal with Web sites that don't
have a database back end?
This summer, while we struggled with the whys, whens, and whethers of providing
RSS feeds, it seems our hand was forced prematurely when one of our Web sites
(EContentMag.com) was scraped. And believe me, getting scraped was a lot more
painful than the term implies. In this context, scraping is defined as taking
content from a Web site and repackaging it, in this case by creating an RSS
feed from that scraped content. Google's news portal was scraped earlier this
year, and they were so "not amused" that they issued a cease-and-desist order
against the scraper (http://www.internetnews.com/ec-news/article.php/3334651).
Our experience at being scraped came at the hands of an overzealous technology
provider (a company I will decline to name here) that provides aggregation
and transformation tools to build competitive intelligence systems. They used
their technology to scrape and repackage content from EContentMag.com and create
a number of RSS feeds for the site. They did this on behalf of a third party
who "find(s) our content extremely valuable." They then took it upon themselves
to let the world know that these RSS feeds were available by posting this information
on their (grrr!) blog (under the title "Poof! You have RSS!").
At no time during this process was I contacted, either by the technology
company or the third party for whom they did this scraping. It was only several
days after these RSS feeds were a fait accompli and one of the library
bloggers picked up on this information (and thankfully opined on his blog that
we could certainly provide our own RSS feeds), that the company's salesperson
decided it might be a good idea to see if we might be interested in their product.
Maybe it's just me, but I found this to be, what ... cheeky? Presumptuous?
Galling? Unscrupulous? Of questionable legality? I'm no copyright expert, but,
really, shouldn't the decision to provide RSS feeds for our content
be ours to make? Google didn't appreciate being scraped; neither do we.
Now I'm sure that this company's product is a fine one, and for those folks
who want to provide RSS feeds but don't have the wherewithal to do so, it is
probably a good solution. But I made it clear to this salesperson, in no uncertain
terms, that I had no interest in doing business with someone who would attempt
to embarrass us into doing so.
To be fair, the salesperson did apologize, but countered with the assertion
that all our Web sites are being scraped all the time by Google and other
search engines. Technically, this may be true, since search engine spiders
essentially take our content and repackage it, but there's a difference. We want them
to do this. However, if we didn't want them to do this, there are ways to request
that they don't. The robots.txt file on our Web server (a totally different
technology tutorial that I don't have the space for here) could request that
they don't spider our site. They don't necessarily have to comply with that
request, but at least we have the option. This recent brush with scraping offered
us no other option than to provide our own EContentMag.com RSS feeds, ready or not.
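To make the robots.txt point a little more concrete, here is a rough sketch of how a well-behaved spider might check such a file before fetching a page, using Python's standard urllib.robotparser module. The rules, the "ExampleScraper" and "ExampleSearchBot" agent names, and the example.com URLs are hypothetical and are not taken from any of our servers. And, as noted, nothing forces a spider to run a check like this at all.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one named agent entirely, and ask
# everyone else to stay out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleScraper
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt, user_agent, url):
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://www.example.com/articles/1"
    print(may_fetch(ROBOTS_TXT, "ExampleScraper", page))
    print(may_fetch(ROBOTS_TXT, "ExampleSearchBot", page))
```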
Having our content accessible via search engines drives more traffic to our
Web sites, which is a good thing. Providing RSS feeds drives even more traffic
to our Web sites. Also a good thing. But the decision to provide RSS feeds
should be ours to make. So please, let us be the ones to make them. We get
it ... we really do.
Bill Spence is CTO at Information Today, Inc. His e-mail address is firstname.lastname@example.org.