The Latest on Factiva, Ingenta, and Google
By Paula Hane
May was busy, with several events offering excellent learning and networking
opportunities: WebSearch University, Enterprise Search Summit, Streaming Media
East, a NISO workshop on metadata, and a number of state library association
conferences. But it was a bit quieter than usual on the news front. Library
and information vendors seemed to be holding back their big announcements for
two events in June: the SLA and ALA annual conferences. These two gatherings,
with their huge exhibit halls and thousands of attendees, offer excellent opportunities
for vendors to roll out new products, showcase technologies and applications,
and meet with and entertain customers.
This year, the editors at Information Today, Inc. will provide live blog
coverage of the SLA conference, as we did for Online Information last December.
We will include what's new on the exhibit floor, what's hot from conference
sessions, photos, and general impressions of the overall SLA experience. The "Live
from Nashville" blog (http://www.infotodayblog.com), sponsored by ProQuest,
will have postings from the conference June 3–10, 2004.
In March, Factiva launched its iWorker Search Technology, a new algorithm-based
product platform. Barbara Quint provided a look at the preview, with its new
interfaces, in her March 1 NewsBreak (http://www.infotoday.com/newsbreaks/nb040301-2.shtml).
The patent-pending system seamlessly matches simple keyword searches to the
filtering capability embedded within Factiva's proprietary taxonomy. In addition,
the search experience is personalized; the user sets preferences for a specific
region and industry, which influences the relevance of the results.
The company recently introduced Factiva iWorks, a new product designed for
information workers (outside the corporate library) within corporations and
enterprises. Factiva iWorks lets organizations integrate the functionality
of Factiva's iWorker Search Technology within any computing environment. With
the new product, Factiva says it's directly addressing the needs of the information
worker, who has been trained to search by typing a few keywords into a little
white box on a free Web engine.
Factiva iWorks provides information workers with a current-awareness tool.
It does not access the full archive of Factiva content but features just a
90-day archive of Factiva's collection of more than 6,000 continuously updated
sources. The product integrates into users' workflow through multiple possible
access points: a browser toolbar, Microsoft Office 2003, or a module for
a portal or intranet.
Enterprise pricing for Factiva iWorks starts at $1,600 a month for up to
50 users. Individual subscription pricing is available via registration in
Microsoft Office 2003. Access costs $9.95 for 10 articles per month, or $2.95
per article. After entering a query, unregistered users get headline results
and are prompted to register when they select a headline.
The company claims that more than 60 percent of Factiva's content is not
available for free on the Web. The statistic comes from a 2002 white paper, "Free,
Fee-Based, and Value-Added Information Services," written and edited by Mary
Ellen Bates and Donna Andersen. The methodology is included in the paper (http://www.factiva.com/collateral/
Bates recently updated the white paper, though it had not been published
at press time. The findings were unchanged. One key comment is well understood
by those of us in the industry: "The free Web, therefore, is seriously lacking
in important business content, and the information that is available is difficult
to access. When knowledge workers search only the free Web for information,
it is likely that they will fail to turn up critical facts."
Out on the Web
While the traditional vendors were gearing up for June announcements, things
were anything but quiet over the last month on the Web-search scene. News from
and about Google continued to dominate. The big news was Google's filing of its
IPO registration with the SEC (finally, after months of speculation), along with what
the filing revealed about the company and its rivals. But Google
also made news with a major upgrade to its Blogger software and the launch
of its Google Blog (http://www.google.com/googleblog), which offers "insight
into the news, technology, and culture of Google." (Puh-leeez! As if we don't
hear enough about Google and the "Googleplex"!)
Google Reaches Out
Of greater interest and importance to researchers were Google's recently
announced partnerships with traditional information industry companies, which
continue its initiatives to include scholarly content. Ingenta, PLC, a provider
of online publishing services to academic and professional publishers, announced
the successful implementation of full-text indexing by Google. Ingenta joins
organizations like IEEE, OCLC, and others that now have content indexed by Google.
Google had been indexing the freely available metadata on Ingenta.com, ensuring
that article titles, keywords, author names, and abstracts appeared in search
results for Google users. But as of March, Ingenta enabled full-text access
for the crawler (the "Googlebot") so that all words in articles, not just abstracts
and keywords, are indexed and searchable on Google. According to the announcement
from Ingenta, after enhancing the indexing, the Ingenta.com site's usage jumped
dramatically, "with Google referral traffic contributing to a record 5.4 million
user sessions on Ingenta.com in April."
Not all Ingenta publishers are included in these initial results yet.
Ingenta switched on full-text crawling as a trial for a handful of publishers,
including CABI Publishing, Professional Engineering Publishing, FD Communications,
Inc., and American Ceramic Society, and said it will now add more publishers.
Google users who click on a search result are presented with an abstract
page on Ingenta.com, where they are either authenticated for full-text subscriber
access by virtue of IP address or user name/password, or they're offered pay-per-view access.
Ingenta senior product manager Kirsty Meddings said: "Becoming aware of Google's
initiative to index more scholarly content, Ingenta saw the opportunity to
increase the visibility of our publishers' material. Ingenta coordinated directly
with Google to put these benefits into effect, avoiding the need for any of
the publishers to become involved with the technical details. This relationship
is a natural extension to Ingenta's role as intermediary between publishers
and third parties."
Jumping on the Google Train
Extenza, another U.K. company, announced that Google is indexing the e-journal
content (in either Adobe PDF or full-text HTML) held on its Extenza e-Publishing
Services journal hosting platform. In making the announcement, the company
stressed the twofold benefit of the arrangement: It helps users find that important
piece of data they're seeking, and it helps publishers by driving utilization
and traffic to their content, with potential revenue benefits. Extenza's customers
range from society and not-for-profit publishers to commercial publishers.
If you're not familiar with it, Extenza e-Publishing Services is part of
Extenza, a division of Royal Swets & Zeitlinger. Extenza not only provides
conversion and hosting services for publishers but also helps librarians manage
e-journal subscriptions, enables access, and delivers usage statistics. The
company recently announced an alliance with ProQuest to offer a broad portfolio
of e-journal and database services for publishers and libraries. The companies
said the agreement delivers "a complete distribution and hosting solution for
publishers, simplifies access for end users, and streamlines e-journal management."
CrossRef, a 300-member publisher trade association that provides a cross-publisher
reference-linking service, announced a pilot project called CrossRef Search
that enables users to search the full text of scholarly journal articles, conference
proceedings, and other sources from nine leading publishers. (See Barbara Quint's
NewsBreak at http://www.infotoday.com/newsbreaks/nb040503-1.shtml.) Not surprisingly,
Google is supplying the search technologies, while CrossRef is providing the
reference links to publisher Web sites. While Google incorporates CrossRef
content connections into its general Web search engine, users who go to publisher
Web sites and click on the CrossRef Search icon reach just the scholarly subset.
Separately, CrossRef announced that it now has 307 publisher members. According
to the organization, CrossRef's rate of growth has nearly doubled in recent
months, due to the new fee structures that took effect in January. More than
50 publishers have joined CrossRef since the start of this year. CrossRef has
also signed on several new libraries and affiliates in 2004, including Nerac,
a Connecticut-based research and information discovery service. In addition,
Forward Linking is now live on the CrossRef system and available for testing.
The service is on schedule for official launch this month.
Finding or Losing?
All of these publisher and vendor deals with Google raise the sticky issue
of searching subsets versus the entire mass of indexed Web content. Will users
of Google's general Web search engine really benefit? Will the scholarly articles
rise high enough in search results to actually be found, or will they be buried
in obscurity many thousands of results down? Placement is certainly an issue.
Wouldn't it be more productive to search within slices of content?
Barbara Quint pointed out the visibility problems in Google Print, Google's
own beta book search service.
She suggested a sub-domain for these book records: "One called 'Library' comes
OCLC, which has been testing the opening of WorldCat records to Google access
since June 2003, has a similar problem with visibility. And the bibliographic
records in WorldCat are pretty slim by Google's indexing standards. (See the
NewsBreak at http://www.infotoday.com/newsbreaks/nb031027-2.shtml.)
According to a status report on the OCLC site: "Current page rankings for
records are not indicative of final page rankings that will be in place when
all records have been properly indexed. OCLC and Google continue to work on
improving the ranking of WorldCat records."
To locate WorldCat records on Google, use one of the following:
"ISBN" plus the ISBN itself (e.g., isbn 9630525119)
Search term plus "find in a library" (e.g., cats "find in a library")
Search term plus "worldcatlibraries" (e.g., cats "worldcatlibraries")
OCLC has said that it will decide this month whether to expand, continue,
or discontinue the pilot project. Stay tuned for a report on this as well as
commentary on the issue of scholarly content in the Google catalog.
Despite constant media attention, Google doesn't always grab the top spot.
The winners of the 2004 EPpy Awards were recently announced by Editor & Publisher and Mediaweek magazines
at the Interactive Media Conference & Trade Show. Winning in the category
of "Best Internet News Service [with] Over 1 Million Monthly Visitors" was
washingtonpost.com. The site took the award over both Google News and FT.com.
According to a posting by veteran journalist Jonathan Dube on CyberJournalist.net,
not only did the audience applaud loudly, but the "real buzz" came after MarketWatch.com
president and CEO Larry Kramer addressed the crowd and said he was disappointed
to see Google News as a finalist in the category and that Google News "is just
not journalism." Kramer reportedly emphasized that journalists have "a responsibility
to provide the right filters."
Interestingly, washingtonpost.com also won in the category of "Best Internet
Entertainment Service [with] Over 1 Million Monthly Visitors" and was a finalist
in several other categories. Kudos to this excellent resource.
The heated debate continues in the open-access (OA) space. A good way to
stay informed is with Peter Suber's Open Access News (http://www.earlham.edu/~peters/fos/fosblog.html).
For a flavor of some of the ongoing discussions, see the American Scientist
Open Access Forum (http://amsci-forum.amsci.org/archives/september98-forum.html).
Be forewarned if you sign up for e-mail: These are very active resources.
The U.K. Parliament's Science and Technology Select Committee continued its
inquiry into the pricing and availability of scientific publications. Following
on his coverage in the April issue of Information Today, Richard Poynder
reported in a NewsBreak on the third evidence session held on April 21 (http://www.infotoday.com/newsbreaks/nb040503-3.shtml).
Opinions diverged sharply. Librarians clearly stated
that there was a crisis, while U.K. academics said there was not and expressed
skepticism about OA publishing.
According to Poynder, the librarians expressed concerns about "excessive
pricing, inflexibility over the 'bundling' of electronic journals, inequitable
copyright agreements, and restrictions on long-term access to digital material." That's
no surprise to those of us who are following the backlash among U.S. librarians.
The Select Committee's final oral session on May 5 took evidence from U.K.
research councils. (The uncorrected transcript is available at http://www.publications.parliament.uk/
The committee will issue its report this month, after which the U.K. government
has two months to respond. Watch for our ongoing coverage.
Meanwhile, Thomson ISI announced that journals published in the new open-access
model are beginning to affect the world of scholarly research. Of the 8,700
selected journals currently covered in Web of Science, 191 are OA journals.
A study by Thomson ISI on whether OA journals perform differently from other
journals in their respective fields found that there was "no discernible difference
in terms of citation impact or frequency with which the journal is cited" (http://www.isinet.com/oaj).
Thomson's First-Quarter Results
Speaking of Thomson ISI, its parent, Thomson Corp., announced its first-quarter
2004 financial results. CEO Richard Harrington said the company was "off to
a very solid start" for the year, reporting that revenues were up by 9 percent,
though profits were down. He noted that Thomson was seeing signs of improvement
in areas that had previously been weak, especially in demand for financial
services. The company expects full-year 2004 revenue growth to be in the "mid-single-digit
range." Let's hope this outlook holds for other companies in our industry.
In a Webcast with press and analysts, Thomson outlined the following priorities:
Invest in high-potential market segments
Acquire companies selectively, specifically those with strong content to leverage in existing operations
Pursue international growth, especially in Europe and Asia/Pacific
Refine the front-end customer strategy, both to identify new customers and to target products and services for sub-segments of Thomson
Build tailored, integrated information solutions
Leverage assets across the organization to provide better products and operating efficiencies
For the latest industry news, check http://www.infotoday.com every Monday
morning. An easier option is to sign up for our free weekly e-mail newsletter,
NewsLink, which provides abstracts and links to the stories we post.
Paula J. Hane is Information Today, Inc.'s news bureau chief
and editor of NewsBreaks. Her e-mail address is firstname.lastname@example.org.