On The Net
Scholarly Web Searching: Google Scholar
By Greg R. Notess | Reference Librarian, Montana State University
Google introduced a brand-new concept with Google Scholar [http://scholar.google.com]—specialized
search aimed at finding scholarly information on the Web. With an initial focus
on research articles from publishers participating in the CrossRef project
and several collections of online preprints and other major scholarly sites,
Google established a new approach to a broad range of scholarly literature
(although its original coverage was stronger in science and technology than
in the social sciences). In true Google fashion, the new search tool not only
displayed links to individual documents, it also included citation references
extracted from other documents using special algorithms developed at Google.
Some librarians decried this poaching of our information space, while Google
advocates foresaw Scholar as the first and only source for research information.
We have seen this type of rhetoric before. Remember when Google launched Google
Answers back in 2002? The ensuing hue and cry bemoaned how this would compete
with library reference services. Google Answers continues as a fee service,
but it is certainly not a major Google money-maker, nor has it caused the death
of library and information services anywhere.
Is Google Scholar destined for a similar fate? Time will tell whether it
becomes a major access tool and replaces some of the traditional indexing and
abstracting services or ends up as yet another orphaned initiative. In the
meantime, it offers certain benefits and uses, as do several other free Web-based
scholarly search tools such as Scirus. Unfortunately, none are even close to
comprehensive. Each tool covers one segment exclusively or in very different
THE FREE SCHOLARLY TOOLS
Google Scholar is just one of the more recent additions to a long line of
academic, scientific, and other scholarly Internet search tools. In the early
days of the Web, bibliographic databases such as UnCover and Agricola were
available along with many library catalogs. Now many more bibliographic databases
exist along with working papers, preprint and e-print collections, free journals,
and many other specialized scholarly resources.
Hundreds of free, academic-oriented tools are available; hundreds of commercial
ones are available as well. Academic libraries subscribe to a multitude of
commercial online bibliographic and full-text resources and create links to
many of the free tools. Covering all of these tools is well beyond the scope
of this column, so I’ll just take a look at two of the broad, multi-disciplinary
free Web resources, with some comparison to commercial resources.
One of Google’s great advantages is its incredible public relations
ability and the general buzz it creates with new announcements such as Scholar.
If use of Google Scholar rises, it may help lead more users to an institution’s
subscriptions. Elsevier’s Scirus, which has similar coverage to Google
Scholar and has been around longer, is a less-well-known scientific search
engine covering journal articles and Web sites.
Google Scholar aims to include “peer-reviewed papers, theses, books,
preprints, abstracts, and technical reports from . . . academic publishers,
professional societies, preprint repositories and universities, as well as
scholarly articles available across the Web” (see http://scholar.google.com/scholar/about.html).
Basically, Google Scholar includes Web pages that look like either an article
or another scholarly document.
Even after 6 months, with Scholar still in beta, Google will not release a list
of sources. However, it’s clear that Scholar includes journal articles
from various publishers, abstracts from bibliographic databases, and data from
e-print servers. Some prominent collections include ACM, Annual Reviews, arXiv,
Blackwell, IEEE, Ingenta, Institute of Physics, NASA Astrophysics Data System,
PubMed, Nature Publishing Group, RePEc (Research Papers in Economics), Springer,
and Wiley Interscience, although not all in their entirety. Many Web sites
from universities and nonprofit organizations are included, but only those
documents that seem like scholarly journal articles.
From all these sources, Google Scholar displays several types of records:
Web documents
Article citation-only records
Book citation-only records
Each of these types has a different appearance, along with some accessibility
issues. What I call the “Web documents” are those records whose
title is a link to a Web page that either describes the document or links directly
to an online version of the document. The citation-only records for articles
and books have a [citation] or [book] notation, respectively, before the
title. These records have been extracted from the bibliographies in the Web
documents and do not link directly to additional information. The article citation-only
records would be much more useful if numbers for volume, issue, and pages were
included.
Anyone using Google Scholar needs to understand the functions of the other
links for each record. The Web Document records can have multiple sources,
as in the “Occupational Allergy to Cyclamen” article, which lists
both a Blackwell-Synergy link and one from ncbi.nlm.nih.gov (which means a
PubMed citation). The title links to the first listed source. If there are
more than three sources, Scholar may have a link for “all X versions >>,” in
which the X gives the total number of sources. The multiple sources can point
to various Web pages—abstracts, preprints, publisher’s copy, author
copies, and more.
The “Cited by X” links to Web documents that include the given
record in their bibliographies. The “Web Search” link will run
a regular Google search using the primary author’s last name and a phrase
search of the document title. This can pull up other documents not in Google
Scholar. The “Library Search” link, which appears on book citation-only
records, connects to an Open WorldCat search for the book.
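The query behind the “Web Search” link can be approximated with a short sketch. This is only an illustration of the behavior described above (author last name plus a phrase search on the title), not Google's actual implementation; the function names, the search URL, and the placeholder author name "Author" are assumptions.

```python
from urllib.parse import urlencode


def web_search_query(author_last_name: str, title: str) -> str:
    """Combine the author's last name with the title as a quoted phrase."""
    return f'{author_last_name} "{title}"'


def web_search_url(author_last_name: str, title: str) -> str:
    """Build a search URL for the query.

    The base URL and the "q" parameter are assumptions for illustration.
    """
    query = web_search_query(author_last_name, title)
    return "http://www.google.com/search?" + urlencode({"q": query})


# "Author" is a hypothetical last name; the title comes from the example record.
print(web_search_query("Author", "Occupational Allergy to Cyclamen"))
print(web_search_url("Author", "Occupational Allergy to Cyclamen"))
```

Quoting the title forces a phrase match, which is why this kind of search can surface copies of the same document on sites that Scholar does not cover.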
The “UC-elinks” link in the example is an OpenURL link for the
University of California system. In the Google Scholar preferences, searchers can choose up to
three resolvers from a few dozen academic institutions. These links will connect
to library-licensed full-text content and additional information if the searcher
knows to set this preference and is located at one of the few institutions
listed. However, OpenURL links do not appear on all records or even all Web
Document records from fee-based publishers. The “Genetic Structure” record
in the example should have one but does not.
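For readers unfamiliar with OpenURL, the sketch below shows roughly how a citation becomes a resolver link. It assumes OpenURL 0.1-style key names (genre, aulast, atitle, volume, spage); the resolver address is hypothetical, the author name is a placeholder, and a real library's resolver may expect different or additional keys.

```python
from urllib.parse import urlencode

# Hypothetical resolver address; each institution runs its own.
RESOLVER_BASE = "https://resolver.example.edu/openurl"


def build_openurl(base: str, **citation: str) -> str:
    """Attach citation metadata to a resolver base URL as query parameters.

    Empty values are skipped, so a sparse citation still yields a valid link.
    """
    params = {key: value for key, value in citation.items() if value}
    return f"{base}?{urlencode(params)}"


# Volume and start page are left blank here, as they would be for a
# citation-only record that lacks those numbers.
link = build_openurl(
    RESOLVER_BASE,
    genre="article",
    aulast="Author",  # placeholder last name
    atitle="Occupational Allergy to Cyclamen",
    volume="",
    spage="",
)
print(link)
```

The fewer fields a record supplies, the less a resolver has to match on, which is one reason missing volume, issue, and page numbers limit what these links can do.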
Back in 2001, Elsevier launched Scirus as a Web search engine that would
search both Elsevier’s online journals in ScienceDirect and selected,
science-oriented portions of the Web. In the earliest days, Scirus had a fairly
limited version of the published and Web-accessible scholarly literature. It
has grown since then to include Academic Press articles, MEDLINE citations,
and, most recently, 13 million patents. Other article sources include BioMed
Central, Crystallography Journals Online, Project Euclid, Scitation, and the
Society for Industrial & Applied Mathematics. Web-accessible preprints
are available from arXiv, CogPrints, and NASA. The Web site has also expanded
to include more sources. Unlike Google Scholar’s inclusion of Web documents
that look like articles, Scirus includes regular Web pages.
Some of these scholarly resources, such as BioMed Central, PubMed, and arXiv,
are covered by both Scirus and Google Scholar. One major collection is only
included in Scirus—the 1,800-plus Elsevier journals.
Although not included in Google Scholar as Web documents, some Elsevier articles may show
up as article citation-only records. In Scirus, the Elsevier journals are
one of the major collections.
A number of other authors have noted some problematic limitations with the
early Google Scholar. Péter Jacsó’s December 2004 review
at the Digital Reference Shelf [http://snipurl.com/dwco] contains an
extensive critique and provides evidence that far fewer documents are found
with Google Scholar compared to the native search interface of the publisher.
He also created a tool for ongoing comparisons [http://snipurl.com/dxjx]. In
February, Rita Vine noted in her blog [http://snipurl.com/dwda] that the
PubMed records in Google Scholar are missing the most recent year’s records
and are much less complete than a direct search at PubMed. Unfortunately, both
of these problems continue as of April 2005.
A quick comparison with a search for the terms protonation alkylation finds
a claim of 2,068 journal article hits and another 1,524 Web results at Scirus.
The same search at Google Scholar reports “about 1,820” records
of all types. Given Google’s usual difficulty in accurately counting
results, that number is probably within about 500 records or so of the actual
amount. On other searches Scholar finds more, but since each covers unique
content, neither is comprehensive. The same search in the native interface of the
American Chemical Society (ACS) publications database finds 21,685 articles.
The ACS journals are included in neither Scholar nor Scirus.
When looking at coverage of something such as PubMed that is included in
both, results also vary. A search on cicatrix finds 12,780 results at
PubMed, 11,058 at Scirus, and only 7,660 at Scholar. Because of the lag problem
with Scholar, I also limited the search to results from 2001 only: PubMed finds 461,
compared with 420 at Scirus and only 294 at Google Scholar.
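Expressing the cicatrix counts as a share of the PubMed total makes the gap easier to see. The short calculation below uses only the hit counts reported above; the function and variable names are illustrative, and the counts themselves are the search engines' own claims rather than verified totals.

```python
# Hit counts for the search on "cicatrix", as reported in the text.
all_years = {"PubMed": 12780, "Scirus": 11058, "Google Scholar": 7660}
only_2001 = {"PubMed": 461, "Scirus": 420, "Google Scholar": 294}


def share_of_baseline(counts: dict, baseline: str = "PubMed") -> dict:
    """Return each tool's count as a percentage of the baseline tool's count."""
    base = counts[baseline]
    return {tool: round(100 * n / base, 1) for tool, n in counts.items()}


print(share_of_baseline(all_years))
# {'PubMed': 100.0, 'Scirus': 86.5, 'Google Scholar': 59.9}
print(share_of_baseline(only_2001))
# {'PubMed': 100.0, 'Scirus': 91.1, 'Google Scholar': 63.8}
```

On these numbers, Scirus returns roughly 87 to 91 percent of what PubMed itself finds, while Google Scholar returns roughly 60 to 64 percent.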
Both Scholar and Scirus search through the full text of an article, but this
is inconsistent. Searching phrases found toward the end of an article may fail
to retrieve the article. For those online journal packages that include full-text
searching capabilities, using the native search interface will be more comprehensive.
On the other hand, some online journal suites do not have full-text searching
capabilities, in which case Scholar or Scirus may be a more comprehensive option.
For fielded searching using authors, date, subject terms, or article type,
the commercial databases and native search interfaces have many more choices.
Scholar does have author, title, date, and publication fields in the advanced
search, but the fields are far less reliable than in a structured database.
More problematic is the lack of any date sort capabilities in Scholar. At least
Scirus has date sorting. The Scirus advanced search has field choices for author,
title, date, keyword, ISSN, author affiliation, and publication along with
limits for broad subject areas, collections, file formats, dates, and information
The freshness of these databases is a significant issue. As Joann Wleklinski
noted in her May/June 2005 ONLINE article (“Studying Google Scholar:
Wall to Wall Coverage?,” pp. 22–26), the database used by Google
Scholar is static at this point—it’s not adding newer documents.
Scholar definitely needs to be updated more frequently. In fact, at this point,
the main Google Web search is a much better tool for finding recent scholarly
documents than Google Scholar.
Despite all the limitations and problems, both tools offer some unique reasons
to use them beyond just watching their future development. For a quick, broad,
multidisciplinary search on a very narrow, specific topic, either Scholar or
Scirus can give a good start. For citation verification, both can help find
erroneous as well as correct citation information. The Cited By links at Google
Scholar can be a useful adjunct to the more comprehensive citation tracking
from citation indexes via ISI’s Web of Science (or can function as a
partial replacement for those without access).
At this point, my main use of both is for finding free Web versions of otherwise
inaccessible published articles. I found a number of full-text articles via
Google Scholar that are PDFs downloaded from a publisher site and then posted
on another site, free to all. Both Scirus and Scholar were also useful for
finding author-hosted article copies, preprints, e-prints, and other permutations
of the same article.
For the unaffiliated scholar, or for those in a small organization (government,
association, or small research lab), these tools provide both opportunity and
frustration. The opportunity? These scholars can use both tools to search for
resources. The frustration comes when a specific document is found, but it
is available online instantaneously only for those willing (and able) to pay.
Strangely enough, both of these tools may work better, or at least appear
to work better, for the affiliated scholar. With all the subscriptions available
on campus based on IP access authentication, the campus-based researcher finds
that the links in Google Scholar and Scirus work seamlessly, providing direct
access to the full-text articles. Both would work better if the OpenURL resolver
could be added automatically, based on IP address, since many institutions
have multiple access points or, like us, have their Elsevier subscriptions on
a non-ScienceDirect platform.
At my library, scholarly information searching still relies on our library’s
commercial databases or the general Web search engines rather than on
either Scholar or Scirus. The one student who mentioned using Google Scholar
prior to coming to the reference desk expressed frustration with it since, “Everything
I found cost money.” Had our OpenURL resolver been enabled, that might
have helped, but her question was answered far better in one of our commercial
databases. Both Scholar and Scirus have potential for information professionals
and end users. At this point, each covers a certain segment of scholarly material,
but plenty of problems remain. Other search tools continue to serve the scholarly
Greg R. Notess [email@example.com; http://www.notess.com/]
is a reference librarian at Montana State University
and founder of SearchEngineShowdown.com.
Comments? E-mail letters to the editor to firstname.lastname@example.org.