Pushing the Envelope of Electronic Scholarly Publishing
and Joan G. Packer, Head of Reference
Elihu Burritt Library, Central Connecticut State University
Although traditional database searching prevails as the initial strategy for locating scientific information, searchers for scholarly work these days must expand their searching beyond traditional services. Resources known as “Preprint Servers,” vital for finding information in the sciences and, more recently, the humanities and social sciences, not only facilitate location of information to answer today’s questions, but also indicate a critical emerging trend toward rapid research communication in the electronic environment.
peers has traditionally dominated the way researchers gather information.
Those peers often identify proposed publications. Electronic preprints
allow access to information without the time lag inherent in traditional
publishing. The immediacy of electronic preprint dissemination may also
foster a richer collegial interchange. But the inverse could be true also:
Researchers may become so accustomed to computer-presented information
that interaction may seem superfluous.
The term “preprint” most often refers to a manuscript that has gone through a peer-review process and now awaits publication in a traditional journal. A preprint accessible over the Web may also be referred to as an “e-print.” Preprints also cover papers that authors have submitted for journal publication, but for which no publication decision has been reached, or even papers electronically posted for peer consideration and comment before submission for publication. In fact, preprints can also be documents that have not been submitted to any journal.
The U.S. Department
of Energy, Office of Scientific and Technical Information defines a preprint
as “a document in pre-publication status, particularly an article submitted
to a journal for publication” (1).
The American Physical Society expands this, indicating that the concept
of e-prints includes any electronic work circulated by the author outside
of the traditional publishing environment (2).
In some subjects, where rapid transmission of knowledge is critical, electronic
dissemination of preprints is an absolute necessity, with subsequent traditional
publication becoming almost a formality. In mathematics and physics, for
example, formal publication provides archiving, which serves more to remind
the scholarly community of the paper’s initial appearance. Ultimately,
formal publication serves as a vehicle to support the standing of the author
The History of Preprints
Not long ago scholarly communication involved mail, fax, or more recently, anonymous FTP, gopher, and electronic mail. Although these methods of sharing information are still used, it is easier, quicker, and less expensive to post papers on the World Wide Web for reference, review, and comments. While traditional production and publication of documents require a significant investment of time, materials, and money, placing a preprint on the World Wide Web involves no printing costs and practically no distribution costs. Usually all you need is access to any program that can generate an HTML document and a Web server.
Paul Ginsparg, a physicist at the Los Alamos National Laboratory, developed the first preprint archive in August 1991 [see Figure 1 at left]. Originally dedicated to papers in high-energy theoretical physics, the “arXiv.org e-Print archive” at http://www.arxiv.org/ took several months to attract 1,000 users; presently it reports from 35,000 to 150,000 visits per day.
Preprint servers are usually hosted at professional societies, government sites, and universities. Disciplines such as astronomy, chemistry, computer science, mathematics, and physics have taken the lead in preprint distribution. Perhaps because scientists and researchers in these fields possessed the first high-level computers, preprint servers became available and then prevalent in these disciplines. Indeed, for graphically dependent sciences, preprint publication on the Web is preferable to paper journals because it allows the inclusion of audio, video, and other intensive graphics. Electronic preprints are not the only example of the impact of the greater efficiency and storage capacity of digital media: some new paper-based scholarly books have begun to omit printed bibliographies, instead referring readers to Web sites.
At this point, the vast majority of preprint servers contain scientific information. Fields in the humanities and social sciences have recently begun following the trend, but still lag significantly behind in terms of servers. CogPrints [http://cogprints.soton.ac.uk] offers preprints in psychology, anthropology, philosophy, and linguistics. The Universal Preprint Directory [http://www.realsci.com/browse.cfm] offers preprints in religion and other philosophy topics.
need not simply represent what would appear in print journals. E-prints
may offer numerous value-added elements, including audio and video, as
well as linked references to other documents.
Various reports from the information science literature and other fields indicate that use of preprint servers, especially in the sciences, is very high. Although an unscientific marker, the number of hits received by the so-called “XXX Physics Archive” server [at the Los Alamos National Laboratory, http://www.arxiv.org, also known as http://xxx.lanl.gov/] mentioned above attests to this. Preprints hosted at Los Alamos were being cited in Physical Review D: Particles and Fields within months of the server’s start-up in late 1991 (4).
librarian Gregory K. Youngen authored a report that tracked citations to
electronic preprints by traditional scholarly science journals. Youngen
observed that the number of electronically posted preprints in astrophysics
doubled every year during the 1992 through 1997 study period. Using the
Institute for Scientific Information’s SciSearch, slightly over 100 citations
to preprints were retrieved in 1995; in 1997, the number of citations to
electronic preprints rose to over 400. Youngen concluded, “The growth rate
in citations reflects not only the authors’ acceptance of the e-print,
but the publishers and editors [acceptance] of the manuscripts as well”
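For readers who want to check the arithmetic behind these citation figures, the short Python sketch below computes the implied yearly growth. It assumes round values of 100 and 400 for the "slightly over 100" and "over 400" counts, since the exact numbers are not given here, and the function name is purely illustrative.

```python
def annual_growth_factor(start_count, end_count, years):
    """Constant yearly multiplier that turns start_count into end_count."""
    if start_count <= 0 or years <= 0:
        raise ValueError("start_count and years must be positive")
    return (end_count / start_count) ** (1 / years)


# Assumed round figures standing in for "slightly over 100" (1995)
# and "over 400" (1997); the exact counts are not given in the text.
factor = annual_growth_factor(100, 400, 1997 - 1995)
print(f"Citations grew about {factor:.1f}x per year")
```

Under those assumed figures, citations roughly doubled each year, which is in line with the yearly doubling Youngen observed in posted astrophysics preprints.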
Servers and Searching
How does the researcher decide which server to search or browse? Professional database searchers know that if they require information from the published literature in nursing, they should search MEDLINE or the Cumulative Index to Nursing and Allied Health Literature (CINAHL). Because of the general dishevelment of Web-based information, the biggest problem with exploiting the potential of preprint servers is simply knowing which servers to use for specific queries. Informally shared information among researchers probably accounts for end-user scholars deciding which preprint server to access for specific information. But while subject experts know which specific preprint server to explore, general information specialists may not possess this inside information.
Although some might consider tracking preprints to be the proper responsibility of indexing and abstracting services, for the foreseeable future, Web searching will remain the primary strategy for locating these reports. However, some indexing and abstracting services have finally recognized the value of tracking preprints. Chemical Abstracts Service (CAS) recently announced it had begun to abstract and index preprints. CAS will monitor preprint servers in chemistry-related fields and include preprint abstracts in Chemical Abstracts (CA), CAplus, and other databases. CAS will process preprints the same way as articles in traditionally published journals, but will mark the entries with a special field code. CAS believes that a significant number of studies now appear on Web preprint servers, and that it needs to include these servers to ensure comprehensiveness.
At the editorial department of the Institute for Scientific Information, Helen Atkins stated that over 100,000 cited references from the xxx.lanl.gov server alone are currently available in Science Citation Index/SciSearch. But although searchers can identify these entries by performing “cited author” or “cited reference” searches (using the “preprint number” in the latter case), ISI does not index entire servers or preprint collections as it does with journals. Atkins commented that many of the preprints, after all, become published articles, which would result in considerable duplication; but she quickly added — and this is crucial for searchers — that some preprints never appear in journals.
The Yahoo! directory doesn’t have a category called “Preprint Servers.” We found it inefficient to use the major search engines (e.g., Google or Northern Light) to locate individual papers from preprint servers. For example, searching Northern Light with the terms “Black Hole and preprint” might retrieve a few documents, but the search would be far from comprehensive. Trying a broader approach, such as searching for “astrophysics and preprint,” is more effective. The latter search retrieves the preprint servers themselves, where a search on “Black Hole” works well and presents a true picture of the preprint literature on the subject.
To test potential retrieval techniques when using a preprint distribution system, we decided to find papers on an astronomy topic — the Milky Way. One of the major preprint servers appropriate to this search is the CERN Document Server (CDS) at http://weblib.cern.ch/Home/. The CDS archives information in particle physics, nuclear physics, detectors and experimental techniques, accelerators and storage rings, mathematical physics and mathematics, astrophysics and astronomy, chemical physics and chemistry, and engineering, as well as commerce, economics, social science, biography, geography, and history.
CDS allows category searching and, although most researchers using the server prefer the Yahoo!-like “Navigation search” (61 percent, according to a CERN report), we still searched for “Milky Way” using the HEPDOC (High Energy Physics DOCuments) preprint searching tool at http://weblib.cern.ch/share/hepdoc/.
By searching for our keywords in titles of documents, we retrieved 91 references [see Figure 2 on page 54]. Many of the documents referred to papers published in traditional journals. Others cited papers submitted to journals but still, apparently, in press. Other postings apparently referred to articles neither published nor submitted to journals. Once you identify the documents of interest, it’s easy to read the citations and abstracts. Links to full-text papers often appear, although many of the papers require unzipping or decompressing and/or using a PostScript viewer or Adobe Acrobat. The type of access to the actual information varies from server to server. Some servers will only provide brief information and leave it to searchers to contact the authors.
A visit to the International Directory of On-Line Philosophy Papers at the University of Hong Kong [http://www.hku.hk/philodep/directory/] allowed us to look for papers either by checking off a topic or by entering keywords. The topics covered include epistemology, metaphysics, ethics, aesthetics, and feminism, among others. The papers on this server reside at the URL designated by the author.
preprint server with which we experimented was the One-Shot World-Wide
Preprints Search (which calls itself a “prototype service for a global
lookup search throughout several online scientific preprints repositories
in the world !!!”) at http://www.ictp.trieste.it/indexes/preprints.html.
Although searching this server was edifying, since it constitutes one of
the first gateways to Web preprints, it had not been updated since April
Preprints and the Question of Peer Review
Although the caveat at the International Directory of On-Line Philosophy Papers states, “Only submissions from professional philosophers working at academic institutions are allowed” and that, “As a general policy the paper must reside on a university computer server,” it appeared particularly simple to submit papers to this site. It seems that anyone, in effect, with a URL from a university server could request that the International Directory link to a submission.
While many researchers argue that the traditional peer-review process delays presentation of results (and some contend that the process has too much bias built into it), individuals who generally oppose preprints warn that without the rigorous scrutiny of peer review, preprints may contain erroneous information. In a National Public Radio interview with physicist Paul Ginsparg and Science editor David Voss, Ginsparg remarked that discussion group interaction has often led to immediate reposting of reworked preprints with acknowledgements to individuals who have offered comments and criticism on the original preprints (6).
Obviously, the possibility of circulating poor research is a prominent concern in medicine. The New England Journal of Medicine has indicated that, since previously posted papers might contain errors that could mislead physicians or patients, it would not accept preprint submissions. Although Peter B. Boyce, senior consultant for the American Astronomical Society, wrote, “Being first now counts more than being thorough,” he also remarked that, “Peer pressure from colleagues does seem to keep the quality of submissions higher than might have been anticipated” (7).
Some sites, such
as the National Institutes of Health’s PubMedCentral, distinguish peer-reviewed
preprints from unrefereed materials. (Though not fully operational at this
time, PubMedCentral does have several electronic journals available with
free full text at http://www.pubmedcentral.nih.gov/.
Incidentally, PubMedCentral’s feature of publishing preprints that detail
failed studies and negative findings — information that wouldn’t make it
into the published literature — is a controversial policy that survived
opposition from journal publishers. For more information on NIH’s PubMedCentral
proposal, read http://www.nih.gov/about/director/pubmedcentral/pubmedcentral.htm.)
Electronic Preprints and Traditional Publishing: Foes or Allies?
Publications such as Britain’s prestigious Lancet and the American Psychological Association’s Journal of Experimental Psychology: Human Perception and Performance see resistance to the Internet as futile. Both have joined other publications such as American Political Science Review, American Journal of Political Science, and the Journal of Neuroscience in accepting preprints for print publication after electronic posting. A list of clinical medicine journals that will and will not accept preprints appears at http://clinmed.netprints.org/misc/policies.shtml. According to that page, 29 publications will accept previously posted preprints, including the Proceedings of the National Academy of Sciences and the British Medical Journal; 21 publications, including the American Journal of Psychiatry, Pediatrics, and Journal of Cell Biology, will not.
Reasons for not considering preprints as publishable vary. At the journal Science, editors feared that prior versions of manuscripts may cause confusion; furthermore, Science wants its readers to see the article in print first. Critical Inquiry, an art and culture journal from the University of Chicago, maintains the same rationale.
Some other issues affecting the conflict with traditional publication include the reliance of scientific societies on journal subscriptions for income, the threat commercial journal publishers feel electronic preprints represent to this revenue, and discussion about government involvement in e-print publishing. In a word, it’s all about money.
the issue of archival survival is yet another misgiving. Publishers argue
that preprint servers may not maintain archives and, therefore, important
research will be lost. Advocates of the servers, including mathematician
Rob Kirby, feel otherwise. Kirby told us, “The arXiv, also known as xxx.lanl.gov,
certainly intends to be a permanent archive. It now has so many papers
in physics, math, and other fields and is growing so well, I can’t imagine
that it will be allowed to disappear. In fact, I suspect that it is more
likely to persist into the future than a paper version, for I suspect that
future mathematicians are going to want to get all their literature quickly
on the screen, and are not going to be willing to even walk to the library,
let alone wait for a volume to be brought from some far-off storage place.”
Although some scholars believe that the electronic preprint server system can work without formal peer review, there are instances where the lack of peer review has resulted in bad, if not embarrassing, science. The example of “cold fusion” is compelling. Cold fusion was the idea of B. Stanley Pons, chair of the University of Utah Chemistry Department, and Martin Fleischmann, his British collaborator from the University of Southampton. They had almost no data to support their hypothesis, but feared being beaten to publication by a physicist at Brigham Young University. In March 1989 their “discovery” was leaked to CBS, The Wall Street Journal, and other news outlets and then announced at a press conference, creating a great stir. They claimed to have generated the fusion of deuterium atoms (a heavy form of hydrogen) by compressing them inside their cold fusion cells (two metal electrodes composed of palladium and platinum connected to a moderate electric current and submerged in a bath of heavy water).
Some of Pons’ previously published results, including one in the Journal of Physical Chemistry, had been questioned by other scientists, but nevertheless the Journal of Electroanalytical Chemistry, volume 261, published Pons and Fleischmann’s results as a preliminary note. The managing editor, Roger Parsons, “respected” Pons as a scientist and a “long-time friend” and did not require peer review. Undoubtedly, he also thought it was hot news. Many were surprised by his publication of the article, because it contained no control experiment or raw data of any kind. Millions were spent at many universities in an attempt to replicate the results but without success. Pons and Fleischmann were totally discredited (8). The lesson here, we suppose, is that if a peer-reviewed journal can disseminate errors, a system where almost anything can be published may generate even more problems.
Kirby, of the mathematics
department at the Berkeley campus of the University of California, attempted
to engage the CEO and president of Reed-Elsevier in a dialogue intended
to help them see that more and more mathematicians prefer to post their
research on preprint servers and maintain copyright, rather than publish
in expensive journals. In a very cordial interchange, Kirby observed, “We
turn over the papers, with copyright, to the publishers, who add relatively
little value in producing the paper volume from our TeX files, and who
then turn around and sell the journal to the university libraries at what
is, in some cases at least, an exorbitant price” (9).
Professor Kirby also noted that he believes refereeing, with acceptance
or rejection of research submissions, could be done electronically. In
private e-mail correspondence, Kirby told us that his petitions effected
no change at Reed-Elsevier and added, “So, it becomes crucial that authors
retain copyright, and post their papers somewhere where access is not so
Additional Issues: Plagiarism
One of the risks associated with electronic publishing is plagiarism. Writing in favor of electronic preprints, Matthew and Gordon Wills stated, “The major change retardant is perceived to be anxiety by editors and authors that such a widespread distribution of preprints may lead to plagiarism. There will surely be occasional instances of this, but the benefit of the additional feedback from a body of other interested authors which would not normally be available more than compensate for such a risk” (11). Stevan Harnad, director of the Cognitive Sciences Center at Southampton University, echoes the Willses’ remarks: “Researchers fear that publicly posting their preprints may lead to plagiarism…. In general it seems to have been the pattern, so far invariably, that whatever the new problem or vulnerability the Net breeds, it also breeds even more powerful means of remedying the problem and combating the vulnerability” (12). Although several journal articles and Web sites allude to the problem of plagiarism, authors evidently see it as more of a potential problem than an actual one.
should own the copyright remains a prickly issue. A few servers steer clear
of the fray through omission. The Theoretical Ecology Preprint Database
states, “While we will not be posting the actual preprint text [because
of potential copyright issues], some authors may make their preprints available
electronically by providing a link to an online Web version.”
Other servers deal with copyright in a straightforward manner by declaring policies on the home page or in an FAQ. The Topology Atlas Preprints, a server at York University in Toronto [http://at.yorku.ca/topology/preprint.htm], indicates that most preprint servers expect authors to submit their e-print to a journal or to be in the process of submitting it. When the paper is accepted, most journals require that copyright be transferred to them. Recognizing this, the Topology Atlas then withdraws the preprint, but not the abstract, from its site and displays the publication information. Topology Atlas notes that a few publishers, for example the American Mathematical Society, permit earlier versions to remain on the servers. In cases where posting on the Topology Atlas is the only form of publication, the author retains the copyright.
Elsevier Science allows authors to post preliminary versions of their articles on personal home pages or servers, but does not allow authors to place the revised final accepted version (as it appears in an Elsevier journal) on the Web — except for the version appearing in their commercial product, Elsevier Science Direct.
Stridently opposed to transfer of copyright to publishers, CogPrints notes a “huge conflict of interest,” because the author no longer has control of the paper. CogPrints thoroughly covers copyright in one of its bristling FAQs [http://cogprints.soton.ac.uk/help/copyright.html]. According to the FAQ, CogPrints is not a “refereed journal,” but does “do some filtering to flush out the crazies. So anyone can archive anything in principle.” CogPrints says that such archiving has few negative effects. CogPrints plans, as other servers (including PubMedCentral) have, to offer two categories: unrefereed preprints and “author authenticated reprints of refereed, accepted papers.”
The idea of authors retaining copyright is nothing new. Scholars, noting the rising costs of journals, have long pondered the concept. Researchers at the California Institute of Technology, Yale, and the University of Kansas were first to consider confronting publishers with this radical notion and putting their information online. Karen A. Hunter, vice-president of Elsevier Science, indicated that the threat of such competition would cause Elsevier to have serious reservations about publishing material produced by universities with policies that required professors to retain the copyrights to their articles (13).
seem to vary regarding their stance toward preprints and copyright. The
American Psychological Association, which allows individual APA journal
editors to decide whether they will accept previously posted preprints,
takes a dim view of e-prints. Its admonishment, located at http://www.apa.org/journals/posting.html,
reads (in part):
Do you want to put your article on the Internet, on your own home page or that of your university? There are advantages, but also a few precautions to take if you want to publish the article later or if it has already been published. Many legal issues surrounding the Internet and intellectual property are still murky and rapidly changing. The APA P&C Board, therefore, has adopted an interim policy to be reviewed periodically, which is summarized below:
APA journal editors and the Publications and Communications (P&C) Board first took a long look at the implications of the Internet for publishing at their spring 1996 meetings and again in fall 1997.
The informal and distributed nature of the Internet is a boon to the scientific community, students, and the public. Anyone can share information, ideas, and events with others.
Your paper posted on the Internet belongs to you, but others may consider it in the “public domain” and copy or use it unless you say what you want to allow. A few journals, such as Neuroscience and The New England Journal of Medicine, consider papers posted on the Internet to be “published” and will not consider them for print publication. If your paper is published by a journal, posting it on the Internet may violate the copyright transfer agreement with the print publication.
In contrast, The Association for Computing Machinery is somewhat more flexible; it at least allows authors to retain a copy on their own Web pages. The ACM says: “Authors may post an author-version of their own ACM-copyrighted work on a personal server or on a server belonging to their employer, but they may not post a copy of the definitive version that they downloaded from ACM’s Digital Library” [http://www.acm.org/pubs/copyright_policy/].
The Institute of Electrical and Electronics Engineers retains copyright, but it will allow authors to post the IEEE published version. Its policy page states: “When IEEE publishes the paper, the author must replace the electronic version with the full citation to the IEEE publication or the IEEE published version, including the IEEE copyright notice and full citation. Prior or revised versions of the paper must not be represented as the published version” [http://www.ieee.org/about/documentation/copyright/policies.htm].
For more expert
discussion of the future of scholarly publishing and journal subscriptions
in light of online preprints, you can listen to an interview with physicist
Paul Ginsparg and Science editor David Voss on the February 2, 1996, National
Public Radio program “First Hour: Science Journals On-line” at http://www.npr.org/programs/totn/archives/nf6f23.html.
Preprints and the Future
Some searchers may find pinpointing the best preprint server for their search a challenge, but scholars in the scientific disciplines are working on a solution in real time. Imagine a scenario in which any researcher quickly accesses any preprint from any archive. In October 1999, participants including librarians, publishers, and computer scientists met in Santa Fe, New Mexico, with the unifying goal of establishing a universal preprint archive. The project, called the “Open Archives Initiative,” laid the foundation for resolving technical challenges such as archive maintenance, accessibility, and interoperability (14). A prototype of the service can be searched at http://ups.cs.odu.edu. Searching the prototype is exciting, inasmuch as it offers metasearches of important servers and then delivers the information in a clear manner using user-selected sorting [see figure 4 and figure 5 on page 59].
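To make the idea of a metasearch with user-selected sorting concrete, here is a minimal Python sketch. It is not a description of how the prototype works: the archive names, the record fields, and the sample records are all hypothetical placeholders, and the archives are simple in-memory lists rather than live servers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One preprint record; the fields are illustrative assumptions."""
    archive: str
    title: str
    author: str
    year: int


SORTABLE_FIELDS = {"archive", "title", "author", "year"}


def metasearch(archives, keyword, sort_key="year"):
    """Search every archive for keyword in titles, then sort the merged hits."""
    if sort_key not in SORTABLE_FIELDS:
        raise ValueError(f"cannot sort by {sort_key!r}")
    needle = keyword.lower()
    hits = [
        record
        for records in archives.values()
        for record in records
        if needle in record.title.lower()
    ]
    return sorted(hits, key=lambda record: getattr(record, sort_key))


# Hypothetical placeholder data, invented purely for illustration.
ARCHIVES = {
    "archive_a": [
        Record("archive_a", "Placeholder study of the Milky Way", "Author B", 1999),
        Record("archive_a", "Placeholder note on black holes", "Author A", 1998),
    ],
    "archive_b": [
        Record("archive_b", "Milky Way placeholder survey", "Author A", 1997),
    ],
}

results = metasearch(ARCHIVES, "milky way", sort_key="year")
for record in results:
    print(record.year, record.archive, record.title)
```

The point of the sketch is that the searcher issues one query and chooses one sort order, while the merging of results from separate archives happens behind the scenes.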
Many dilemmas remain. Will preprints end up in traditional indexing and abstracting services? How can one track revisions or different versions of papers? How do the systems document changes? Or do they? How far off are well-maintained archives that demonstrate a high level of continuity? Will humanities scholars embrace preprint servers more? But as these dilemmas are resolved, the electronic preprint may come to replace the paper-based draft of most research articles, just as most print journals are now at least available electronically (and may someday be only available in that format).
So what does all
this mean to the working searcher? How much effort should librarians devote
to incorporating preprint servers into their Web pages and literature searches?
Oddly enough, the boycott some commercial publishers have begun imposing
on researchers publishing in e-print format may decide the issue, though
not the way the publishers might have wished. If publishers decide to enforce
policies that do not permit authors to subsequently publish their preprinted
research in journals, which might effectively preclude the work of entire
faculties at various institutions, it becomes mandatory that STM information
workers become savvy preprint searchers. There is a critical mass of information
on these scientific servers that cannot be ignored. Even if the e-prints
do end up in print, the often lengthy lag times before one sees the article
in print could make preprint searching a necessity for current, state-of-the-art
Many preprint servers contain links to other preprint servers, as well as being searchable themselves. Here are some notable preprint sites:
Computer Science Technical Reference Library)
(Stanford Public Information Retrieval System — High Energy Physics)
Theoretical Ecology Preprint Database
1. Jordan, Sharon M., “Preprint Servers: Status, Challenges, and Opportunities of the New Digital Publishing Paradigm,” InForum ’99. May 5, 1999. [http://www.osti.gov/inforum99/papers/jordan.html]. July 24, 2000.
2. American Physical Society, “What Are Eprints?,” January 7, 1998. [http://publish.aps.org/eprint/docs/faq.html]. July 25, 2000.
3. Lim, Edward, “Preprint Servers: A New Model for Scholarly Publishing?” Australian Academic and Research Libraries, Vol. 27, No. 1, March 1996, pp. 21-30.
4. Smith, Arthur P., “The Journal as an Overlay on Preprint Databases,” Learned Publishing, Vol. 13, No. 1, January 2000, pp. 43-48.
5. Youngen, Gregory K., “Citation Patterns to Electronic Preprints in the Astronomy and Astrophysics Literature,” Library and Information Services in Astronomy, Vol. 153, 1998 [http://www.stsci.edu/stsci/meetings/lisa3/youngeng.html]. July 3, 2000.
6. Boyce, Peter B., “For Better or Worse: Preprint Servers Are Here to Stay,” College and Research Libraries News, Vol. 61, No. 5, May 2000, pp. 404-407, 414.
7. Taubes, Gary, Bad Science: The Short Life and Weird Times of Cold Fusion, New York: Random House, 1993.
8. Guernsey, Lisa and Vincent Kiernan, “Journals Differ on Whether to Publish Articles That Have Appeared on the Web,” Chronicle of Higher Education, Vol. 44:A27, July 17, 1998.
9. “Letter to Elsevier,” Concerns of Young Mathematicians, Vol. 6, No. 3, January 28, 1998. [http://youngmath.org/archive/V6/vol6.3.html]. July 3, 2000.
10. “Exchange with Elsevier Continued,” Concerns of Young Mathematicians, Vol. 6, No. 6, February 17, 1998. [http://youngmath.org/archive/V6/vol6.6.html]. July 3, 2000.
11. Wills, Mathew and Gordon Wills, “The Ins and Outs of Electronic Publishing,” Journal of Business and Industrial Marketing, Vol. 11, No. 1, 1996, pp. 90-105.
12. Harnad, Steve, “How to Fast-Forward Serials to the Inevitable and the Optimal for Scholars and Scientists,” June 22, 1999. [http://www.kb.se/bibsam/bibnytt/harnad.htm] July 27, 2000.
13. Guernsey, Lisa, “A Provost Challenges His Faculty to Keep Copyright on Journal Articles,” Chronicle of Higher Education, September 18, 1998. [http://www.chronicle.com/free/v45/i04/04a02901.htm] July 31, 2000.
14. Van de Sompel, Herbert and Carl Lagoze, “The Santa Fe Convention of the Open Archives Initiative,” D-Lib Magazine, February 2000, Vol. 6, No. 2. [http://www.dlib.org/dlib/february00/vandesompel-oai/02vandesompel-oai.html]. July 3, 2000.