Information Today
Vol. 42 No. 6 — Nov/Dec 2025
FEATURE
Insights on Content


Pursuing Preprints: Some Tools for Open Discovery Across Multiple Preprint Servers

by David Haden

I was recently intrigued by a proposal for a preprint server dedicated to AI-generated work, for both research proposals and finished papers. As of this writing, aiXiv is not online yet, but there is a GitHub repository with code and an early description of its goals. This novel proposal spurred me to look again at preprints—something I had not done in years.

‘SEARCHER BEWARE’

In my research, I found more than 65 dedicated preprint servers, a significant uptake of preprints since 2010, a surge in usage and acceptance since 2020, and an increase in opportunities to build new services around preprints. The preprint server landscape has evolved into a highly varied patchwork of different approaches to matters such as basic oversight and gatekeeping, governance, removal and longevity of papers, indexing, and free access. I was pleased to learn that work-in-progress papers are still largely excluded from servers. However, it was worrying to read that 75% of the 36 most popular servers will accept “opinion papers,” according to Open Information Science, and that the ViXra preprint server is “known for unorthodox and fringe science,” according to Wikipedia. It thus seemed to me that “searcher beware” remains the best motto for examining preprint search results.

The lack of peer review is not the only reason to be wary. “Searcher beware” also remains apt because the main search tool for many is Google Scholar, which mixes preprints with peer-reviewed articles, has no flag to differentiate preprints, and still has no way of filtering for “only preprints” in search results. But where does the wary searcher get preprint search results today? If not from Google Scholar or from commercial walled services (such as Dimensions, Scopus, or the Web of Science Preprint Citation Index), then where can they look? I thought the answer might be useful, so what follows is a brief survey of open discovery options for preprints.

SEMANTIC SCHOLAR AND PUBMED

The large, speedy Semantic Scholar conceals its preprint server filters under its Journals and Conferences drop-down filter, but my searches suggest preprint coverage is limited to arXiv, bioRxiv, and medRxiv. PubMed and PubMed Central are similarly limited, with preprint ingestion starting at post-2023 U.S. National Institutes of Health-funded research found on arXiv, bioRxiv, and medRxiv. The PubMed search function can include or exclude preprints, which is especially important in the medical treatment field because preprints have not been peer reviewed.
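As a rough illustration of including or excluding preprints in PubMed, the sketch below builds search URLs that a reader could paste into a browser. It assumes PubMed's publication-type tag `preprint[pt]` and the `?term=` parameter on pubmed.ncbi.nlm.nih.gov. The function name `pubmed_url`, its `preprints` argument, and the example topic are hypothetical and only for illustration.

```python
from urllib.parse import quote_plus

PUBMED = "https://pubmed.ncbi.nlm.nih.gov/"


def pubmed_url(topic: str, preprints: str = "exclude") -> str:
    """Build a PubMed search URL that excludes, isolates, or keeps preprints.

    Assumption: PubMed tags preprint records with the publication type
    "preprint", searchable as preprint[pt].
    """
    if preprints == "exclude":
        term = f"({topic}) NOT preprint[pt]"
    elif preprints == "only":
        term = f"({topic}) AND preprint[pt]"
    elif preprints == "include":
        term = topic  # no filter: preprints appear alongside other records
    else:
        raise ValueError("preprints must be 'exclude', 'only', or 'include'")
    # quote_plus turns spaces into '+' and percent-encodes brackets
    return PUBMED + "?term=" + quote_plus(term)


if __name__ == "__main__":
    print(pubmed_url("long covid"))          # peer-reviewed side only
    print(pubmed_url("long covid", "only"))  # preprints only
```

The same `NOT preprint[pt]` clause can simply be typed after any query in the PubMed search box; the code only saves retyping it.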

SCIELO, SSRN, AND OSF

Europe PMC, an offshoot of PubMed Central, similarly indexes European-funder life science research preprints found at 32 servers. Spanish and Portuguese speakers have the open SciELO Preprints search tool, operated from Brazil by the robust and comprehensive SciELO service. My tests show that, unlike on many other servers, Literature and Arts is an active preprints category there, and switching the site to English reliably gives English abstracts for most results. I also found that SSRN, said to index more than 30 preprint sources of various types, gives relatively good results for Literature and Arts, although these were far older than the results on SciELO.

The largest open aggregator is OSF, which indexes more than a dozen of the world’s 65-plus preprint servers along with Thesis Commons and OSF’s own OSF Preprints, OSF Projects, and OSF Registries. OSF’s ability to filter by Creative Commons licence is especially useful. A simple test search for “protein biology” had 701 results.

FOR BIOLOGY RESEARCH

Biology and bioscience have high activity in preprints, and since 2016, these fields have had the best example of an independent, open metasearch engine in the form of the University of Pittsburgh’s search.bioPreprint. This offers a friendly search box, and ranking of results is attempted. A simple test search for “protein biology” provided 998 results and a wealth of topic filter options. Full details of how the engine was made are freely available.

It is worth mentioning the old Rxivist.org, which sought to combine preprints with the X (formerly Twitter) commentary on them. The site is defunct, but a 4-and-a-half-year “database snapshot” of Rxivist remains available. It’s historically important because it covers the years of the COVID-19 pandemic.

MORE OPTIONS

There are field-specific preprint alert channels available on social media, including on X and Reddit, such as the large, now-defunct BiologyPreprints Reddit. Reddit users in other fields may also usefully guide searchers toward little-known email alerts, filterable RSS feeds, and even curated newsletters that track preprints.

Other options are Google News and Bing News, since journalists will often pounce on hot new preprints and publicize them. Both services can be tracked using keywords; typing “bing.com/news/search?q="microbiology"+&week+&format=preprint” into your search bar is one example. You could also use “bing.com/news/search?q="microbiology"+&week+&format=rss” to add the news as an RSS feed. Replace “bing.com/news” with “news.google.com” if you want to use Google.
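For readers who track several keywords, the URL patterns quoted above can be filled in programmatically. The following is a minimal sketch that copies the article's pattern verbatim and only swaps the keyword, the `format` value, and the host. The function name `news_search_url` is hypothetical, and the sketch makes no claim about how either service interprets the parameters.

```python
def news_search_url(keyword: str, fmt: str = "preprint",
                    host: str = "bing.com/news") -> str:
    """Fill the article's news-search URL pattern with a keyword.

    The pattern (quoted keyword, '+&week+&format=...') is taken as-is from
    the article; whether a given service honors each part is not verified.
    """
    return f'{host}/search?q="{keyword}"+&week+&format={fmt}'


if __name__ == "__main__":
    for kw in ["microbiology", "virology"]:
        print(news_search_url(kw))                          # Bing News search
        print(news_search_url(kw, fmt="rss"))               # RSS variant
        print(news_search_url(kw, host="news.google.com"))  # Google swap
```

The keyword is left unencoded on purpose, since the strings are meant to be typed or pasted into a browser address bar as the article describes. Multi-word keywords are untested here.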

A NEED FOR COMPREHENSIVE TOOLS

One especially interesting current discovery venture is preLights, an innovative and rather appealing attempt to build a community website for biologists around editorially curated preprints. preLights also adds magazine-like interviews with early-career researchers whose preprints have been chosen for spotlighting. Elsewhere, you can also find long-running podcasts such as Preprints in Motion.

Such niche ventures are welcome, but there is still a need for open discovery across all preprint servers. As of this writing, this is only partly addressed by Google Scholar. As preprint use continues to grow, there will surely be room for new comprehensive search tools.

Links to the Sources

aiXiv’s GitHub page
github.com/aixiv-org/aiXiv

Wikipedia’s List of Preprint Repositories
en.wikipedia.org/wiki/List_of_preprint_repositories

Open Information Science: “Most Preprint Servers Allow the Publication of Opinion Papers”
doi.org/10.1515/opis-2022-0144

viXra
vixra.org

Wikipedia’s viXra entry
en.wikipedia.org/wiki/ViXra

Google Scholar
scholar.google.com

Semantic Scholar
semanticscholar.org

PubMed
pubmed.ncbi.nlm.nih.gov

PubMed Central
pmc.ncbi.nlm.nih.gov

Europe PMC
europepmc.org

SciELO Preprints
preprints.scielo.org/index.php/scielo

SSRN
papers.ssrn.com/sol3/DisplayAbstractSearch.cfm

OSF
osf.io/search/?q=&resourceType=Preprint

search.bioPreprint
hsls.pitt.edu/preprint

Rxivist.org
rxivist.org

Reddit’s BiologyPreprints archive
old.reddit.com/r/BiologyPreprint

preLights
prelights.biologists.com

Preprints in Motion
preprintsinmotion.wordpress.com

David Haden is the former editor of Digital Art Live magazine. He now works with a large, well-known British firm. Haden is the curator of the JURN search tool for open discovery of OA arts and humanities content (jurn.link/jurnsearch). Send your comments about this article to itletters@infotoday.com.