FEATURE

Pursuing Preprints: Some Tools for Open Discovery Across Multiple Preprint Servers
by David Haden
I was recently intrigued by a proposal for a preprint server dedicated to AI-generated work, for both research proposals and finished papers. As of this writing, aiXiv is not online yet, but there is a GitHub repository with code and an early description of its goals. This novel proposal spurred me to look again at preprints—something I had not done in years.
‘SEARCHER BEWARE’
In my research, I found more than 65 dedicated preprint servers, a significant uptake of preprints since 2010, a surge in usage and acceptance since 2020, and an increase in opportunities to build new services around preprints. The preprint server landscape has evolved into a highly varied patchwork of approaches to basic oversight and gatekeeping, governance, removal and longevity of papers, indexing, and free access. I was pleased to learn that work-in-progress papers are still largely excluded from servers. However, it was worrying to read in Open Information Science that 75% of the 36 most popular servers will accept “opinion papers,” and on Wikipedia that the viXra preprint server is “known for unorthodox and fringe science.” It thus seemed to me that “searcher beware” remains the best motto for examining preprint search results.
The lack of peer review is not the only reason to be wary. “Searcher beware” also remains apt because the main search tool for many is Google Scholar, which mixes preprints with peer-reviewed articles, has no flag to distinguish preprints, and still offers no way to filter search results to “only preprints.” So where does the wary searcher find preprint search results today? If not in Google Scholar or in commercial walled services (such as Dimensions, Scopus, or the Web of Science Preprint Citation Index), where can they look? I thought the answer might be useful, so what follows is a brief survey of open discovery options for preprints.
SEMANTIC SCHOLAR AND PUBMED
Semantic Scholar, which is large and fast, has preprint-server filters concealed under its Journals and Conferences drop-down, but my searches suggest its preprint coverage is limited to arXiv, bioRxiv, and medRxiv. PubMed and PubMed Central are similarly limited: they ingest only preprints of U.S. National Institutes of Health-funded research posted since 2023 on arXiv, bioRxiv, and medRxiv. PubMed's search can include or exclude preprints, which is especially important for medical treatment research, since preprints have not been peer-reviewed.
SCIELO, SSRN, AND OSF
Europe PMC, an offshoot of PubMed Central, similarly indexes life science preprints from 32 servers for research supported by European funders. Spanish and Portuguese speakers have the open SciELO Preprints search tool, operated from Brazil by the robust and comprehensive SciELO service. My tests show that, unlike on many other servers, Literature and Arts is an active preprint category there, and switching the site to English reliably gives English abstracts for most results. I also found that SSRN, said to index more than 30 preprint sources of various types, gives relatively good results for Literature and Arts, although these were far older than the results on SciELO.
The largest open aggregator is OSF, which indexes more than a dozen of the world’s 65-plus preprint servers along with Thesis Commons and OSF’s own OSF Preprints, OSF Projects, and OSF Registries. OSF’s ability to filter by Creative Commons licence is especially useful. A simple test search for “protein biology” had 701 results.
FOR BIOLOGY RESEARCH
Preprint activity is high in biology and the biosciences, and since 2016 these fields have had the best example of an independent, open metasearch engine: the University of Pittsburgh’s search.bioPreprint. It offers a friendly search box and attempts to rank results. A simple test search for “protein biology” returned 998 results and a wealth of topic filter options. Full details of how the engine was built are freely available.
The old Rxivist.org is also worth mentioning; it sought to pair preprints with the commentary about them on X (formerly Twitter). The site is defunct, but a four-and-a-half-year “database snapshot” of Rxivist remains available. It is historically important because it covers the years of the COVID-19 pandemic.
MORE OPTIONS
Field-specific preprint alert channels exist on social media, including on X and Reddit, though some, such as the once-large BiologyPreprints subreddit, are now defunct. Reddit users in other fields may also guide searchers toward little-known email alerts, filterable RSS feeds, and even curated newsletters that track preprints.
Google News and Bing News are other options, since journalists often pounce on hot new preprints and publicize them. Both services can be tracked by keyword; typing “bing.com/news/search?q="microbiology"+&week+&format=preprint” into your browser’s address bar is one example. You could also use “bing.com/news/search?q="microbiology"+&week+&format=rss” to add the news as an RSS feed. Replace “bing.com/news” with “news.google.com” if you want to use Google.
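As a rough illustration of keyword tracking, the Python sketch below builds a Bing News search URL for an exact-phrase query, optionally adding the format=rss parameter used in the RSS example above. It is only a sketch under stated assumptions: the helper name news_search_url is hypothetical, only the q and format=rss parameters from that example are used (the &week and format=preprint parts are left out, since their effect is not verified here), and it does not assume that Google News accepts the same parameters.

```python
from urllib.parse import urlencode


def news_search_url(phrase: str,
                    base: str = "https://www.bing.com/news/search",
                    rss: bool = False) -> str:
    """Build a keyword-tracking news search URL (hypothetical helper).

    The phrase is wrapped in double quotes so it is searched as an exact phrase;
    urlencode percent-encodes the quotes and turns spaces into '+'.
    """
    params = {"q": f'"{phrase}"'}
    if rss:
        # Assumption: 'format=rss' mirrors the article's RSS example for Bing News;
        # it is not checked against official Bing documentation here.
        params["format"] = "rss"
    return f"{base}?{urlencode(params)}"


if __name__ == "__main__":
    # Browser URL for tracking a topic, and an RSS variant for a feed reader.
    print(news_search_url("microbiology"))
    print(news_search_url("protein biology", rss=True))
```

Percent-encoding the query avoids the stray quotation marks and plus signs that make hand-typed URLs fragile; the resulting RSS address can be pasted into any feed reader.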
A NEED FOR COMPREHENSIVE TOOLS
One especially interesting current discovery venture is preLights, an innovative and rather appealing attempt to build a community website for biologists around editorially curated preprints. preLights also adds magazine-like interviews with early-career researchers whose preprints have been chosen for spotlighting. Elsewhere, you can also find long-running podcasts such as Preprints in Motion.
Such niche ventures are welcome, but there is still a need for open discovery across all preprint servers. As of this writing, this is only partly addressed by Google Scholar. As preprint use continues to grow, there will surely be room for new comprehensive search tools.