DATABASE REVIEW
Data Rescue Project Thwarts Government Censorship Wave
by Mick O'Leary
The Data Rescue Project
SYNOPSIS
The Data Rescue Project (datarescueproject.org) is a coalition of research, academic, and library organizations that has restored public access to data removed from federal government websites in early 2025. The Data Rescue Project Portal contains more than 1,200 datasets covering subjects that the incoming administration wanted withdrawn from public access.
In January 2025, users of federal government data began noticing sudden disappearances and alterations of datasets from public government websites. Amid this early confusion, a pattern quickly emerged: Much of the disappeared data covered topics that the newly inaugurated Republican administration had criticized as “woke” or otherwise objectionable. One hard-hit target was the National Oceanic and Atmospheric Administration (NOAA), which—until January at least—had provided large amounts of information on climate change. The records were removed very soon after the inauguration, which suggests that they had been targeted beforehand, and their removal did not appear to be normal data turnover or initiated by the agencies themselves.
The self-righteous zealots who thought that they were freeing the American people from leftist propaganda suffered from one mission-crushing flaw: They were deeply information illiterate. They may have thought that pulling data from the dot-gov websites would be the end of it. Instead, they were thwarted by a cadre of data librarians from dozens of research, academic, and public interest organizations who had created a vast parallel universe of government data that is beyond the reach of the government censors.
EMERGENCE
This whirlwind of independent data rescue efforts quickly coalesced into an organized campaign. In February, three of the larger groups established the Data Rescue Project, with the goal of serving as a central clearinghouse for organizing and recording rescue efforts. By summer 2025, the Data Rescue Project had more than 2 dozen members and had amassed 1,200-plus datasets, recorded in a Google spreadsheet. The records indicate file name, creating agency, original URL, size, file type, and other descriptive elements. At first, the Data Rescue Project was supported by other public data collection organizations, but it now also accepts donations, primarily to pay its data storage costs.
THE PORTAL
In June 2025, the spreadsheet was replaced by the Data Rescue Project Portal, which provides additional information and a new three-tier organizational structure.
The top tier is an alphabetical list of datasets arranged by title. Each dataset record contains summary information, including the creator and the original and rescue site URLs. There are two new finding tools: a list of government offices, each linked to its datasets, and a Data Rescue Project-developed subject classification, named Categories, with links to the corresponding datasets. Each record is tagged with its categories.
The second tier might be called the “Maintainer” record. It identifies the Data Rescue Project member that curates the dataset. Maintainer records include a list of the individual files in the dataset (datasets often have several). Most records have information on the dataset’s parent project, including sponsoring agency, citation, and a short summary of its work. This tier also provides usage metrics, including the number of views and downloads. (My random sampling of dozens of records shows that many have been viewed and downloaded.)
The third tier has the file’s actual download link.
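To make the tiers concrete, here is a minimal, hypothetical sketch in Python of how a single portal record might be modeled. The class and field names are illustrative assumptions drawn from the elements described above; they do not represent the portal's actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class FileEntry:
    """Third tier: an individual file and its actual download link."""
    filename: str
    download_url: str

@dataclass
class MaintainerRecord:
    """Second tier: the Data Rescue Project member that curates the dataset."""
    maintainer: str                      # curating member organization
    sponsoring_agency: str               # parent project's sponsoring agency
    citation: str
    summary: str
    views: int = 0                       # usage metrics
    downloads: int = 0
    files: list[FileEntry] = field(default_factory=list)

@dataclass
class DatasetRecord:
    """Top tier: the alphabetical list entry with summary information."""
    title: str
    creator: str
    original_url: str                    # where the data lived on the dot-gov site
    rescue_url: str                      # where the rescued copy now lives
    categories: list[str] = field(default_factory=list)
    maintainer_record: Optional[MaintainerRecord] = None
```

In this sketch, a reader moves from the top-tier summary, through the maintainer record, down to an individual file's download link, mirroring the portal's navigation.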
WHAT’S ACTUALLY IN THE DATA RESCUE PROJECT?
Data.gov, the largest aggregation of U.S. government data, has more than 300,000 records, most of which are from federal sources. By some estimates (see the sidebar, “What the Chatbots Report”), approximately 3,000 datasets had been removed or altered. By Aug. 31, the Data Rescue Project had 1,233 records.
The Data Rescue Project's records come from more than 80 government offices, according to its office classification, but a large majority fall into just four of its subject categories, each with a low-triple-digit total:
- Military & Veterans Affairs—Most of these are from the Department of Veterans Affairs and include many state-level datasets. Many deal with veterans’ healthcare. To a non-expert observer, they appear to be completely mundane and hardly a threat to the administration.
- Climate & Environment—Many of these are from NOAA, including records dealing with damaging climate change effects, which the administration has repeatedly dismissed.
- Science & Research
- Health & Healthcare—These categories contain disparate mixes of reports and studies that might, somehow, offend the administration’s data police, along with purely administrative files that hardly seem likely to offend anyone.
ELSEWHERE
The rescued data in the portal is the Data Rescue Project's principal content, but the website has other valuable sections:
- Current Efforts is a roster of nearly 40 Data Rescue Project members, with brief descriptions and links.
- Resources is a catalog of data rescue how-to information, including guidelines, tech tools, library guides, and alternative federal government data sources.
- Press and Presentations is a webography of sources and Data Rescue Project presentations. It contains more than 5 dozen articles from newspapers, magazines, and specialized reporting services from February to August 2025, as well as 11 presentations from Data Rescue Project participants. It’s a valuable record of the complex history of the federal data removals and the information community’s efforts to counter them.
WHO NEEDS THE DATA RESCUE PROJECT?
The 1,200-plus records in the Data Rescue Project are, of course, a tiny percentage of all publicly available federal data, and they comprise fewer than half of the affected federal datasets. However, the whole Data Rescue Project effort is significant far beyond these numbers. First, the removals were done arbitrarily, without agreement from the affected agencies or their constituents. Second, the data was removed not because of content errors, but rather because it differed from the administration’s political agenda. Finally and most importantly, this censorship was thwarted by a robust public data infrastructure that has been carefully curated outside of the government—and especially by the resolute efforts of hundreds of committed information specialists.
What the Chatbots Report
Because of lingering confusion about just what federal data has been mishandled and differing counts of “datasets,” “files,” and “pages,” there is no exact number for how much data has been affected. To get answers, I turned to—where else—AI and consulted the eight chatbots with which I have accounts. This may seem like obsessive overkill, but it has two big advantages. First, the chatbots cross-check one another, and consensus among them makes the figures much more trustworthy. Second, it's a chance to compare the chatbots themselves, which I've done in several Database Reviews over the past few years.
The prompt was, “What percentage of federal data that have been removed by the administration in 2025 has been captured by the Data Rescue Project?” The Data Rescue Project number is exact and can easily be obtained from its site: 1,233 records as of Aug. 31. Several chatbots, drawing on reliable sources, estimated that about 3,000 files had been removed. The Data Rescue Project, then, has captured about 40% of them (1,233 of roughly 3,000 is about 41%). Here is how dependable each chatbot was in providing each number:
Top Performers—Perplexity and ChatGPT
These each surfaced dependable—and similar—estimates of removed federal data, as well as the Data Rescue Project number and the percentage. I find that these two chatbots consistently outperform the others.
Almost-There Performers—Gemini and DeepSeek
These two had the federal and Data Rescue Project numbers, but didn’t calculate the percentage as requested in the prompt. Usually, I find DeepSeek to be a solid performer, but Gemini regularly falls a little below the level of the top performers.
Poor Performers—Claude, Meta AI, and AI Mode
These provided no federal data estimate. This is unusual for Claude, which I normally find to be a strong performer. Google’s AI Mode also often does well.
Failure—Microsoft Copilot
Copilot didn't have any of the three numbers, which is consistent with its generally poor performance record.