Disappearing and Disappeared Data

Information professionals like their data to be stable and not to disappear. In that, they’re no different from researchers in other disciplines, particularly since it’s frequently these other researchers for whom information professionals are finding the data. Whether it’s time series, datasets, research reports, scientific studies, or digital texts, we want continuity.

Websites that change because of updated information are welcomed, of course. Data stability doesn’t preclude the addition of new data. It’s massive changes in older data or its outright vanishing that makes sources suspect and scares researchers.

Instability in data can happen for benign reasons. An agricultural time series can be thrown off kilter when a commodity is added or removed, perhaps because it was recently introduced in a region or is no longer grown there. Data can also disappear when it’s wrong. Flawed data (a population subset omitted, instrumentation malfunctions, or inaccurate analysis) should be withdrawn. Worse, if fraud is involved, that data should not see the light of day. I find the number of retracted scholarly articles tracked by RetractionWatch.com stunning.

Sometimes data disappears inadvertently. A domain name is not renewed, and the information previously there is gone. If the domain name is then picked up by another entity, the nature of the source becomes entirely different. I know of a respected conference website that was transmogrified into a porn site. The Indiana personal finance government website mentioned in this issue’s Dollar Sign column is another example.

Loss of funding can make a website unsustainable. Particularly susceptible are academic digitization projects funded by grants. When the grant runs out, local funding may not be sufficient, or even available, to keep the digitized materials online. It’s not just grant funding. Government agencies can decide to no longer fund a project, leading to data disappearing. GLIN, the Global Legal Information Network, lost U.S. funding in 2012 and is now attempting a comeback as the GLIN Foundation (glinf.org).

In a worst-case scenario, data disappears by design. A government decides that scientific environmental research reports, documentation of animal abuse, or earth science and atmospheric datasets will be scrubbed from its websites. When scientific data collides with a politician’s belief, too often it’s the data that loses. Suppression of research findings makes citizens’ access to information impossible. Vaporizing existing data does a disservice to humanity. Cutting back on government data collection, as has happened with the U.S. Census (with more cuts threatened), leads to bad public policy. If local governments don’t have data, they can’t adequately plan for public services. The business community loses its ability to obtain essential information for growth.

A letter signed by 66 public interest institutions requests that the U.S. Office of Management and Budget remind government agencies that they are legally required to “give public notice before removing online government information” (openthegovernment.org). Librarians can champion the preservation of website data, either by archiving the sites at the Internet Archive or through newer initiatives such as Data Refuge (ppehlab.org/datarefuge). Data will persist if information professionals remain vigilant. Libraries should be dynamic places, but data should be stable and not disappear.