The Era of Open Government Data
by Peggy Garvin
Governments are usually seen as resistant to change, but it should come as no surprise that government information production is changing in ways that go well beyond the basic shift from paper to PDF.
Established economic and social statistical series, which have long been the staples of government content, are facing significant technical, social, and political challenges. Agencies may need to make fundamental changes in the way they collect data, and those changes could put at risk the accuracy and availability of information we count on today. Statisticians and demographers met at the Association of Public Data Users (APDU) 2012 conference in mid-September in Washington, D.C., to assess the current environment and what it means for the future.
Panelists addressing the theme “The Future of the Federal Statistical System in an Era of Open Government Data” included Robert Groves, former director of the U.S. Census Bureau; Ron Borzekowski, section chief in the Office of Research at the Consumer Financial Protection Bureau; Connie Citro, director of the National Academy of Sciences Committee on National Statistics; Mike Horrigan, associate commissioner for Prices and Living Conditions at the Bureau of Labor Statistics (BLS); and James Treat, chief of the Census Bureau’s American Community Survey Office.
While at the Census Bureau, Groves posted a three-part series in 2011 titled “The Future of Producing Social and Economic Statistical Information” on the Director’s Blog (http://directorsblog.blogs.census.gov). In it, he made five observations that led him to conclude that the “current Census Bureau survey and census methods are unsustainable”:
- The difficulties of measuring the busy, diverse, and independent American society and economy are increasing every year (that is, it costs more money to do the same things the Census Bureau has done for years).
- The demands by American business, state, local, and community leaders for statistics on their populations are continually increasing.
- New technologies are being invented almost daily that can be used to make it more convenient for the American public to participate in surveys.
- New digital data resources are being created from Federal-state-local government programs, private sector transactions, and internet-related activities.
- Near-term Federal government budgets are likely to be flat or declining.
Tom Mesenbourg, the current acting director of the Census Bureau, has continued to blog about the need for innovation at the Census Bureau, making the point that “we must find ways to integrate Census Bureau data sets with public and private data sets to develop new low cost products.” The era of open data and Big Data means that federal statistical agencies should—and have begun to—explore nonsurvey data from sources outside of their agencies.
Ensuring Data Accuracy
At the conference, the quality of new data sources was a topic of concern. Statisticians know how to analyze the quality of sampled survey data, regularly evaluate their techniques, and make adjustments. They are cautious about using sources that are too new to have established much of a record for accuracy. This is no small matter when the numbers determine basic social and economic indicators that governments use to allocate resources and form business and economic policy.
Aside from exploring new sources of data, conference organizers sought to improve the dialogue between survey statisticians and the open government data community. The two groups have different roles and different professional identities. Members of the federal statistical community are generally experienced social scientists, and their data users include analysts in government, business, policy research, and academia, as well as commercial data suppliers. The dialogue between statistical agencies and these clients is well established. The statistical programs these agencies manage are typically required by law to support government programs and policymaking.
Members of the much newer open government data community tend to come from the technology community. That community views the raw data that government collects for administrative or operational purposes (such as military personnel data or aviation accident records) as resources the public can leverage for civic good or for creating new information products in the private sector. At the federal level, the central government data website is Data.gov, which serves as a finding tool for data products elsewhere on the federal web. Data.gov has been building communities of interest around the data, such as portals for business, law, and health. A conference panel on open data included Jeanne Holm, evangelist at Data.gov; Tom Lee, director of Sunlight Labs; Alex Howard, blogger for O’Reilly Media, Inc.; and Bryan Sivak, CTO at the Department of Health & Human Services.
In addition to the administrative and operational data available as open government data, APDU attendees heard about Big Data. Big Data is not necessarily government data, although governments can be a source. In a June 27, 2012, blog post written while he was at the Census Bureau, Groves described Big Data as “massive data sets that are being produced daily through internet search, social media, and administrative data processing.” Roberto Rigobon of the Billion Prices Project at MIT and Horrigan of BLS explained the possibilities of using Big Data to calculate economic indexes, particularly price indexes, in their talk “New Data in an Open Data World.” For example, price data scraped from retailer websites can be used as part of the calculation process to enhance the timeliness of the Consumer Price Index.
The Value of the American Community Survey
The APDU conference wrapped up with a panel on the endangered American Community Survey (ACS), a survey developed to replace the decennial census long form but targeted for defunding by the House of Representatives. Many business groups support the survey, and a representative from the International Council of Shopping Centers spoke about its importance to the council’s tens of thousands of member businesses. Outside of APDU, others have praised its value.
Business researcher Marcy Phelps of Phelps Research explained, “What makes the ACS so valuable—and a good investment for the U.S. government—is the detailed, up-to-date, local-level data. That’s what most businesses need.” Phelps (author of Research on Main Street: Using the Web to Find Local Business and Market Information, Information Today, Inc., 2010) predicted that, “If you leave it to the open market and go with a ‘surely someone will pick this up’ attitude, I don’t think it will happen.” ACS survived the 112th Congress but may be a target again next year in the 113th Congress.
Looking forward, Jim Treat of the ACS Office shared the news that the Census Bureau will test an online response option for the survey in 2013 and will add new questions about internet and computer use, as requested by the Federal Communications Commission.
While the tech community has been buzzing about public data, open data, and Big Data for the past several years, APDU has worked on these issues for more than four decades, keeping a relatively low profile beyond its own community of statisticians and demographers. But that may be changing, judging from the connections at work during the 2012 APDU conference. Librarians, especially data librarians, joined the statisticians in attendance, a collaborative outreach that has the potential to create a healthy future for the federal statistical system.