Like so much of the nation’s infrastructure, the U.S. Federal Statistical System (FSS) enters the third decade of the 21st century under stress and strain. Over the last 30 years, current and former federal agency stewards have highlighted many problems facing the statistical system. They are urging political leaders to act now to restore the nation’s statistical system to its previous acclaim. The system is too valuable and important to neglect, given that it touches all aspects of American life.
Government agencies issue annual reports on the economic health of the nation, inflation and prices, the unemployment rate and the stability of the labor force, the poverty rate, how we travel to work and home, the weather and national disasters, and much more. Other statistics are gathered on a less-frequent schedule (the economic census is every 5 years). Based on these statistics, stock markets will move in a positive or a negative direction (check your 401(k)), and interest and exchange rates will go up or down. These fluctuations affect saving for retirement, getting a mortgage, and the overall purchasing power of the American dollar. Government statistics are also used by policymakers to inform and evaluate public policy initiatives. Census numbers determine the political mix of officeholders in Congress. Researchers look to government statistics to help them determine if a policy worked or not.
Problems with structure, resources, data, data sharing, privacy, and confidentiality are not new, but the time to act is now, given the push to upgrade the nation’s infrastructure. There are four major problems with the nation’s statistical system. The decentralized structure of the system has some strengths but many more weaknesses. Declining budgets and staff resources have hit the statistical agencies hard and are eroding their mission to deliver high-quality, timely, accurate, credible, trustworthy, and independent statistics. A third problem centers on the methods of data collection and changing data needs. Lastly, the statistical agencies are required to keep our data confidential and private without sacrificing accuracy, transparency, and reliability. This is not an easy task these days.
The Decentralized Structure
The U.S. statistical system is, by design, a sprawling, highly decentralized system that fans out across approximately 125 federal agencies, with more than 100 statistical programs collecting, organizing, analyzing, and distributing statistical information about the nation. This differs from other countries whose statistical infrastructures are centralized. Examples include Stats New Zealand (stats.govt.nz), Statistics Sweden (scb.se/en), Statistics Netherlands (cbs.nl/en-gb), and Statistics Canada (statcan.gc.ca/eng/start).
The design enables each agency to work directly with its users and stakeholders to gain a key understanding of the information needs of its specific audience. However, the weaknesses have outstripped these benefits. The fragmented and disjointed system has led to duplication of effort between agencies, inconsistency of the data collected, and inefficient use of resources.
Furthermore, the management of the entire system has been left to the Statistical Science Policy Unit, housed within OIRA/OMB (Office of Information and Regulatory Affairs/Office of Management and Budget), an understaffed and overworked office that cannot effectively manage such a large and unwieldy statistical system, as Julia Lane claims in Democratizing Our Data: A Manifesto (MIT Press, 2020). A 2017 publication, “Innovations in Federal Statistics: Combining Data Sources While Protecting Privacy” (doi.org/10.17226/24652), points out various problems with U.S. statistical agencies, including overlap, decreasing response rates, and questionable survey design.
Congressional oversight mirrors the fragmented system as well. No single congressional committee or subcommittee oversees all the statistical agencies. Instead, different subcommittees are responsible for setting agency budget requirements, size, and mission. For example, powerhouse statistical agencies such as the Bureau of Economic Analysis (BEA), the Bureau of Justice Statistics (BJS), the Census Bureau, and the National Center for Science & Engineering Statistics (NCSES) are overseen by the House Commerce, Justice, Science, and Related Agencies subcommittee. The House Labor, Health and Human Services, Education, and Related Agencies subcommittee has oversight over the Bureau of Labor Statistics (BLS), the National Center for Education Statistics (NCES), and National Center for Health Statistics (NCHS).
Not surprisingly, different congressional committees have set up different mandates, which has led to a duplication of coverage. And because very few members of Congress fully understand what their agencies do and why they are important, the statistical agency budgets are used as bargaining chips by congressional committees to support non-statistical requests. Consequently, the structure of the statistical system makes it that much harder to innovate, keep costs steady, find efficiencies, and adapt to changes within society.
Budget, Staff, and Training
During the last 13 years, statistical agencies have experienced flat or declining budgets and a considerable loss of purchasing power. Since 2009, only three agencies of the 13 examined by the American Statistical Association (ASA) have seen their budgets increase; the other 10 have seen declines of between 5% and 28%. The key takeaway: “many statistical agencies are struggling to continue established programs, let alone respond to new data needs or take advantage of methodological and technological advances that would improve their data and reduce costs and respondent burden” (“ASA, COPAFS, Partners Urge Bolstering of Federal Statistical Agencies”; magazine.amstat.org/blog/2021/03/01/fed-stat-agencies). Steve Pierson, director of science policy at ASA, has created and maintains a Google Sheets spreadsheet that shows the budget figures for the statistical agencies from 2003 on (docs.google.com/spreadsheets/d/1_xt8oI2neZyTwaZvtyQOtujzuHnjemZPwPuYVsEELr0/edit#gid=0).
The loss of purchasing power has dramatically impacted three important areas: staffing, innovation, and training. As budgets have declined, so too has the number of full-time employees (FTE), according to statistics compiled by ASA for fiscal year 2016 through FY 2020. For example, during the last 25 years, the NCES has seen its full-time permanent staff numbers decline from a high of 130 FTE in 1995 to the current number of 88 full-time permanent staff, although it is allowed to hire up to 95 employees. The agency’s budget is currently 12% below its FY 2009 budget, which has led to a staffing crisis in one of the most important statistical agencies in the nation.
The concern is reflected in a letter sent by a group of former agency commissioners asking Congress to increase the agency’s statistical budget by at least 5% to offset the more than 20% loss in purchasing power since 2009. According to a March 31, 2020, article in the Washington Post, the letter explains, “The extra money would be used in part to help NCES track emerging education trends and provide more timely and regional data, ‘efforts that are currently taxed due to both the loss of the agency’s purchasing power and its staffing crisis’” (“Understaffing Threatens Work at Key U.S. Education Statistics Agency, Experts Say”; www.washingtonpost.com/education/2020/03/31/understaffing-threatens-work-key-us-education-statistics-agency-experts-say).
Additionally, agencies need to hire data scientists and other data experts who have the skills and training to handle the new challenges of big data. Robert Groves, director of the Census Bureau from 2009 to 2012, put it this way: “All of the federal statistics agencies, including NCSES, are in the middle of a gigantic paradigm shift where new data resources have to be added in clear ways. These are technical matters that require analytic expertise; you can’t just hire off the street because the blending of data requires a deep understanding of the measurement steps of the data you’re blending, in addition to all the statistical issues. When small agencies fall below the minimum size of their technical core, they are threatened” (“State of the Science and Engineering Data Infrastructure: National Center for Science and Engineering Statistics”; magazine.amstat.org/blog/2021/07/01/state-of-infrastructure-ncses).
The good news is that the Biden administration’s FY 2022 budget largely breaks the trend of declining budgets for statistical agencies. The Biden budget is proposing that the National Institutes of Health (NIH) and the National Science Foundation receive a 20% increase, and seven other agencies will hopefully see their budgets increase between 5% and 10%. The remaining agencies’ budgets will remain flat (“Biden Administration’s First Budget Request Favors NIH, NSF; Flat Funds Education, Energy, Health, Justice, Statistics”; magazine.amstat.org/blog/2021/08/01/fy22-budget).
Data Collection and Use
The U.S. statistical agencies collect statistics primarily through surveys and censuses. The Census Bureau conducts more than 130 economic and demographic surveys every year. The surveys are sent to individuals, households, farms, businesses, governments, schools, and others to help understand the state of the nation. The surveys cover many subjects, including the economy, agriculture, healthcare, crime stats, transportation, defense, education, energy, housing, social welfare, and other areas of public policy. These statistics are vital for decision making, policymaking, congressional realignment, and the distribution of federal funds.
But all is not well in survey land. There are two huge problems with government surveys: rising survey costs and increasing non-response rates. Surveys are expensive to conduct. How much does it cost to count the U.S. population? The Government Accountability Office (GAO) recently reported that the decennial census of 2020 “cost roughly $14.2 billion …, which is above initial estimates but below the Bureau’s most recent estimate of $15.6 billion.” In comparison, the 1990 census cost $2.6 billion, which represents a 450% increase in census costs from 1990 to 2020. Furthermore, since 1960, the decennial census has seen costs increase from $10 per household to $96 per household in 2020, which is an 860% increase. Kudos to the Census Bureau for reversing higher decennial costs, but agency budgets have done the exact opposite. Declining budgets prevent the agencies from investing in new technologies, new methodologies, and new data sources to produce timely, accurate, reliable, and credible statistical products (GAO, “2020 CENSUS: Innovations Helped With Implementation, But Bureau Can Do More to Realize Future Benefits”; gao.gov/assets/gao-21-478.pdf; Modernizing the U.S. Census, The National Academies Press, 1995; doi.org/10.17226/4805).
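The percentage increases quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python; the function name is illustrative, and the only inputs are the dollar figures already cited in the text.

```python
def percent_increase(old: float, new: float) -> float:
    """Return the percentage change from an old value to a new one."""
    if old == 0:
        raise ValueError("old value must be nonzero")
    return (new - old) / old * 100.0


# Figures as quoted in the text: total census cost in billions of dollars
# (1990 vs. 2020) and cost per household in dollars (1960 vs. 2020).
total_cost = percent_increase(2.6, 14.2)
per_household = percent_increase(10, 96)

print(f"Total cost, 1990 to 2020: about {total_cost:.0f}% higher")
print(f"Cost per household, 1960 to 2020: about {per_household:.0f}% higher")
```

The total-cost figure comes out slightly under the rounded 450% cited above, and the per-household figure matches 860%.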
Declining Response Rates
Within the last 20–30 years, agencies have seen reduced survey response rates. “Modernizing the U.S. Federal Data System: Insights From Nancy Potok, Chief Statistician, Office of Management and Budget” addresses this concern. According to Potok, who is no longer serving in the role of chief statistician, “people increasingly don’t like to answer surveys. It’s an intrusion. It’s hard to collect information that way” (IBM Center for the Business of Government; www.businessofgovernment.org/sites/default/files/Modernizing%20the%20U.S.%20Federal%20Data%20System.pdf). In the past 5 years, the Advance Monthly Sales for Retail and Food saw responses decline 20 percentage points, while the Monthly Retail Trade dropped 10 percentage points during this same time period. The pandemic only exacerbated this problem (“How Does the Pandemic Affect Survey Response: Using Administrative Data to Evaluate Nonresponse in the Current Population Survey Annual Social and Economic Supplement”; census.gov/newsroom/blogs/research-matters/2020/09/pandemic-affect-survey-response.html).
Fewer responses force agencies to spend more money to ensure the surveys’ reliability and credibility by “[c]hasing down non-responders, a process called non-response follow-up, [which] is expensive: a mailed response costs $0.60 …, while dispatching an enumerator costs $67, a hundred times more” (The Royal Statistical Society, “Will Administrative Data Save Government Surveys?” Significance Magazine, October 2019; rss.onlinelibrary.wiley.com/doi/full/10.1111/j.1740-9713.2019.01319.x). Ouch! Declining response rates cost more and threaten the credibility of the survey. Not good!
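To see how quickly follow-up costs add up, here is a rough sketch in Python built on the two unit costs quoted above. The household count and the response rates are hypothetical, chosen only for illustration, and the model assumes every non-responder receives exactly one enumerator visit, which simplifies how follow-up actually works.

```python
MAIL_COST = 0.60         # cost of a mailed response, as quoted above
ENUMERATOR_COST = 67.0   # cost of dispatching an enumerator, as quoted above


def collection_cost(households: int, mail_response_rate: float) -> float:
    """Total cost if each non-responder gets one enumerator visit (a simplification)."""
    if not 0.0 <= mail_response_rate <= 1.0:
        raise ValueError("mail_response_rate must be between 0 and 1")
    responders = households * mail_response_rate
    non_responders = households - responders
    return responders * MAIL_COST + non_responders * ENUMERATOR_COST


# Ratio of the two unit costs quoted in the text.
cost_ratio = ENUMERATOR_COST / MAIL_COST
print(f"An enumerator visit costs about {cost_ratio:.0f} times a mailed response")

# Hypothetical scenario: 1 million households at three assumed response rates.
for rate in (0.70, 0.60, 0.50):
    total = collection_cost(1_000_000, rate)
    print(f"Mail response rate {rate:.0%}: total cost ${total:,.0f}")
```

Under these assumptions, nearly all of the total comes from the follow-up visits, so each point of lost mail response raises the bill by far more than the mailing itself costs.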
Another growing problem is item non-response, which calls the quality of the data into question. Survey respondents, for whatever reason, decide not to provide a response to a specific question. Ron Jarmin, acting director of the Census Bureau, has addressed the problem of item non-response for the 2020 decennial census this way: “while people were counted, some people left one or more questions blank, even if they completed most of the census questionnaire.” He goes on to note, “These blank responses left holes in the data which we had to fill” (“Redistricting Data: What to Expect and When”; census.gov/newsroom/blogs/director/2021/07/redistricting-data.html).
Data sharing among statistical agencies has gained more support from both government agencies and the data users who stand to benefit from new, timely, and granular data resources. Data sharing breaks down data silos, leverages data as a strategic asset, cuts costs, and delivers timely and vital statistical results. Potok believes that data sharing between federal agencies is a critical priority because it repurposes a lot of data that is already collected by the federal government (also known as administrative data), encourages collaboration across agencies, and utilizes wide-ranging talents and expertise from different agencies to tackle “big questions” that have been difficult for agencies to undertake previously.
The previously cited “Modernizing the U.S. Federal Data System” article suggests that enthusiasm is infectious; businesses, researchers, academics, and public policy groups want the government to use administrative data to deliver more statistics that are timely, granular, and accessible.
The pandemic has shown the urgent need for statistical series to be published as soon as possible. In May 2020, the Census Bureau, working with five other statistical agencies, launched the Household Pulse Survey (census.gov/programs-surveys/household-pulse-survey/data.html), a weekly survey that provides a timely reflection of how Americans are doing during the pandemic. The Bureau of Transportation Statistics started publishing daily and weekly transportation statistics in early September 2020 to show the impact COVID-19 has had on mobility and daily and weekly trips across the country (www.bts.gov/covid-19). The launch of the Household Pulse Survey and the daily/weekly transportation statistics demonstrates that agencies can work together; share data, resources, and expertise; and produce valuable statistics at a time of national emergency. Linking data between agencies enables understanding and enhances decision making that benefits communities around the country.
However, there are some challenges to overcome, including federal laws and regulations that limit who has access to the data and how the data can be used. The Census Bureau, for example, has access to federal tax data from the IRS to create the Census Business Register. However, the BLS does not currently have access to IRS data, so it must rely on a different source. Thus, it is much harder to compare statistics between the two agencies because they are using different datasets. The National Research Council concludes that “being able to use the same business list and synchronize the existing lists would both reduce the burden on businesses and improve the quality of economic statistics, and it is likely that it would also result in cost savings.” Even more frustrating, the Census Bureau and BLS “have had explicit legal authority to allow them to share business information for statistical purposes since 2002 (PL 107-347 Title V, Subtitle B). The required change to the IRS legislation that would permit BLS to have access to limited business tax information has not been passed, despite numerous efforts” (“Innovations in Federal Statistics: Combining Data Sources While Protecting Privacy,” June 5, 2017; nap.edu/read/24652/chapter/5#41).
Data sharing between the federal government and state governments has the potential to leverage valuable data at the state and local levels, something businesses and local governments want and need. However, even though federal statistical agencies frequently request access to administrative data from states and local authorities, the states are under no legal obligation to comply with these requests; when they do, there can be many hurdles to clear.
Here’s an example of the problem: To create the Longitudinal Employer-Household Dynamics (LEHD) program, the Census Bureau wanted to combine state administrative data on business establishments with household and business survey data. To make this happen, it “had to negotiate separately with every state to obtain its administrative data, the process initially took more than 10 years, requires annual renewals, and the Census Bureau was not allowed to use the data for anything other than the LEHD.” Overcoming these obstacles can be very challenging but “it would be valuable to investigate and implement strategies to combine information from survey and nonsurvey data sources to improve efficiency and meet the ever-growing need for more information” (“Federal Statistics, Multiple Data Sources, and Privacy Protection: Next Steps”; doi.org/10.17226/24893).
Privacy and Confidentiality
The U.S. statistical system has the huge responsibility of protecting the confidentiality and privacy of the data that it collects. But the data landscape is a moving target. The statistical agencies must deal with new technologies (AI/machine learning); new commercial and public data sources; massive computing power; and the threat of hackers and others who can break into government and commercial databases, steal confidential data, and then use it to blackmail companies, governments, and individuals. A big challenge in and of itself, this is made vastly more difficult by declining budgets and staff.
To deal with this problem, the Census Bureau has used various techniques to prevent respondents’ data from being discovered. For the 2020 census, it introduced “differential privacy.” “Differential Privacy plugs the leaks using mathematical principles, applying carefully calibrated statistical noise to a dataset. It allows us to strike a balance between privacy and accuracy in a surgical way” (“Modernizing Privacy Protections for the 2020 Census: Next Steps”; census.gov/newsroom/blogs/random-samplings/2021/04/modernizing_privacy.html). Yes, to protect privacy, the Census Bureau has introduced noise, which makes smaller areas like census blocks look “fuzzy,” so “the data for a particular block may not seem correct.” In the “Redistricting Data” report cited earlier, acting director Jarmin posits that “noise in the block-level data will require a shift in how some data users typically approach using these census data.”
Not surprisingly, researchers who rely on block-level data are not enthusiastic about the introduction of noise and fuzziness into the census data. The Census Bureau defended its decision by arguing that it had two options: continue to use legacy techniques that would add a lot of statistical noise and would drastically reduce the number of 2020 census statistics that could be released, or rely on differential privacy. It chose differential privacy. The issue of privacy versus granular data is not going to go away, but with continued financial support, the Census Bureau will offer the best solutions it can to deliver the most accurate, timely data while protecting the privacy and confidentiality of all U.S. residents.
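For readers curious about why calibrated noise makes small areas look fuzzier than large ones, the idea can be sketched with the textbook Laplace mechanism. This is not the Census Bureau’s implementation, which is considerably more elaborate; the epsilon value, the counts, and the function names below are all assumptions made up for illustration.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a Laplace distribution centered on zero."""
    # The difference of two independent exponential draws, each with mean
    # `scale`, follows a Laplace distribution with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Textbook Laplace mechanism for a simple count (sensitivity of 1)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    # A smaller epsilon means stronger privacy and therefore more noise.
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(2020)  # fixed seed so the illustration is repeatable

# Hypothetical population counts for a small block and a large county.
for label, true_count in (("block", 12), ("county", 120_000)):
    released = noisy_count(true_count, epsilon=0.5, rng=rng)
    relative_error = abs(released - true_count) / true_count
    print(f"{label}: true={true_count}, released={released:.1f}, "
          f"relative error={relative_error:.2%}")
```

Because the noise is the same size no matter how large the true count is, it tends to distort a block of a dozen people far more, in relative terms, than a county of 120,000. That is the trade-off between privacy and granular data described above.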
FUTURIZING THE STATISTICAL SYSTEM
The federal statistical system needs to be updated and modernized to deal with 21st-century realities and needs. Leaders from the American Statistical Association and the Council of Professional Associations on Federal Statistics (COPAFS) argue that agencies need to modernize by restoring purchasing power and having a budget that reflects the need to test and research new tools, technologies, new data security techniques, and alternative data sources.
Next, add staff that have the requisite skills and expertise to handle the new technologies and tools. Create a workable infrastructure that allows and encourages the sharing of administrative data and data from third parties between federal and state governments. Support research and development to investigate and research new methods to safeguard and protect the data collected.
And, lastly, Congress and the president must ensure that the agencies have freedom from political interference. The entire statistical system is based on a three-way trust—trust that the data will not be manipulated by political agendas, that respondents will answer survey questions accurately, and that respondents’ answers will be protected and kept private and confidential. If these steps are taken, an upgraded statistical infrastructure will benefit the nation for years and years to come.