The new Census algorithm is wiping out cities’ populations and miscounting minorities: experts


The Census Bureau is rolling out a new algorithm intended to protect respondents' privacy, but experts warn the change will significantly miscount minority communities and rural areas.


Specifically, the Census Bureau plans to use a new "differential privacy" algorithm to obscure respondents' identities, yet state experts warn that the resulting data could contain population errors of 25 percent or more and misrepresent certain groups by 100 percent or more. That would have dramatic effects on redistricting and funding.

The data released by the bureau is expected to be accurate at the state level, but its sub-state data (for regions, counties, cities, and towns) will be intentionally distorted. In the past, the bureau used "data swapping" to keep individuals in small populations from being identified through published statistics, exchanging the records of households with distinctive characteristics with those of similar households in other areas while keeping population totals accurate, according to the National Conference of State Legislatures (NCSL). But concerns that published figures could be cross-referenced with other information to re-identify individuals led the bureau to adopt a "differential privacy" algorithm that will "inject noise" into the raw data.
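To make "injecting noise" concrete, here is a minimal Python sketch of the textbook Laplace mechanism applied to a single population count. It is an illustration under stated assumptions, not the bureau's algorithm, which is far more elaborate: the privacy setting `epsilon=0.25` and the place sizes below are hypothetical. The point it shows is that the noise has roughly the same absolute size for every place, so it is a large share of a small town's count and a negligible share of a big city's.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two independent exponential draws with mean `scale`
    # is Laplace-distributed with location 0 and scale `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> int:
    # Adding or removing one person changes a count by at most 1 (sensitivity 1),
    # so the Laplace scale is 1 / epsilon: a smaller epsilon means more noise.
    return round(true_count + laplace_noise(1.0 / epsilon, rng))


def mean_relative_error(true_count: int, epsilon: float, trials: int, seed: int = 0) -> float:
    # Average |published - true| / true over many independent releases.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += abs(noisy_count(true_count, epsilon, rng) - true_count) / true_count
    return total / trials


if __name__ == "__main__":
    # Hypothetical places; epsilon is an illustrative value, not the bureau's setting.
    for population in (50, 500, 100_000):
        err = mean_relative_error(population, epsilon=0.25, trials=10_000)
        print(f"population {population:>7}: mean relative error {err:.2%}")
```

Note that nothing in this raw mechanism stops a small count from coming out negative; how such values are cleaned up afterward matters for the bias state demographers describe.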

Though the bureau is still working out how it will implement this, the move immediately raised concerns.

"Differential privacy will mean that, except at the state level, population and voting age population will not be reported as enumerated. And, race and ethnicity data are likely to be farther from the 'as enumerated' data than in past decades, when data swapping was used to protect small populations," according to the NCSL. "This may raise issues for racial block voting analyses."

The bureau released demonstration data to states, produced by applying the new method to the 2010 census, and experts quickly found that the results differed sharply from the original 2010 numbers, particularly in rural areas.

Meredith Strohm Gunter of the Weldon Cooper Center for Public Service at the University of Virginia warned Gov. Ralph Northam in a January memo that data on the sub-state level "will be sacrificed" for privacy, which could lead to "misallocation of funds, poor capacity for planning… and a competitive disadvantage in economic and workforce development."

Gunter told Northam that the bureau's demonstration produced inaccurate and, in some cases, likely impossible figures.

"For example… we found the total number of girls ages 15-19 in the City of Emporia were decreased from the actual 185 to only 30," she wrote. "Applying this number to the teen pregnancy rate for Emporia increased the rate from 10 percent to 66 percent. This is not only ludicrous, but, if consistent across localities and subject areas, deeply damaging to the ability of state and local governments and non-profits to accurately address the needs of Virginians."

Other errors were similarly egregious. For example, the demonstration showed 716 people living on the Hawaiian island of Molokai when the actual population in 2010 was just 90 people, according to an op-ed by The New York Times' Gus Wezerek and University of Minnesota data scientist David Van Riper. Small Native American reservations with fewer than 5,000 residents saw their populations decline by an average of 34% in the demonstration. Small Alaskan villages saw population declines even though they continued to grow.

An analysis by Utah officials found that 15,000 actual residents disappeared from the count. Two cities lost more than 50% of their populations, 20 cities lost 20% of their populations, and 43 cities lost 10% of their populations, Utah House Speaker Brad Wilson and Senate President Stuart Adams said in a letter to Steven Dillingham, the director of the Census Bureau. Another city, on the other hand, saw a 253% population increase.

"Not only will this alter basic demographic information in both rural and urban areas of the state, but it may also adversely affect longitudinal studies about health, safety, and welfare," they wrote.

An analysis by Washington state's Office of Financial Management said the demonstration's household data for eight counties showed occupancy rates "at or near 100%, which is illogical and historically implausible."

"There is bias in the demonstration data that causes areas with small populations to get larger while areas with larger populations get smaller," state demographer Mark Mohrman said in a letter to Dillingham. The data also deviated when it came to racial demographics, he said.
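Mohrman's letter, as quoted, does not say what causes this bias. One plausible mechanism, assumed here for illustration rather than taken from the letter, is post-processing: negative populations are impossible, so noisy counts below zero get pushed up to zero, which inflates small areas on average, and because higher-level totals are held fixed, that surplus has to come out of larger areas. The Python sketch below uses a deliberately simplified stand-in (clamp at zero, then rescale proportionally to an exact total); the bureau's actual post-processing is more elaborate, and the area sizes and `epsilon` value are hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # Difference of two exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def protect(counts: list[int], epsilon: float, rng: random.Random) -> list[float]:
    """Hypothetical pipeline: noise each count, forbid negatives, keep the total exact."""
    total = sum(counts)
    # Step 1: independent Laplace noise on every area's count.
    noisy = [c + laplace_noise(1.0 / epsilon, rng) for c in counts]
    # Step 2: a population cannot be negative, so negative results become 0.
    # This only ever moves values up, so small areas are inflated on average.
    clamped = [max(0.0, x) for x in noisy]
    # Step 3: hold the overall total invariant (as the state total is held fixed).
    # Proportional rescaling takes the surplus back, mostly from the largest area.
    noisy_total = sum(clamped)
    if noisy_total == 0:
        return [total / len(counts)] * len(counts)
    return [x * total / noisy_total for x in clamped]


def average_outcome(counts: list[int], epsilon: float, trials: int, seed: int = 0) -> list[float]:
    # Average published value for each area over many independent runs.
    rng = random.Random(seed)
    sums = [0.0] * len(counts)
    for _ in range(trials):
        for i, value in enumerate(protect(counts, epsilon, rng)):
            sums[i] += value
    return [s / trials for s in sums]


if __name__ == "__main__":
    # Hypothetical geography: two tiny areas and one large one.
    true_counts = [2, 5, 1000]
    averages = average_outcome(true_counts, epsilon=0.25, trials=20_000)
    for true, avg in zip(true_counts, averages):
        print(f"true {true:>5}  average published {avg:8.2f}")
```

In this toy setup the two tiny areas come out larger than their true counts on average and the large area comes out smaller, while the total always matches exactly, which is the pattern Mohrman describes.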

Beyond raw miscounts, these errors could misrepresent the character of entire communities.

"A rural, declining, old, predominantly white community, for example, may appear instead growing, younger, and more diverse," Gunter wrote. As a result, redistricting data "will be inaccurate" and "majority-minority districts could lose their status," she warned. The distorted data could also cost communities funding for housing, transportation, emergency management, and numerous other services.

Qian Cai, the director of the Weldon Cooper Center's Demographics Research Group, warned in a Richmond Times-Dispatch op-ed that while the move is "well-intended," the bureau "believes data distortion prevents reconstruction of individual records including age, gender, race and homeownership, even though that basic information already is easily accessible through the internet."

The "consequences are disastrous," she said.

"We no longer will have accurate information about our communities. The data distortion might misrepresent a city's population size by 25% or more, or in the case of an age group… by more than 100%," she wrote. As a result, data necessary for things like enforcing voting rights, funding schools, planning for emergencies, tracking opioid addiction, and city planning will be inaccurate and meaningless.

Officials in Maine also expressed concern about the demonstration data in a letter to Dillingham last month.

"Our analyses show that small, rural places suffer the most in terms of inaccurate estimates. In Maine's case, that means a majority of our counties and sub-county geographies are subject to unacceptably high levels of error… The repercussions for our state and nation are considerable," wrote Angela Hallowell, the state's data center lead, and Maine State Economist Amanda Rector. The proposal, they said, would "throw into doubt any redistricting, funding decisions, or analysis done using census data."

Maine's analysis found that the Census Bureau's demonstration data for certain age and gender groups had error rates of more than 100%, which the letter warned would leave areas "vulnerable to large miscounts." And while data on white populations was largely accurate, minority populations had error rates of more than 25%, and higher still for black populations alone.

In Maine's Franklin County, for example, "the count of households with a black… householder was more than 11 times" higher in the demonstration than in the original data.

"This will have myriad financial and economic repercussions for the 'winners' and 'losers' that municipalities will randomly become," the letter said.

John Abowd, the associate director for research and methodology and chief scientist of the US Census Bureau, said in a letter to officials in Nevada that the algorithm was "written specifically for the 2020 Census and cannot be directly applied to any other data."

"The Census Bureau is committed to publishing accurate data for the 2020 Census, however our obligations to protect privacy mean that we cannot publish perfectly accurate data for every conceivable use case," Abowd wrote. He argued that the bureau expects the "impact of the error introduced by the use of formal privacy will be less than the error resulting from other factors."

"We know of no other statistical technique that can be reliably employed to assure the confidentiality of the underlying data while simultaneously assuring the highest quality statistical product for our data users," he wrote.

Abowd said that as the bureau works to improve the algorithm, "we are also researching a variety of contingency plans to ensure that the 2020 Census Data Products meet the Census Bureau's data quality standards."

"Because of the impact of differential privacy on data accuracy for small geographies or populations, however, the Census Bureau is evaluating what tables to release and at what geographic levels to ensure that our data products meet fitness-for-use standards," he added. "More generally, the Census Bureau is eager to engage with federal, state and local programs to learn more of how they use census data and their requirements for accuracy. The Census Bureau is also eager to engage with stakeholders to understand the privacy expectations, requirements, and concerns of the American public."

But state officials worry that even minor errors could result in significant long-term consequences.

"Inaccuracy in the decennial census will flow through ten full years of data," Hallowell and Rector warned Dillingham. "The current implementation of [differential privacy] creates a group of regions and people, predominantly rural and already marginalized, that are left behind; they will continue to be left behind for the remainder of the decade unless action is taken to improve the algorithm. Without resolution… it will be impossible to measure the magnitude of these errors, resulting in further challenges for these places and communities."
