How Using Big Data to Understand Social Problems Can Create More Inequality
In her catchily titled book, Weapons of Math Destruction, Cathy O’Neil, a number theorist turned data scientist, delivers a simple but important message: Statistical models are everywhere, and they exert increasing power over many aspects of our daily lives. Data collected by occult means and analyzed by algorithms of often dubious validity help to determine who gets a mortgage, who goes to college, what you pay for insurance, who gets what job, what level of scrutiny you will be subjected to when you fly, how aggressively your neighborhood will be policed, and how you will be treated if arrested.
No one will be surprised to learn that ever more powerful computers processing rapidly expanding volumes of data have become a ubiquitous tool of decision-makers in many areas. O’Neil drives the point home by assembling numerous examples ranging from college ranking systems to payday loan-sharking and hedge-fund trading. Perversely, many of the algorithms used in these analyses privilege the already-privileged and handicap the already-handicapped: If you’ve been treated for depression, you’re less likely to find work and therefore more likely to relapse; if you’re a “borrower from the rough section of East Oakland,” you’ll pay higher interest on your credit card even though you’re “already struggling.” Whether intentionally or unintentionally, the automation of selection processes thus increases inequality.
Although the author is herself a working data scientist, she takes great pains to avoid technical jargon and to put her argument in the simplest terms possible. To explain the intuition that underlies the application of statistics to the real world, she begins with baseball, the locus classicus for all American stats mavens. Anyone can understand why fielders will shift their positions toward right field when facing a slugger known to hit that way more often than not.
Not all statistical models are so easy to grasp, however. O’Neil distinguishes between healthy models and unhealthy ones. Healthy models are transparent. The cogs and wheels that make them function are exposed for all to see, understand, and evaluate. They are continuously updated as new data flow in. If the model makes inaccurate predictions, it can be corrected and tested against still newer data to see if it improves. If it cannot be improved—if its predictions remain erratic—then it should be scrapped, since the inability to improve the model suggests inadequate understanding of the underlying process. Still, no one should mistake the existence of a sophisticated algorithm for proof that it is valid, trustworthy, or harmless.
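The feedback loop that marks a healthy model can be made concrete in a few lines. What follows is only a minimal sketch, not anything drawn from O’Neil’s book: the one-parameter model, the function names (`fit_slope`, `mean_abs_error`, `audit`), the `tolerance` threshold, and the rule of judging the model by its most recent error are all assumptions made for illustration.

```python
from statistics import mean


def mean_abs_error(predict, batch):
    """Average absolute gap between predictions and observed outcomes."""
    return mean(abs(predict(x) - y) for x, y in batch)


def fit_slope(data):
    """Least-squares slope for a line through the origin: y ~ slope * x."""
    num = sum(x * y for x, y in data)
    den = sum(x * x for x, _ in data)
    return num / den


def audit(batches, tolerance):
    """Refit on everything seen so far, then score on the next unseen batch.

    Returns the out-of-sample error for each new batch plus a verdict.
    The verdict rule (last error within tolerance) is an assumed simplification.
    """
    seen = list(batches[0])
    errors = []
    for new_batch in batches[1:]:
        slope = fit_slope(seen)  # update the model with all data so far
        errors.append(mean_abs_error(lambda x: slope * x, new_batch))
        seen.extend(new_batch)  # new data flow in for the next round
    verdict = "keep" if errors[-1] <= tolerance else "scrap"
    return errors, verdict


# Hypothetical observations, invented purely for illustration.
batches = [
    [(1, 2.0), (2, 4.1)],
    [(3, 6.0), (4, 7.9)],
    [(5, 10.1), (6, 12.0)],
]
errors, verdict = audit(batches, tolerance=0.5)
```

With these invented numbers the error on unseen data shrinks from one batch to the next and the verdict is "keep". A model whose error stayed large no matter how much data it absorbed would earn "scrap" instead, which is the test O’Neil asks modelers to apply.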
O’Neil’s book can be read as a plea to her fellow data scientists to take a Hippocratic oath for the age of big data: Above all, a good algorithm should do no harm. She identifies any number of cases in which that oath has been violated. One that comes in for special attention is an algorithm designed to measure the “value added” by individual teachers in the classroom. A middle school English teacher named Tim Clifford was devastated to learn that he had received an “abysmal 6 out of 100” in a value-added evaluation. When he tried to determine where he had gone wrong, however, he was unable to identify any specific flaw in his teaching. The following year he therefore taught a similar class without changing his approach at all. His score shot up to 96. The experience made him “realize how ridiculous the entire value-added model is when it comes to education.” Ill-conceived performance measures not only shame and penalize individuals but also result in bad public policy when administrators place undue confidence in seemingly objective measurements to the detriment of possibly more informative modes of evaluation.
Sometimes, the harm done by algorithms is inflicted not on individuals but on entire groups. For instance, one education consulting firm helps colleges “target the most promising candidates for recruitment” on the basis of ability to pay full tuition or eligibility for outside scholarships. A company in Virginia supplies software to sift through call center traffic to shorten the waiting time of those deemed to be “more profitable prospects.” Distilled from such examples, the heart of O’Neil’s argument is simply stated: The incessant drive to cut costs and increase profits has discriminatory consequences. Algorithms are touted as antidotes to prejudice, or subjective bias, but in many cases they simply replace subjective bias with what can only be called objective bias: Even if there is no intent to treat people unequally, inequality is built into the machinery of choice.
The evaluation of applicants for auto insurance offers an interesting example. In 2015, Consumer Reports researchers looked into the pricing of auto insurance policies, analyzing some two billion price quotes from around the country. What they discovered was that insurers rated applicants not only on the basis of their driving records but also on information gleaned from credit reports. This so-called “proxy data” weighed more heavily than a driver’s actual safety history: “In New York State, for example, a dip in a driver’s credit rating from ‘excellent’ to merely ‘good’ could jack up the annual cost of insurance by $255.”
Since poor people tend to have worse credit ratings than those better off, the application of this algorithm was inherently discriminatory. The investigators found that because of reliance on proxy data such as credit scores and high school grades, the pricing of auto insurance was “wildly unfair.” But why would insurers choose to weigh such proxy data more heavily than actual driving records in their algorithm?
O’Neil’s answer reveals both the power and the weakness of her approach. “Automatic systems,” she writes, “can plow through credit scores with great efficiency and at enormous scale.” This is certainly true, but such systems can also plow through driving records. Why prefer credit scores? Is electronic access to credit scores easier to obtain than access to driving records? This may well be the case, but the book provides no evidence of it. Instead, the author abruptly shifts gears to attack the motives of the insurance companies rather than the algorithm they use, which is her ostensible subject: “I would argue that the chief reason has to do with profits. If an insurer has a system that can pull in an extra $1,552 a year from a driver with a clean record, why change it? The victims of their WMD … are more likely to be poor and less educated, a good number of them immigrants. They’re less likely to know that they’re being ripped off. And in neighborhoods with more payday loan offices than insurance brokers, it’s harder to shop for lower rates. In short, while an e-score might not correlate with safe driving, it does create a lucrative pool of vulnerable drivers. Many of them are desperate to drive—their jobs depend on it. Overcharging them is good for the bottom line.”
This is vivid writing. It is also highly tendentious. No justification for the $1,552 figure appears in the book, nor is the insurers’ side of the story aired at all. The Consumer Reports article cited is more informative. We learn that “car insurers didn’t use credit scores until the mid-1990s. That’s when several of them, working with the company that created the FICO score, started testing the theory that the scores might help to predict claim losses. They kept what they were doing hush-hush. By 2006, almost every insurer was using credit scores to set prices. But two-thirds of consumers surveyed by the Government Accountability Office at about the same time said they had no idea that their credit could affect what they paid for insurance.”
From this brief account we learn that insurers did indeed follow O’Neil’s prescription to test their models against real out-of-sample data. Did they do so in order to hone a strategy for ripping off the poor and vulnerable, as O’Neil suggests? Or was it that risk pools based on credit scores proved more accurate in predicting the likelihood of future accidents than risk pools defined by past driving records? The latter is a logical possibility that one would expect a mathematician like O’Neil to consider before indicting the motives of the insurers.
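The question posed here is empirical, and it helps to see what answering it would involve. The sketch below is an assumption-laden illustration, not a description of how any insurer works: the record fields (`claim` plus whatever pooling key is passed in) and the choice of a Brier-style squared error are hypothetical. Each risk pool is assigned its observed claim rate in one set of policyholders, and that rate is then scored against a separate, held-out set.

```python
from collections import defaultdict


def pool_rates(records, key):
    """Observed claim frequency for each risk pool defined by `key`."""
    totals = defaultdict(lambda: [0, 0])  # pool -> [claims, policyholders]
    for r in records:
        totals[r[key]][0] += r["claim"]
        totals[r[key]][1] += 1
    return {pool: claims / n for pool, (claims, n) in totals.items()}


def brier_score(train, test, key):
    """Mean squared error of pool-level claim rates on held-out records.

    Lower is better. A pool never seen in training falls back to the
    overall training claim rate (an assumed convention).
    """
    rates = pool_rates(train, key)
    overall = sum(r["claim"] for r in train) / len(train)
    squared_errors = [(rates.get(r[key], overall) - r["claim"]) ** 2 for r in test]
    return sum(squared_errors) / len(test)
```

Running `brier_score` twice on the same split, once keyed on a credit band and once on a driving-record category, would show which way of forming pools better anticipates held-out claims. Whatever the outcome, it would settle only the question of accuracy, not the question of fairness.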
Of course, even if credit scores are a useful proxy for accident-proneness, we might still conclude that using them to construct risk pools is inherently unfair. O’Neil’s remedy is to insist that the algorithms used in constructing statistical instruments be made transparent and subjected to scrutiny by stakeholders. This is a reasonable proposal, but it won’t help with another problem she identifies: the gaming of algorithms. For instance, Baylor University administrators used knowledge of how U.S. News & World Report computes its college rankings to improve the standing of their institution. Because the U.S. News algorithm was at least partially transparent, administrators “paid the fee for admitted students to retake the SAT” in order to “boost their scores—and Baylor’s ranking.” Hence transparency is no panacea.
Weapons of Math Destruction provides a handy map to a few of the many areas of our lives over which invisible algorithms have gained some control. As the empire of big data continues to expand, Cathy O’Neil’s reminder of the need for vigilance is welcome and necessary, despite the occasional breathlessness of her prose. Patience and rigor are not what one expects from a crier of alarm, a role for which O’Neil is particularly well suited and in which she performs admirably.