Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy

At the federal level, this problem could be greatly alleviated by abolishing the Electoral College system. It's the winner-take-all mathematics from state to state that delivers so much power to a relative handful of voters. It's as if in politics, as in economics, we have a privileged 1 percent. And the money from the financial 1 percent underwrites the microtargeting to secure the votes of the political 1 percent. Without the Electoral College, by contrast, every vote would be worth exactly the same. That would be a step toward democracy.

Baseball also has statistical rigor. Its gurus have an immense data set at hand, almost all of it directly related to the performance of players in the game. Moreover, their data is highly relevant to the outcomes they are trying to predict. This may sound obvious, but as we’ll see throughout this book, the folks building WMDs routinely lack data for the behaviors they’re most interested in. So they substitute stand-in data, or proxies. They draw statistical correlations between a person’s zip code or language patterns and her potential to pay back a loan or handle a job. These correlations are discriminatory, and some of them are illegal.
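
To make the proxy mechanism concrete, here is a minimal Python sketch with entirely invented names and figures: the lender never observes the applicant's own repayment behavior, only a stand-in for it.

```python
# Entirely synthetic sketch: a lender with no repayment histories scores
# applicants on a proxy -- their zip code's average default rate --
# rather than on anything the applicant herself has done.
zip_default_rates = {"10001": 0.04, "60624": 0.22}  # hypothetical figures

def proxy_score(applicant):
    # The "model" never sees individual behavior; it inherits whatever
    # correlation (and discrimination) the neighborhood carries.
    return 1.0 - zip_default_rates[applicant["zip"]]

alice = {"zip": "60624", "paid_every_bill_on_time": True}
print(proxy_score(alice))  # 0.78 -- docked for her neighbors' defaults, not her own
```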

Big Data processes codify the past. They do not invent the future. Doing that requires moral imagination, and that’s something only humans can provide. We have to explicitly embed better values into our algorithms, creating Big Data models that follow our ethical lead. Sometimes that will mean putting fairness ahead of profit.

By the end of the meeting, one conscientious human being had cleared up the confusion generated by web-crawling data-gathering programs. The housing authority knew which Catherine Taylor it was dealing with. The question we’re left with is this: How many Wanda Taylors are out there clearing up false identities and other errors in our data? The answer: not nearly enough. Humans in the data economy are outliers and throwbacks.

Here we see that models, despite their reputation for impartiality, reflect goals and ideology. When I removed the possibility of eating Pop-Tarts at every meal, I was imposing my ideology on the meals model. It’s something we do without a second thought. Our own values and desires influence our choices, from the data we choose to collect to the questions we ask. Models are opinions embedded in mathematics.
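
The Pop-Tarts point can be made concrete with a toy sketch (foods, prices, and limits invented): the objective sounds neutral, "minimize cost," but the modeler's values enter as a constraint, and changing that one number changes every answer the model gives.

```python
# Invented foods and prices. The "optimizer" just picks the cheapest
# meals; the modeler's ideology lives in the max_pop_tarts constraint.
PRICES = {"pop_tarts": 0.50, "lentils": 0.80}  # dollars per meal

def plan_week(max_pop_tarts):
    # Fill the week's 21 meals with the cheapest option first, up to
    # whatever limit the modeler has imposed.
    meals = ["pop_tarts"] * min(max_pop_tarts, 21)
    meals += ["lentils"] * (21 - len(meals))
    return meals

print(plan_week(max_pop_tarts=21).count("pop_tarts"))  # 21: pure cost-minimizer
print(plan_week(max_pop_tarts=2).count("pop_tarts"))   # 2: same math, different values
```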

However, when you create a model from proxies, it is far simpler for people to game it. This is because proxies are easier to manipulate than the complicated reality they represent.
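
A hypothetical example of how easy that gaming is: suppose a hiring filter scores keyword counts as a proxy for ability (the keywords and résumés below are invented).

```python
# Sketch: the filter scores a proxy (keyword frequency) instead of the
# real target (ability to do the job). Gaming the proxy is trivial.
KEYWORDS = {"python", "leadership", "synergy"}

def resume_score(text):
    words = text.lower().split()
    return sum(words.count(k) for k in KEYWORDS)

honest = "Built Python data pipelines and led a team of four"
gamed = "python python python leadership leadership synergy synergy"
print(resume_score(honest), resume_score(gamed))  # 1 vs 7
```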

If you look at this development from the perspective of a university president, it’s actually quite sad. Most of these people no doubt cherished their own college experience—that’s part of what motivated them to climb the academic ladder. Yet here they were at the summit of their careers dedicating enormous energy toward boosting performance in fifteen areas defined by a group of journalists at a second-tier newsmagazine. They were almost like students again, angling for good grades from a taskmaster. In fact, they were trapped by a rigid model, a WMD.

I have no reason to believe that the social scientists at Facebook are actively gaming the political system. Most of them are serious academics carrying out research on a platform that they could only have dreamed about two decades ago. But what they have demonstrated is Facebook’s enormous power to affect what we learn, how we feel, and whether we vote. Its platform is massive, powerful, and opaque. The algorithms are hidden from us, and we see only the results of the experiments researchers choose to publish.

In a system in which cheating is the norm, following the rules amounts to a handicap.

In this march through a virtual lifetime, we’ve visited school and college, the courts and the workplace, even the voting booth. Along the way, we’ve witnessed the destruction caused by WMDs. Promising efficiency and fairness, they distort higher education, drive up debt, spur mass incarceration, pummel the poor at nearly every juncture, and undermine democracy. It might seem like the logical response is to disarm these weapons, one by one. The problem is that they’re feeding on each other. Poor people are more likely to have bad credit and live in high-crime neighborhoods, surrounded by other poor people. Once the dark universe of WMDs digests that data, it showers them with predatory ads for subprime loans or for-profit schools. It sends more police to arrest them, and when they’re convicted it sentences them to longer terms. This data feeds into other WMDs, which score the same people as high risks or easy targets and proceed to block them from jobs, while jacking up their rates for mortgages, car loans, and every kind of insurance imaginable. This drives their credit rating down further, creating nothing less than a death spiral of modeling. Being poor in a world of WMDs is getting more and more dangerous and expensive.

I was forced to confront the ugly truth: people had deliberately wielded formulas to impress rather than clarify.

Justice cannot just be something that one part of society inflicts upon the other.

Just imagine if police enforced their zero-tolerance strategy in finance. They would arrest people for even the slightest infraction, whether it was chiseling investors on 401ks, providing misleading guidance, or committing petty frauds. Perhaps SWAT teams would descend on Greenwich, Connecticut. They’d go undercover in the taverns around Chicago’s Mercantile Exchange.

Opaque and invisible models are the rule, and clear ones very much the exception. We’re modeled as shoppers and couch potatoes, as patients and loan applicants, and very little of this do we see—even in applications we happily sign up for. Even when such models behave themselves, opacity can lead to a feeling of unfairness.

Racism, at the individual level, can be seen as a predictive model whirring away in billions of human minds around the world. It is built from faulty, incomplete, or generalized data. Whether it comes from experience or hearsay, the data indicates that certain types of people have behaved badly. That generates a binary prediction that all people of that race will behave that same way. Needless to say, racists don’t spend a lot of time hunting down reliable data to train their twisted models. And once their model morphs into a belief, it becomes hardwired. It generates poisonous assumptions, yet rarely tests them, settling instead for data that seems to confirm and fortify them. Consequently, racism is the most slovenly of predictive models. It is powered by haphazard data gathering and spurious correlations, reinforced by institutional inequities, and polluted by confirmation bias.

Simpson’s Paradox: when a whole body of data displays one trend, yet when broken into subgroups, the opposite trend comes into view for each of those subgroups.
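
The classic kidney-stone data set often used to teach the paradox makes this concrete; the short sketch below reproduces it (figures from that standard textbook example, not from this book).

```python
# Treatment A wins inside EVERY subgroup, yet B wins on the pooled
# totals -- because A was given the harder (large-stone) cases far
# more often than B was.
data = {
    "small stones": {"A": (81, 87),   "B": (234, 270)},
    "large stones": {"A": (192, 263), "B": (55, 80)},
}

totals = {"A": [0, 0], "B": [0, 0]}
for group, arms in data.items():
    for arm, (cured, n) in arms.items():
        totals[arm][0] += cured
        totals[arm][1] += n
        print(f"{group}  {arm}: {cured}/{n} = {cured / n:.0%}")

for arm, (cured, n) in totals.items():
    print(f"overall       {arm}: {cured}/{n} = {cured / n:.0%}")
```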

Someone who takes the trouble to see her file at one of the many brokerages, for example, might see the home mortgage, a Verizon bill, and a $459 repair on the garage door. But she won’t see that she’s in a bucket of people designated as “Rural and Barely Making It,” or perhaps “Retiring on Empty.”

Some two thousand stone-throwing protesters gathered in the street outside the school. They chanted, “We want fairness. There is no fairness if you don’t let us cheat.” It sounds like a joke, but they were absolutely serious.

Thanks in part to the resulting high score on the evaluation, he gets a longer sentence, locking him away for more years in a prison where he’s surrounded by fellow criminals—which raises the likelihood that he’ll return to prison. He is finally released into the same poor neighborhood, this time with a criminal record, which makes it that much harder to find a job. If he commits another crime, the recidivism model can claim another success. But in fact the model itself contributes to a toxic cycle and helps to sustain it. That’s a signature quality of a WMD.

The government regulates them, or chooses not to, approves or blocks their mergers and acquisitions, and sets their tax policies (often turning a blind eye to the billions parked in offshore tax havens). This is why tech companies, like the rest of corporate America, inundate Washington with lobbyists and quietly pour hundreds of millions of dollars in contributions into the political system. Now they’re gaining the wherewithal to fine-tune our political behavior—and with it the shape of American government—just by tweaking their algorithms.

The math-powered applications powering the data economy were based on choices made by fallible human beings. Some of these choices were no doubt made with the best intentions. Nevertheless, many of these models encoded human prejudice, misunderstanding, and bias into the software systems that increasingly managed our lives. Like gods, these mathematical models were opaque, their workings invisible to all but the highest priests in their domain: mathematicians and computer scientists. Their verdicts, even when wrong or harmful, were beyond dispute or appeal. And they tended to punish the poor and the oppressed in our society, while making the rich richer.

The result is that we criminalize poverty, believing all the while that our tools are not only scientific but fair.

These models are constructed not just from data but from the choices we make about which data to pay attention to—and which to leave out. Those choices are not just about logistics, profits, and efficiency. They are fundamentally moral. If we back away from them and treat mathematical models as a neutral and inevitable force, like the weather or the tides, we abdicate our responsibility. And the result, as we’ve seen, is WMDs that treat us like machine parts in the workplace, that blackball employees and feast on inequities. We must come together to police these WMDs, to tame and disarm them. My hope is that they’ll be remembered, like the deadly coal mines of a century ago, as relics of the early days of this new revolution, before we learned how to bring fairness and accountability to the age of data. Math deserves much better than WMDs, and democracy does too.

This creates a pernicious feedback loop. The policing itself spawns new data, which justifies more policing. And our prisons fill up with hundreds of thousands of people found guilty of victimless crimes. Most of them come from impoverished neighborhoods, and most are black or Hispanic. So even if a model is color blind, the result of it is anything but. In our largely segregated cities, geography is a highly effective proxy for race.
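
A deterministic toy simulation (all numbers invented) shows how self-sustaining that loop is: give two neighborhoods identical true offense rates, allocate patrols in proportion to last year's recorded arrests, and an arbitrary initial skew confirms itself year after year, because the data used to judge the allocation is produced by the allocation.

```python
# Toy model: identical true offense rates everywhere, but you only
# record offenses where you patrol, and you patrol where you recorded.
TRUE_RATE = 0.05                          # assumed same in both neighborhoods
patrols = {"north": 0.55, "south": 0.45}  # arbitrary initial skew

for year in range(1, 6):
    recorded = {hood: share * TRUE_RATE for hood, share in patrols.items()}
    total = sum(recorded.values())
    patrols = {hood: r / total for hood, r in recorded.items()}
    # The 55/45 split never corrects: the skewed data "justifies" it.
    print(year, {hood: f"{share:.0%}" for hood, share in patrols.items()})
```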

This is a point I’ll be returning to in future chapters: we’ve seen time and again that mathematical models can sift through data to locate people who are likely to face great challenges, whether from crime, poverty, or education. It’s up to society whether to use that intelligence to reject and punish them—or to reach out to them with the resources they need.

This is unjust. The questionnaire includes circumstances of a criminal’s birth and upbringing, including his or her family, neighborhood, and friends. These details should not be relevant to a criminal case or to the sentencing. Indeed, if a prosecutor attempted to tar a defendant by mentioning his brother’s criminal record or the high crime rate in his neighborhood, a decent defense attorney would roar, “Objection, Your Honor!” And a serious judge would sustain it. This is the basis of our legal system. We are judged by what we do, not by who we are. And although we don’t know the exact weights that are attached to these parts of the test, any weight above zero is unreasonable.

To create a model, then, we make choices about what’s important enough to include, simplifying the world into a toy version that can be easily understood and from which we can infer important facts and actions. We expect it to handle only one job and accept that it will occasionally act like a clueless machine, one with enormous blind spots.

What’s different here is the focus on the proxy when far more relevant data is available. I cannot imagine a more meaningful piece of data for auto insurers than a drunk driving record. It is evidence of risk in precisely the domain they’re attempting to predict. It’s far better than other proxies they consider, such as a high school student’s grade point average. Yet it can count far less in their formula than a score drawn from financial data thrown together on a credit report (which, as we’ve seen, is sometimes erroneous).
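
To see how a weighting like that plays out, here is a hypothetical premium formula with invented weights, echoing the pattern described above: the credit proxy can dominate the directly relevant drunk-driving signal.

```python
# Hypothetical weights for illustration only -- not any insurer's formula.
def annual_premium(base, credit_score, dui_count):
    credit_penalty = (850 - credit_score) * 2.0  # proxy, assumed weight
    dui_penalty = dui_count * 300.0              # relevant signal, assumed weight
    return base + credit_penalty + dui_penalty

# A drunk driver with excellent credit pays less than a clean driver
# with poor credit -- the proxy outweighs the behavior.
print(annual_premium(1000, credit_score=820, dui_count=1))  # 1360.0
print(annual_premium(1000, credit_score=550, dui_count=0))  # 1600.0
```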