On April 9, 2017, United Airlines flight 3411 was preparing to take off from Chicago when flight attendants discovered the plane was overbooked. They tried to get volunteers to give up their seats with promises of travel vouchers and hotel accommodations, but not enough people were willing to get off the flight.
So United ended up calling some airport security officers, who boarded the plane and forcibly removed a passenger named Dr. David Dao. The officers ripped Dao out of his seat and carried him down the aisle of the airplane, nose bleeding, while horrified onlookers captured the scene with their phones. The public was outraged.
But how did Dr. Dao end up being the unlucky passenger that United decided to remove? Immediately following the incident, there was speculation that racial discrimination played a part — and it’s possible it played a role in how he was treated. But the answer to how he was chosen is actually an algorithm, a computer program that crunched through reams of data, looking at how much each passenger had paid for their ticket, what time they checked in, how often they flew on United, and whether they were part of a rewards program. The algorithm likely determined that Dr. Dao was one of the least valuable customers on the flight at the time.
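United has never published its selection logic, but the kind of ranking described above can be sketched in a few lines of code. Everything here — the field names, the weights, the sample passengers — is invented for illustration only:

```python
# Hypothetical "customer value" ranking of the kind described above.
# All fields, weights, and passengers are invented for illustration;
# United's actual selection logic has not been made public.

def customer_value(p):
    """Combine a few signals into one score (higher = more valuable)."""
    score = 0.5 * p["fare_paid"]            # cheaper tickets rank lower
    score += 2.0 * p["flights_per_year"]    # frequent flyers rank higher
    if p["rewards_member"]:
        score += 50.0                       # loyalty-program bonus
    if p["checked_in_early"]:
        score += 10.0                       # late check-in counts against you
    return score

passengers = [
    {"name": "A", "fare_paid": 480, "flights_per_year": 12,
     "rewards_member": True, "checked_in_early": True},
    {"name": "B", "fare_paid": 120, "flights_per_year": 1,
     "rewards_member": False, "checked_in_early": False},
]

# The passenger with the lowest score would be bumped first.
bumped = min(passengers, key=customer_value)
```

The unsettling part is how ordinary this looks: a handful of weighted features, a sort, and a person at the bottom of the list.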
Computer algorithms now shape our world in profound and mostly invisible ways. They predict if we’ll be valuable customers and whether we’re likely to repay a loan. They filter what we see on social media, sort through resumes, and evaluate job performance. They inform prison sentences and monitor our health. Most of these algorithms have been created with good intentions. The goal is to replace subjective judgments with objective measurements. But it doesn’t always work out like that.
“I don’t think mathematical models are inherently evil — I think it’s the ways they’re used that are evil,” says mathematician Cathy O’Neil, author of the book Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. She has studied number theory, worked as a data scientist at start-ups, and built predictive algorithms for various private enterprises. Through her work, she’s become critical of the influence of poorly designed algorithms.
An algorithm, in a nutshell, is a step-by-step guide to solving a problem. It’s a set of instructions, like a recipe. Computer algorithms are sets of rules for calculations that take in historical data and predict future outcomes. And many companies that build and market these algorithms like to talk about how objective they are, claiming they remove human error and bias from complex decision-making.
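That recipe idea can be made concrete with a toy predictive algorithm: learn a simple rule from historical data, then apply it to new cases. The data and the rule below are invented for illustration — real lending models are far more complex — but the shape is the same:

```python
# A toy predictive algorithm: learn a cutoff from historical data,
# then apply it to new cases. Data and rule are invented for illustration.

# Historical data: (credit score, did the borrower repay?)
history = [(620, False), (710, True), (680, True), (540, False)]

# "Training" step: find the lowest score among borrowers who repaid.
cutoff = min(score for score, repaid in history if repaid)

def will_repay(score):
    """The learned recipe: compare the new applicant to the cutoff."""
    return score >= cutoff
```

Notice that the rule is entirely determined by which historical examples were fed in — a point that matters for everything that follows.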
But in reality, every algorithm reflects the choices of its human designer. O’Neil has a metaphor to help explain how this works. She gives the example of cooking dinner for her family. The ingredients in her kitchen are the “data” she has to work with, “but to be completely honest I curate that data because I don’t really use [certain ingredients] … therefore imposing my agenda on this algorithm. And then I’m also defining success, right? I’m in charge of success. I define success to be if my kids eat vegetables at that meal …. My eight year old would define success to be like whether he got to eat Nutella.”
Of course, the fact that algorithms reflect the subjective choices of their designers doesn’t necessarily make them bad. However, O’Neil does single out a particular kind of algorithm for scrutiny, a subset she refers to as “Weapons of Math Destruction” (or: WMDs). These have three properties: (1) they are widespread and important, (2) they are mysterious in their scoring mechanism, and (3) they are destructive.
One kind of WMD that O’Neil explores in her book is the “recidivism risk algorithm,” which is supposed to assess how likely it is that a person will break the law again. Some judges use these risk scores to determine the amount of bail, the length of a sentence, and the likelihood of parole.
The algorithms were built with a positive goal in mind — they were supposed to add some objectivity to a process that can be very subjective and prone to human bias. “These recidivism scores were actually originally introduced to cut down on racism by the judges,” says O’Neil. The ACLU has found that sentences imposed on black men in the federal system are nearly 20 percent longer than those for white men convicted of similar crimes. Other studies have shown prosecutors are more likely to seek the death penalty for African-Americans than for whites convicted of the same charges. So you might think that computerized models fed by data would contribute to more even-handed treatment. And increasingly the criminal justice system has turned to “risk assessment algorithms” to do just that.
Most recidivism algorithms look at a few types of data — including a person’s record of arrests and convictions and their responses to a questionnaire — then they generate a score. But the questions, about things like whether one grew up in a high-crime neighborhood or has a family member in prison, are in many cases “basically proxies for race and class,” explains O’Neil. The score generated by the algorithm is used by judges when making decisions about the defendant. People with higher scores will often face higher bail, longer sentences, and lower chances of parole. O’Neil believes these results could instead be used to select people for rehabilitation programs or to better understand society’s structural inequalities.
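The proxy problem is easiest to see in code. The sketch below is not any real product’s formula — the questions and weights are invented — but it shows how two defendants with identical records can receive very different scores based only on circumstances they didn’t choose:

```python
# Hypothetical recidivism-style score; questions and weights are invented,
# not taken from any real risk-assessment product.

def risk_score(record, answers):
    score = 2 * len(record["prior_arrests"])
    score += 3 * len(record["prior_convictions"])
    if answers["grew_up_high_crime_area"]:
        score += 4   # measures neighborhood, not the person's conduct
    if answers["family_member_incarcerated"]:
        score += 4   # measures family circumstances, likewise
    return score

same_record = {"prior_arrests": ["2014"], "prior_convictions": []}
a = risk_score(same_record, {"grew_up_high_crime_area": False,
                             "family_member_incarcerated": False})
b = risk_score(same_record, {"grew_up_high_crime_area": True,
                             "family_member_incarcerated": True})
# Identical conduct, different scores: the entire gap comes from the proxies.
```

The questionnaire items never mention race or class, yet they encode both — which is why the output can reproduce the very bias the score was meant to remove.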
Well-designed algorithms can result in positive reforms within the criminal justice system. For example, the state of New Jersey recently did away with its cash bail system, which disadvantaged low-income defendants. The state now relies on predictive algorithms instead — ones carefully designed to try to eliminate racial bias. Data shows the state’s pre-trial county jail populations are down by about 20 percent.
But still, algorithms like that one remain unaudited and unregulated, and it’s a problem when algorithms are essentially black boxes. In many cases, they’re designed by private companies that sell them to other businesses, and the exact details of how they work are kept secret.
O’Neil also sees a more fundamental issue at work: people tend to trust results that look scientific, like algorithmic risk scores. “I call that the weaponization of an algorithm … an abuse of mathematics,” she says, “and it makes it almost impossible to appeal these systems.” And this, in turn, provides a convenient way for people to avoid difficult decision-making, deferring to “mathematical” results.
In her book, for instance, O’Neil cites the example of a man named Kyle Behm who took some time off from college for mental health treatment. After getting treatment, he applied for a part-time job at a large supermarket chain. In the process, he took a personality test, which is not uncommon for applicants to large companies. Behm did not receive an interview.
In most similar cases, the applicant wouldn’t know why they were rejected, but Behm happened to have a friend who worked at the supermarket and told him the test results were a deciding factor. Behm told his father, a lawyer familiar with the Americans with Disabilities Act, who ended up filing a class action lawsuit against the company.
The type of test Behm took was a lot like a common one used in mental health testing. It generates something called an OCEAN score, an acronym referring to five personality traits: Openness, Conscientiousness, Extroversion, Agreeableness, and Neuroticism.
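A tally like that can be sketched simply: answers on a 1–5 scale are grouped by trait and averaged. The items and groupings below are invented for illustration — real instruments use validated item sets — but the mechanics are this mundane:

```python
# Sketch of an OCEAN-style tally: average 1-5 answers grouped by trait.
# Items and groupings here are invented; real tests use validated item sets.

TRAIT_ITEMS = {
    "Openness": ["enjoys_new_ideas", "has_vivid_imagination"],
    "Conscientiousness": ["is_organized", "follows_through"],
    "Extroversion": ["talkative", "energized_by_people"],
    "Agreeableness": ["trusting", "considerate"],
    "Neuroticism": ["worries_often", "easily_upset"],
}

def ocean_profile(answers):
    """answers maps each item to a rating on a 1-5 scale."""
    return {trait: sum(answers[item] for item in items) / len(items)
            for trait, items in TRAIT_ITEMS.items()}
```

The scoring itself is trivial; the controversy is entirely about what the five numbers are then used to decide.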
Again, it becomes a question of how these scores are used. For certain jobs, some businesses can petition regulators for exceptions that will allow them to legally use such scores. “And then the regulatory body can decide whether it’s a valid reason,” explains O’Neil. Often, though, “companies just sell the same personality test to all the businesses that will buy them” and those businesses don’t bother to determine whether their usage is legal, fair, or even useful.
So how should we go about addressing the problem of poorly designed algorithms? O’Neil says the solution is transparency and measurement. She says researchers must examine cases where algorithms fail, paying special attention to whom they fail and which demographics are most negatively affected by them.