The Age of the Algorithm

On April 9, 2017, United Airlines flight 3411 was preparing to take off from Chicago when flight attendants discovered the plane was overbooked. They tried to get volunteers to give up their seats with promises of travel vouchers and hotel accommodations, but not enough people were willing to get off the flight.

So United ended up calling some airport security officers, who boarded the plane and forcibly removed a passenger named Dr. David Dao. The officers ripped Dao out of his seat and dragged him down the aisle of the airplane, nose bleeding, while horrified onlookers captured the scene with their phones. The public was outraged.

But how did Dr. Dao end up being the unlucky passenger that United decided to remove? Immediately following the incident, there was speculation that racial discrimination played a part, and it may well have shaped how he was treated. But the answer to how he was chosen is an algorithm: a computer program that crunched through reams of data, looking at how much each passenger had paid for their ticket, what time they checked in, how often they flew on United, and whether they were part of a rewards program. The algorithm likely determined that Dr. Dao was one of the least valuable customers on the flight at the time.
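
The episode describes those inputs but not United’s actual formula, which has never been published. As a minimal sketch, assuming invented feature names and weights, a “customer value” ranking of this kind might look like the following:

```python
# Hypothetical sketch only: the weights and feature names below are invented,
# but the inputs mirror the ones mentioned in the episode.

def passenger_value(fare_paid, minutes_early_at_checkin,
                    united_flights_past_year, in_rewards_program):
    """Return a rough 'customer value' score; the lowest scorer gets bumped first."""
    score = 0.0
    score += 0.5 * fare_paid                  # revenue from this ticket
    score += 0.1 * minutes_early_at_checkin   # checking in early counts a little
    score += 2.0 * united_flights_past_year   # frequent flyers are worth more
    if in_rewards_program:
        score += 50.0                         # loyalty-program bonus
    return score

passengers = {
    "frequent flyer": passenger_value(420.0, 180, 24, True),   # score: 326.0
    "discount fare":  passenger_value(89.0, 35, 1, False),     # score: 50.0
}

# The passenger with the lowest score is the first candidate for removal.
print(min(passengers, key=passengers.get))    # -> discount fare
```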

Computer algorithms now shape our world in profound and mostly invisible ways. They predict if we’ll be valuable customers and whether we’re likely to repay a loan. They filter what we see on social media, sort through resumes, and evaluate job performance. They inform prison sentences and monitor our health. Most of these algorithms have been created with good intentions. The goal is to replace subjective judgments with objective measurements. But it doesn’t always work out like that.

“I don’t think mathematical models are inherently evil — I think it’s the ways they’re used that are evil,” says mathematician Cathy O’Neil, author of the book Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. She has studied number theory, worked as a data scientist at start-ups, and built predictive algorithms for various private enterprises. Through that work, she has become critical of the influence of poorly designed algorithms.

An algorithm, in a nutshell, is a step-by-step guide to solving a problem. It’s a set of instructions, like a recipe. The computer algorithms in question here are sets of rules for calculations that take in historical data and predict future outcomes. Many of the companies that build and market these algorithms like to talk about how objective they are, claiming that they remove human error and bias from complex decision-making.
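
As a concrete, and entirely invented, illustration of that definition, here is a sketch of a rule derived from historical data and then applied to a new case, using the loan-repayment example mentioned earlier. The data, the rule, and the cutoff are all made up:

```python
# Tiny, invented illustration: historical data in, a rule out, a prediction made.

past_borrowers = [        # (annual_income, repaid_on_time)
    (25_000, False),
    (30_000, False),
    (48_000, True),
    (62_000, True),
]

# "Training" step: use the lowest income among borrowers who repaid as a cutoff.
# Crude, but it is a set of rules extracted from historical data.
cutoff = min(income for income, repaid in past_borrowers if repaid)

def predict_repays(annual_income):
    """Apply the learned rule, step by step, to a new applicant."""
    return annual_income >= cutoff

print(cutoff)                  # 48000
print(predict_repays(35_000))  # False -- note whose past history set the bar
```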

But in reality, every algorithm reflects the choices of its human designer. O’Neil has a metaphor to help explain how this works. She gives the example of cooking dinner for her family. The ingredients in her kitchen are the “data” she has to work with, “but to be completely honest I curate that data because I don’t really use [certain ingredients] … therefore imposing my agenda on this algorithm. And then I’m also defining success, right? I’m in charge of success. I define success to be if my kids eat vegetables at that meal …. My eight year old would define success to be like whether he got to eat Nutella.”
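
Her metaphor translates almost directly into code. The sketch below is invented for illustration: the cook curates the data and writes the success function, so the supposedly objective procedure ends up carrying the cook’s agenda.

```python
# Invented sketch of the dinner metaphor: curated data plus a chosen
# definition of success equals the cook's agenda, dressed up as an algorithm.

pantry = ["pasta", "broccoli", "chicken", "bread", "nutella"]
curated = [item for item in pantry if item != "nutella"]   # step 1: curate the data

meals = {
    "pasta with broccoli": {"needs": ["pasta", "broccoli"], "veg": 1, "nutella": 0},
    "nutella on bread":    {"needs": ["bread", "nutella"],  "veg": 0, "nutella": 1},
}

# Only meals that can be cooked from the curated ingredients are considered at all.
feasible = [m for m, info in meals.items()
            if all(i in curated for i in info["needs"])]

def parent_success(meal):          # step 2: the cook defines success (vegetables eaten)
    return meals[meal]["veg"]

def eight_year_old_success(meal):  # the eight-year-old would define it differently
    return meals[meal]["nutella"]

print(max(feasible, key=parent_success))        # pasta with broccoli
print(max(meals, key=eight_year_old_success))   # nutella on bread -- which never even
                                                # made the curated shortlist
```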

Of course, the fact that algorithms reflect the subjective choices of their designers doesn’t necessarily make them bad. However, O’Neil does single out a particular kind of algorithm for scrutiny, a subset she refers to as “Weapons of Math Destruction” (or WMDs). These have three properties: (1) they are widespread and important, (2) they are mysterious in their scoring mechanism, and (3) they are destructive.

One kind of WMD that O’Neil explores in her book is the “recidivism risk algorithm,” which is supposed to assess how likely it is that a person will break the law again. Some judges use these risk scores to determine the amount of bail, the length of a sentence, and the likelihood of parole.

The algorithms were built with a positive goal in mind — they were supposed to add some objectivity to a process that can be very subjective and prone to human bias. “These recidivism scores were actually originally introduced to cut down on racism by the judges,” says O’Neil. The ACLU has found that sentences imposed on black men in the federal system are nearly 20 percent longer than those for white men convicted of similar crimes. Other studies have shown prosecutors are more likely to seek the death penalty for African-Americans than for whites convicted of the same charges. So you might think that computerized models fed by data would contribute to more even-handed treatment. And increasingly the criminal justice system has turned to “risk assessment algorithms” to do just that.

Most recidivism algorithms look at a few types of data, including a person’s record of arrests and convictions and their responses to a questionnaire, and then generate a score. But the questions, about things like whether one grew up in a high-crime neighborhood or has a family member in prison, are in many cases “basically proxies for race and class,” explains O’Neil. Judges use the resulting score when making decisions about the defendant: people with higher scores often face higher bail, longer sentences, and lower chances of parole. O’Neil believes these scores could instead be used to select people for rehabilitation programs or to better understand society’s structural inequalities.
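
The commercial models are proprietary, so the sketch below is hypothetical: invented weights over the kinds of inputs described above. It is only meant to show how questionnaire items about neighborhood and family act as proxies for circumstances a defendant did not choose, rather than measures of their own conduct.

```python
# Hypothetical sketch, not any vendor's actual model: invented weights over
# the kinds of inputs described above.

def recidivism_risk(prior_arrests, prior_convictions,
                    grew_up_in_high_crime_area, family_member_in_prison):
    score = 0
    score += 2 * prior_arrests        # the defendant's own record...
    score += 3 * prior_convictions
    if grew_up_in_high_crime_area:    # ...versus questionnaire items that stand in
        score += 4                    # for neighborhood and family background
    if family_member_in_prison:
        score += 3
    return score

# Identical records, very different scores, driven entirely by the proxy questions.
print(recidivism_risk(1, 0, grew_up_in_high_crime_area=True, family_member_in_prison=True))    # 9
print(recidivism_risk(1, 0, grew_up_in_high_crime_area=False, family_member_in_prison=False))  # 2
```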

Well-designed algorithms can result in positive reforms within the criminal justice system. For example, the state of New Jersey recently did away with its cash bail system, which disadvantaged low-income defendants. The state now relies on predictive algorithms instead, ones carefully designed to try to eliminate racial bias. Data shows the state’s pre-trial county jail populations are down by about 20 percent.

But algorithms like these remain unaudited and unregulated, and that is a problem when they are essentially black boxes. In many cases, they are designed by private companies that sell them to other companies, and the exact details of how they work are kept secret.

O’Neil also sees a more fundamental issue at work: people tend to trust results that look scientific, like algorithmic risk scores. “I call that the weaponization of an algorithm … an abuse of mathematics,” she says, “and it makes it almost impossible to appeal these systems.” And this, in turn, provides a convenient way for people to avoid difficult decision-making, deferring to “mathematical” results.

In her book, for instance, O’Neil cites the example of a man named Kyle Behm who took some time off from college for mental health treatment. After getting treatment, he applied for a part-time job at a large supermarket chain. In the process, he took a personality test, which is not uncommon for applicants to large companies. Behm did not receive an interview.

In most similar cases, the applicant wouldn’t know why they were rejected, but Behm happened to have a friend who worked at the supermarket who told him the test results were a deciding factor. Behm told his father, a lawyer familiar with the Americans with Disabilities Act, who ended up filing a class action lawsuit against the company.

The type of test Behm took was a lot like a common one used in mental health testing. It generates something called an OCEAN score, an acronym referring to five personality traits: Openness, Conscientiousness, Extroversion, Agreeableness, and Neuroticism.

Again, it becomes a question of how these scores are used. For certain jobs, some businesses can petition regulators for exceptions that will allow them to legally use such scores. “And then the regulatory body can decide whether it’s a valid reason,” explains O’Neil. Often, though, “companies just sell the same personality test to all the businesses that will buy them” and those businesses don’t bother to determine whether their usage is legal, fair, or even useful.

So how should we go about addressing the problem of poorly designed algorithms? O’Neil says the solution is transparency and measurement. She says researchers must examine cases where algorithms fail, paying special attention to the people they fail and to which demographics are most negatively affected by them.

Credits

Music

Original music by Sean Real

Comments

  1. Armin

    Great show as always! I had to chuckle when the ad for ziprecruiter came up, though: “Their powerful technology efficiently matches the right person to your job better than anyone else”. Maybe you could interview them on how their algorithm works or even how much is automated and how much is still human interaction.

    1. eminka

      hahah yup! i wanted to make the same comment. extremely ironic.

    2. Heather

      I had the same response!!!!! That’s why I am commenting. Yea I wonder what algorithms they use!!! But agreed… really amazing episode.

    3. Marvin

      I came here looking for this comment. I was not disappointed.
      The irony is gold.

  2. Jordan

    So the podcast about the harms of inhuman algorithms was followed by the ad for “zip recruiter”, a program/tool to help employers weed out unsuitable job applicants…

    Nice…

    1. Tim

      My thoughts exactly, but I love irony so I’m giving them a pass.

  3. Brian b

    Is it just me or is the audio not playing? Tried on Chrome and on IE and it’s not working…

  4. John

    Ironic that you talk about hiring algorithms being biased then advertise for ZipRecruiter.

  5. Fedinand

    So are we going to ignore the irony (and perhaps hypocrisy) of an episode about algorithms, including a piece about how they unfairly and possibly illegally discriminate against entire groups of people for employment, that is sponsored by Zip Recruiter which undoubtedly uses them to do exactly what is criticized in the podcast?

    1. Marvin

      Would you have preferred that they not air this episode in order to avoid the hypocrisy?
      At least the existence of this episode lets you know that advertisers are not overly-influencing what they put out there as content.

  6. DHW

    What if (say) people who live in a particular neighborhood really are more likely to commit crimes? That is, the algorithm is telling the truth, even if Dr. O’Neil doesn’t like that truth?

    1. ABC

      So if it applies to one neighborhood, then we should apply it universally, right? What if it doesn’t apply to the next neighborhood, or the next one, or the next one?

      Get it? Farming data like that can end up violating two of her principles — widespread and socially destructive.

  7. Geoffrey

    Wow. Irony abounds. Two things jumped out at me. First, that the two social media fake news stories were pro-Trump and anti-Hillary. It is important to be self-aware enough to notice your own biases. I would love an app that showed me news contrary to what I assume, or from different regional points of view.

    Secondly, one of your own advertisers talked about using an algorithm to provide the best candidates. After what I heard in the story, that made me laugh.

    I love the show. It does open me up to see things I may not otherwise notice.

  8. Tyrone Malik

    Had the exact same reaction as lots in this feed…sponsored by ziprecruiter, whose advanced technology (algorithm) sorts out the “best” candidates for a job. I imagine not a whole lot of Tyrones and Maliks get hired on ziprecruiter.

  9. Linka

    I just finished listening to this episode via NPR One and came to this page specifically to comment on the irony of the ZipRecruiter ad in the midst of this episode. Looks like I wasn’t the only one ;-). Love your podcast otherwise but you can do better than this.

  10. Steve

    Loved the podcast. Very eye opening.
    Cathy is amazing. Her speech is mathematical in sharpness of point, efficiency and clarity of words. I need help with that.
    Like the others, the irony of the sponsor made me laugh out loud. It speaks to the pervasiveness of the algorithm.

  11. Julie

    Surely the problem is not the “maths” but the people who design the algorithms in the first place? It’s not like there are a whole bunch of robots making up algorithms as they go. These things are discussed, workshopped, designed, discussed, documented and continuously refined by *people*. Blaming the algorithm is absolving the people behind the algorithm of any responsibility.

  12. Evan bontrager

    While I still enjoy design stories of physical locations or objects, this show was excellent. Very relevant with tangible real outcomes. As always the storytelling was captivating. And the tie with the United story was an incredible reminder of how we are all targets and users of these tools.

    Another example of how amazing 99PI is.

  13. Jamar Berry

    I’m much more interested in the song that played just before the ZipRecruiter ad than I am in the irony of the ad. The piece was so moving, as many of the musical selections of this podcast are. Just wish I had a way to own it. Hint. Hint.

    Love the show

  14. Joe Smetter

    Clinical psychology PhD student here…I just wanted to comment that I think your assessment of Kyle’s employer’s use of the Five Factor Inventory for hiring purposes wasn’t quite accurate. The Five Factor Inventory is a personality measure, and it is not typically used in mental health evaluations. The Big Five personality factors measured by the inventory are broad dimensions along which all individuals differ in their personalities. While it is correct that one of those factors, neuroticism, is characterized by a tendency to be emotionally reactive and is correlated with certain types of psychopathology (e.g. anxiety disorders), it is not accurate to say that a person has a mental health problem if they are high on neuroticism.

    Therefore, the fact that Kyle has a mental disorder, or a specific diagnosis of bipolar disorder, is not disclosed to his employer by the results of his scores on the personality test. A seasoned clinician would not be able to make that conclusion based on the results, due to the way that “normal” individuals can score on the measure. It is for this reason that I doubt he will win his lawsuit. Personality tests like the Big Five are quite common in personnel selection, which is not my field of study.

    I don’t disagree with the overall conclusion of your episode, and I especially agree with the notion that analytical tools such as algorithms are only as good as the people who are using them. I just wanted to point out that in Kyle’s case, the target of his frustration is more closely related to personnel selection (industrial/organizational psychology) than to discrimination based on mental health status (clinical psychology).

    1. Kira M.

      I’m a personality psychologist, and I came by to make the same point made by Joe. I don’t use it in applied work settings, but I do use the Big Five frequently in academic work. The Big Five, or Five-Factor model, is a personality theory that people have five broad personality traits. Tests based on the Big Five measure people’s personalities on those traits in general. It is a personality test. It is NOT a mental health screening. People with no mental health issues could be high or low on any of the traits. There are some relationships between high neuroticism and mental health issues, but you cannot infer anything about an individual when you are looking at averages.

      Please be careful when you discuss psychological topics. Not everything in psychology is clinical. How these tests are applied is open to interpretation, but please be more thorough in your research on this topic in the future.

    2. Brenton Wiernik

      Workplace psychologist here. The section on Kroger’s was extremely inaccurate. The Kronos personality inventory is explicitly NOT a tool for clinical diagnosis. Using medical screening tools is illegal according to the ADA, but normal-range personality instruments are not medical tools. People vary in their personality traits, and some characteristics tend to make people better employees (e.g., being hardworking, reliable, interpersonally skilled). These types of traits are often what employers try to assess through resume screening, interviews, etc. Standardized personality assessments allow employers to assess these traits in a way that is fairer and less prone to human-introduced biases.

      The personality instruments that are designed for use in employee hiring are carefully tested to be reliable, valid predictors of performance and to ensure that they don’t disadvantage members of gender, racial, or mental health groups. Far from introducing bias, the past fifty years of research has very strongly shown that using a standardized hiring process reduces discrimination and bias in the process.

      And to reiterate, normal-range personality assessments are NOT medical diagnostic tools. They may look similar, because mental disorders in many cases represent maladaptive extremes of normal personality traits. But a normal-range personality assessment can’t provide any information about a mental disorder diagnosis. As an analogy, compare a screening tool to diagnose dyslexia with a high school English exam. They both involve reading and may have items that on the surface look similar. But you obviously can’t substitute one for the other.

  15. Michael H Light

    Hey, do you think Zip Recruiter uses an algorithm? I wonder how many people aren’t getting jobs because of their filtering? Just a thought while I was listening to your podcast.

  16. cakeslip

    I noticed the biases the other listeners did as well, but one thing that I found fascinating is the capitalist realism of the presumption that we could never ask Facebook to do something counter to their bottom line, to which I say, “Bullshit.” Other industries are regulated for the good of the populace to the detriment of their profit, and Facebook could be too. The widgets it produces are memes, and I don’t mean the shallow newfangled denotation of “meme” as a pithy captioned photo.

    This seems especially astonishing as we witness the death throes of capitalism in the U.S. So many of the problematic algorithms have as their aim the maximization of capital; it seems only natural to question that goal. Surely this is the primary hard conversation society is avoiding by building a wall of math.

    We have a continual conservative outcry that social programs should be run like businesses, and the trappings of its metrics often satisfy such advocates. But the metrics have not come alone. As the guest pointed out, they have come fraught with the same capital maximization values that fathered them in the business sector.

    Look at Trump’s ridiculous one-in-two-out executive order concerning administrative regulations. The repeal of the two must fiscally cancel out the cost of the one. This leads to rank absurdities like placing monetary values on intangibles like the enjoyment we get from the beauty of a stream, or the public sense of security that our food will not poison us. I pity the agency that has to fabricate these algorithms just so we can jam these values into the coffin to rot with capitalism’s stinking corpse.

  17. Vic

    I personally think that companies should be a lot more critical of the technology they purchase and its usages. Lots of devices come with hidden risks and problems – flaws in the intelligence, glitches, secret collections and storage of private data, etc. And I believe that we, collectively as a society, need to better ourselves with the way we use tech AND with what kinds of purchases we make.

  18. br

    irony aside, has anyone applied the test to ziprecruiter’s algorithm? i have posted my resume to ziprecruiter and i am not sure if that was the site that got my resume to the person that placed me at the subsequent job, because i was posting on lots of different job posting sites. and things like this really get my ire up.

  19. Daybreaq

    Here’s another mention of where an algorithm has been problematic:

    “The agency uses a machine called a millimeter wave scanner at nearly every airport in the U.S. The machines, manufactured by L3Harris Technologies, rely on an algorithm to analyze images of a passenger’s body and identify any threats concealed by the person’s clothes.”

    Read more here: https://www.miamiherald.com/news/local/community/gay-south-florida/article234220347.html?#storylink=cpy

    The TSA agents have only two choices to classify a person going through the scanner: male or female. If the person’s body does not conform to the machine’s algorithm of what the chosen classification should look like, an alarm goes off.
