Biases in AI Systems

queue.acm.org

40 points by sightmost 5 years ago · 26 comments

kingsuper20 5 years ago

The oft-suppressed elephant in the room is 'what if the bias is correct?'. It's an uninteresting bug in the system up until that time.

  • jasonhong 5 years ago

    This is a really good question, and one that we've discussed a lot amongst researchers at our university (Carnegie Mellon).

    To be useful, ML systems have to have some kind of bias. However, the distinction here is that some of these biases are harmful biases. Kate Crawford talks about allocative harms (how resources are allocated) and representation harms (e.g. stereotypes).

    Some of these harmful biases are really blatant. For example, labeling Blacks as "Gorillas" is offensive for many reasons.

    Some of these biases are "correct" but that's due to biases in the data set or in society. The ProPublica investigation of recidivism prediction is a good example, where it was more likely to say that Blacks should not be released. However, police are also more likely to arrest Blacks, which naturally leads to this bias. Other examples here include Amazon's resume system that was biased against women (since they used Amazon's hiring practices as ground truth), and image search for "professional hairstyles" that showed White women but "unprofessional hairstyles" that showed Black women.

    Other biases are also "correct" but greatly miss the underlying context. For example, a naive AI system might tell you not to go to a certain medical doctor who is a professor, since they have a higher rate of deaths. However, this doctor might also be a doctor of last resort, hence the high mortality rate.

    What I'm trying to get to is that even the term "correct" has a lot of subtleties to it. In many cases, figuring out what is "correct" (or ground truth in ML terms) can be a clash of values and world view, and might have different results based on differences in race, gender, age, culture, context, and power.

    • kingsuper20 5 years ago

      "Other examples here include Amazon's resume system that was biased against women (since they used Amazon's hiring practices as ground truth), and image search for "professional hairstyles" that showed White women but "unprofessional hairstyles" that showed Black women."

      Given enough data, and no doubt Amazon surveils their people more than most, they could determine the 'truth' along a more straightforward line.

      "Does this hair style make more money for the company"

      As hair can be a strong form of expression, there's probably a measurable delta here.

      Going forward, smart companies will obfuscate the determination. I suppose that training an AI is not a bad way to pull this off.

    • meowface 5 years ago

      These are all very good and true points, but I think one of the questions the parent poster was trying to ask was what happens if an observed bias evident in a model's output simply proportionally lines up with reality, even when controlling for confounders.

      (This is all further demonstration of just how complex a term like "correct" can be in this case, as you point out, but I think it's worth considering the whole spectrum and perhaps the "Devil's advocate" instances of potential correctness.)

      This is one example recently given by the founder of a controversial AI-based insurance company (https://www.lemonade.com/blog/ai-can-vanquish-bias/), where he claims sufficiently fine-grained AI classification actually reduces group bias even if in some cases the output may, in aggregate, be statistically biased towards particular groups:

      >Let’s say I am Jewish (I am), and that part of my tradition involves lighting a bunch of candles throughout the year (it does). In our home we light candles every Friday night, every holiday eve, and we’ll burn through about two hundred candles over the 8 nights of Hanukkah. It would not be surprising if I, and others like me, represented a higher risk of fire than the national average. So, if the AI charges Jews, on average, more than non-Jews for fire insurance, is that unfairly discriminatory?

      >It depends.

      >It would definitely be a problem if being Jewish, per se, resulted in higher premiums whether or not you’re the candle-lighting kind of Jew. Not all Jews are avid candle lighters, and an algorithm that treats all Jews like the ‘average Jew,’ would be despicable. That, though, is a Phase 2 problem.

      >A Phase 3 algorithm that identifies people’s proclivity for candle lighting, and charges them more for the risk that this penchant actually represents, is entirely fair. The fact that such a fondness for candles is unevenly distributed in the population, and more highly concentrated among Jews, means that, on average, Jews will pay more. It does not mean that people are charged more for being Jewish.

      >It’s hard to overstate the importance of this distinction. All cows have four legs, but not all things with four legs are cows.

      >The upshot is that the mere fact that an algorithm charges Jews – or women, or black people – more on average does not render it unfairly discriminatory. Phase 3 doesn’t do averages. In common with Dr. Martin Luther King, we dream of living in a world where we are judged by the content of our character. We want to be assessed as individuals, not by reference to our racial, gender, or religious markers. If the AI is treating us all this way, as humans, then it is being fair. If I’m charged more for my candle-lighting habit, that’s as it should be, even if the behavior I’m being charged for is disproportionately common among Jews. The AI is responding to my fondness for candles (which is a real risk factor), not to my tribal affiliation (which is not).
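
      To make the Phase 2 / Phase 3 distinction concrete, here is a minimal sketch (every number and function name is made up for illustration; this is not Lemonade's actual model): a "Phase 2" price keyed off group membership versus a "Phase 3" price keyed off the individual risk factor that group membership merely correlates with.

        # Toy illustration of "Phase 2" (group-average) vs "Phase 3"
        # (individual-feature) pricing. All numbers are invented.

        def phase2_premium(is_jewish: bool) -> float:
            # Prices off group membership: everyone in the group pays
            # the group's average expected loss.
            return 120.0 if is_jewish else 100.0

        def phase3_premium(candles_per_year: int) -> float:
            # Prices off the individual risk factor only; group
            # membership never enters the calculation.
            base, per_candle = 90.0, 0.10
            return base + per_candle * candles_per_year

        # A non-candle-lighting Jewish household vs. an avid
        # candle-lighting non-Jewish household:
        print(phase2_premium(True), phase3_premium(0))     # 120.0 90.0
        print(phase2_premium(False), phase3_premium(300))  # 100.0 120.0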

      One thing his post doesn't discuss is what might cause such a group correlation and how much agency is involved. In the case of candle-lighting, it's presumed that people (Jewish or otherwise) are doing it purely of their own free will, or at least due to a belief/practice they largely have choice over.

      If instead the root cause is hypothetically partly or wholly extrinsic (e.g. police being disproportionately more likely to arrest people among certain groups, with it remaining disproportionate after accounting for the true crime frequency/severity base rate for individuals in that group), then I think an analogue of the above example wouldn't hold up, because, as you say, the inputs would be inherently unjust, even if they're in some sense statistically predictive. So it'd be unfair to use such data.

      Then there's the grayer area. What if a group is hypothetically disproportionately represented among a certain data set or proxy but the representation is commensurate with the true base rate among individuals of that group?

      In some sense, it's not unfair, because you're getting actual data based on what people are actually choosing to do or not do.

      But it opens the door into larger arguments of culpability, free will, being dealt a bad hand, etc. It's inherently unfair to be born into a very poor family or a crime-ridden area or a house with lead paint or as the ancestor of generations of people who were oppressed, abused, shut out of society, and otherwise treated very unfairly, let alone potentially abducted, enslaved, and/or subject to genocide. Even if the true rate hypothetically lines up with the proxy, there still might linger impactful and lasting trickle-down effects from generations of very unfair and incorrect proxies. So it could potentially be correct inputs, correct outputs, but still unfair in a deep sense. However, is it unfair to the point of it violating discrimination laws? I don't actually know. And I could see many different arguments about the ethics of such outputs.

      And then there are of course the epistemological problems / meta-problems, here, which might be the trickiest of all: how do or can you know the data is accurate, how do or can you know the true base rate, etc. So it's very difficult to tell in practice how fair any particular metric is.

      Bias is clearly a major issue for AI, but I think it's a pretty nuanced subject. It's easy (but of course deeply necessary) to list all the actual and theoretical failure modes, but it's hard to always truly determine how fair something is and exactly what ethical and philosophical principles to use when judging fairness.

      I know that's basically just a reiteration of your point, but I always see this framed from the perspective of how easy it is to get things wrong, without examples of cases where one could potentially "steelman" the wrongness; or earnestly steelman it yet still ultimately conclude it doesn't conform to a particular society's values, even if it might conform to laws. (Or at least a subset of a society's values - given some of the seeming fundamental value divides in the US. Two people could agree about most of the above but come to very different conclusions if one of them is socially left-leaning and the other is socially right-leaning.)

    • LarryEt 5 years ago

      Hey yinz. My second home is PGH also, although I am not there now.

      Such a thought-provoking post. Thank you. So much to learn from this. I would expect no less from CMU.

  • ianhorn 5 years ago

    Rather than being a suppressed topic, in my experience, this is a case of people talking past each other. It's like correlation versus causation (versus plain old connected definitions). It can be true that A and B are correlated, while A doesn't cause B (or neither causes the other), and while their definitions have nothing to do with each other. Like nurse and gender. They're correlated in the US, but making someone a nurse doesn't change their gender, and the definitions have nothing to do with each other. Maybe in some countries the correlation is even flipped!

    Recall all the times in stats where an estimator can be an unbiased estimator of a correlation while being a biased estimator of a causal effect.

    So you get some people saying it (the correlation) is correct and other people saying it (the causal effect) is incorrect. Both are right! To stop talking past each other, they need to talk about bias with respect to the correlation or bias with respect to the causal effect in this particular direction.
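
    A minimal simulation of that situation (hypothetical variables, no real data): X and Y share a confounder, X has no causal effect on Y, and the naive regression slope is a perfectly reasonable estimate of the association while being a badly biased estimate of the causal effect (which is zero).

      # Toy example: X has zero causal effect on Y, but a shared
      # confounder Z makes them correlated.
      import numpy as np

      rng = np.random.default_rng(0)
      n = 100_000
      z = rng.normal(size=n)          # confounder
      x = z + rng.normal(size=n)      # x is driven by z
      y = 2 * z + rng.normal(size=n)  # y is driven by z, not by x

      # Slope of y on x alone: ~1.0 -- a real, "correct" association.
      print(np.polyfit(x, y, 1)[0])

      # Slope of y on x after adjusting for z: ~0.0 -- the causal effect.
      coefs, *_ = np.linalg.lstsq(np.column_stack([x, z, np.ones(n)]), y, rcond=None)
      print(coefs[0])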

    But what frustrates me is when the correlation side uses the (true) correlation to argue against a system being biased with regard to something else (w.r.t. a definition, a causal effect, a literal translation, or some more complicated aspect of the system), and to argue that harms are okay because the bias is a correct bias.

    We need to work on our terminology so that we can stop talking past each other. It doesn't help that our models have weird biases in absurdly complex function spaces, but we have to progress beyond a first-stats-course one-size-fits-all definition of bias.

  • YeGoblynQueenne 5 years ago

    The phrase "the bias is correct" sounds like an oxymoron. Could you explain what you mean by it?

    Also, who is (oft-)suppressing the "elephant in the room"?

    • bananabiscuit 5 years ago

      I have a feeling you might already have a good hunch about the answers to your questions, but I’ll bite:

      If you see in a data set that Danes are tall, and that Kenyans are fast, and that Ashkenazis are smart, then it is a valid hypothesis that should not be thrown out outright, that the reason that's the case is due to actual differences inherent to the population groups and not any other confounding factors.

      As for your second question: mostly progressives, leftists and liberals.

      • YeGoblynQueenne 5 years ago

        Please don't bite unless you're a fish. I'm certainly not trying to reel you in. Rather the reason I'm asking is that the expression you used is vague and imprecise and I don't know what exactly you mean by it.

        As you suspect, I can guess what you might mean, but if we start second-guessing each other, we're just injecting noise into the conversation, and a few comments from now we'll end up completely confused about what the other is trying to say. Much better to establish some common language before we waste time talking past each other. Doesn't that make sense?

        Keeping that in mind, I am still not sure I understand what you mean by the following:

        >> If you see in a data set that Danes are tall, and that Kenyans are fast, and that Ashkenazis are smart, then it is a valid hypothesis that should not be thrown out outright, that the reason that's the case is due to actual differences inherent to the population groups and not any other confounding factors.

        The reason for my continued uncertainty is that you do not say, in the above example, what the "bias" is and how it is "correct". You've identified a hypothesis that you can make about the data (rather than a hypothesis derived from the data, i.e. some kind of model that explains the data). But a hypothesis is not a bias. A hypothesis can be correct or incorrect, and bias may play a part in that, or not. But a hypothesis is a hypothesis and bias is bias. So what is "bias", the way you mean it?

        >> As for your second question: mostly progressives, leftists and liberals.

        Ugh. I shouldn't have asked. I'm going to take a wild guess that you're from the USA and that you have some kind of stake in the culture war you folks have brewing over there. I'm not from over there and I want no part in that. So forget I asked. And good luck getting all that sorted out between you.

  • IfOnlyYouKnew 5 years ago

    Even if a bias is “correct” at the population level, using it to make decisions regarding individuals is unjust and prone to be wrong.

    Example: men are known to commit the vast majority of violent crimes. But using that statistic to convict someone, deny them a job etc. would be inappropriate.

  • alok-g 5 years ago

    See my comment here on the definition of 'bias'.

    https://news.ycombinator.com/item?id=27631529

  • tr352 5 years ago

    Can you give an example of a bias that is correct?

    • hpoe 5 years ago

      Well, I feel that is a really broad term, to just ask about bias without really defining it, but here are a few off the top of my head:

      1. Someone from Utah is more likely to be a member of the Church of Jesus Christ of Latter Day Saints than someone from Pennsylvania.

      2. Someone from an Arabic-speaking country is more likely to be Muslim than someone from a non-Arabic-speaking country.

      3. Someone who says "eh" at the end of every sentence is more likely to be Canadian.

      4. Someone who says y'all is more likely to be from the South.

      5. If someone asks me to "Please do the needful" they are likely from India.

      I've purposely chosen non-extreme examples because there are many biases all over the place. Bias ≠ prejudice.

      Ultimately, if we artificially restrain AI from being "biased" in any form, we are really shooting ourselves and those most disadvantaged in the foot, because instead of being able to use AI to discover the biases and then work on fixing them, we instead just pretend they don't exist.

      Finally, a more provocative example. People who get payday loans are less likely to pay back loans, black people are more likely to use payday loans, ergo black people are more likely to default on loans. If we try to just force an AI to ignore this, then we paper over the problem. If instead we start to examine causality, we can start to figure out the root of the issue and how to address it.

      • alok-g 5 years ago

        Note: I follow a more standard definition of bias [https://en.wikipedia.org/wiki/Bias], which makes it erroneous nearly by definition.

        Indeed. Use of priors does not intrinsically make the system biased. It's a bias only if those priors are incorrect for whatever reason, or if the facts specifically about the sample under consideration are not able to override the population priors.
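
        As a minimal sketch of that point (all numbers are made up): a Bayes update that starts from a population prior but lets individual evidence override it. The problematic kind of bias shows up only when the prior is wrong, or when the system never lets the individual's own data override the population number.

          # Toy Bayes update: population prior, overridden by individual evidence.
          def posterior(prior: float, likelihood_ratio: float) -> float:
              # P(trait | evidence), given a prior and an evidence likelihood ratio.
              odds = prior / (1 - prior) * likelihood_ratio
              return odds / (1 + odds)

          population_prior = 0.30           # made-up base rate for the population
          strong_counter_evidence = 1 / 50  # individual evidence against the trait

          # The prior alone says 30%; the individual's own data pulls the
          # estimate down to under 1%. Using the prior isn't the problem --
          # refusing to update past it (or using a wrong prior) is.
          print(posterior(population_prior, strong_counter_evidence))  # ~0.0085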

      • samkater 5 years ago

        I’m not outright disagreeing, but it seems your last statement contradicts the rest of the payday example: "If instead we start to examine causality, we can start to figure out the root of the issue and how to address it."

        The causality piece is exactly the issue, right? People who use payday loans have less savings, are more likely to work in jobs where their hours are unstable, and have other poor financial indicators (past use of a payday loan, for example). Black people may disproportionately fall into this category, but I would argue it is wrong to effectively punish all black people (or conversely give other ethnicities an easier time) simply because of their race.

        Biases exist, no argument there. The dilemma is what we do with them.

      • username90 5 years ago

        Another way to word it is that an unbiased AI will never be able to perform better than humans at many tasks. Statistically accurate bias isn't a bug, it is a feature. Sometimes you want to avoid it for other reasons, like it feels wrong to assume traits are correlated with race etc., but by default the AI should always be biased except for a few special cases.

      • commandlinefan 5 years ago

        Even something as mundane as identifying a face as being "male" or "female" is fraught with controversy.

      • avs733 5 years ago

        >Well, I feel that is a really broad term, to just ask about bias without really defining it, but here are a few off the top of my head:

        I hope this doesn't come off as overly pedantic, but of course it depends on how you define it. However, even when we DON'T define it, the examples that are used reflect a definition. Fundamentally, the problem is in the continued assumption that data has inherent meaning, rather than being interpreted by human beings in context.[0]

        There are two conflicting definitions of bias at play here...

        Definition 1 (I believe this is yours): Bias = a purely statistical phenomenon, a situation where one variable is meaningfully predicted by another one. Synonym for correlation.

        Definition 2 (I would argue this is mine, the authors', and the colloquial one): Biases refer to specific prejudices that are typically unfair. Note...in this definition bias is a synonym for prejudice. The authors are not making a data argument for that; they are using the term bias to reflect a prejudiced pattern, because they understand AI cannot have a 'prejudice' but that societal prejudice can induce biases in data.

        When people use bias to mean definition 2, they are not inherently saying all definition 1 biases are bad. Your examples, under their definition, are not biases; they are correlations. The fact that some people use definition 1 does not mean definition 2 is invalid. The authors' definition of bias starts from the idea that a bias is a prejudice, and that there are other terms besides bias to describe your examples. Arguing over the definition is distracting you from the point the authors are trying to make...it doesn't actually serve your understanding of the article. I say it that way to separate it from anything about this conversation, because I think you are coming at this in good faith and with a sincere point.

        No one is forcing AI to ignore that...instead they are saying AI should be able to put that in the context of greater societal patterns to be valid. All math, all innovation, all ideas exist within a society when they are implemented. It is how they are used that matters. As a thought experiment on your provocative example...ALL payday loans have a higher default rate...if all mainstream banks .

        To give a different example, let's look at the field of educational tests and measurement (my current living). I design a test of math ability. I contextualize each question within a game of cricket. Who do you expect to do well because they have more experience with cricket? Who do you expect to underperform due to the context of the question? ...What am I actually measuring (hint: not just math)? If the data from such tests were inherently 'free from bias' then it wouldn't matter whether I asked demographics questions first or last, when in reality asking demographics questions before a math test lowers women's scores. Educational test folks constantly ask: what is being measured? They follow it up with a second important question: how are this test, the data, and the resulting scores going to be used? What does it mean for a test to be fair?[1] What if I am trying to measure math skill and I end up measuring gender and poverty instead? If that happens, what can I do with the data? What does the data really mean?

        [0] As a fun footnote...this assumption is actually embedded in the language used in research; it is why 'science' fields have tended to stick to third-person language while 'social science' and other related fields have largely flipped to first-person language.

        [1] https://www.ets.org/about/fairness

    • nmca 5 years ago

      Lee Jussim has written some well-cited books on this topic, and makes some effort to handle it with care:

      https://en.m.wikipedia.org/wiki/Lee_Jussim

  • LarryEt 5 years ago

    I just started on Kahneman's Noise: A Flaw in Human Judgment 2 days ago.

    I mean, it took a long time for Kahneman and Tversky's ideas on bias to disperse before we could even be talking about bias in this context.

    Bias isn't even the real problem with ML; noise obviously is.

visarga 5 years ago

A very good paper.
