Facebook apology as AI labels black men 'primates' (bbc.com)

Previous discussion from a couple of days ago.
my mistake, missed that.
Eh, no harm, 200+ comments and no one complaining.
People obviously still see value in discussing it
This happened to both Google Photos and Flickr too. Which makes it an inexcusable mistake to make in 2021 - how are you not testing for this?
Google Photos in 2015: https://www.wired.com/story/when-it-comes-to-gorillas-google...
Flickr in 2015: https://www.independent.co.uk/life-style/gadgets-and-tech/ne...
The reason these companies don't fix these systems is that they don't know how. It is easier to remove certain outputs or retire the whole system. There is no line of code they can tweak.
Enrich the dataset it's trained on enough that the model is correct before you release it to prod.
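To make that concrete: "enriching" usually means rebalancing so underrepresented groups actually show up during training. A minimal Python sketch, assuming (hypothetically) that each sample carries a subgroup annotation, which real-world datasets often lack:

    import random
    from collections import defaultdict

    def oversample_minority_groups(samples):
        """Resample with replacement so every annotated subgroup is
        represented as often as the largest one."""
        by_group = defaultdict(list)
        for sample in samples:
            by_group[sample["group"]].append(sample)
        target = max(len(group) for group in by_group.values())
        balanced = []
        for group in by_group.values():
            balanced.extend(random.choices(group, k=target))
        random.shuffle(balanced)
        return balanced

Even then, oversampling only duplicates the same faces rather than adding new information, so it's no guarantee.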
That's sort of obvious. How do you know that wasn't attempted?
Even if it was fixed, in a probabilistic system like this, isn't it basically guaranteed to happen with some inputs?
Is that a real question? Of course it will happen. In this particular case there was a single misclassified video reported in the article.
Yes it's a real question, since there's nothing that says that a particular misclassification must happen. Watching cars go by on the road, one might suspect that at least one is driven by an alligator, but nothing says that it must be, per se, even the law of large numbers.
Nobody said this particular misclassification must occur. But there will be misclassifications, which is what your original question asked. Since you know the answer, why ask the question? That's why I asked you if what you asked was a real question.
Yes they did, I said that. But it was a claim made as a question, because I didn't know whether it was actually true. I still can't demonstrate formally why this would be so, because again, the reasoning and even veracity of the claim is still in question due to lack of anything but a hand-waved answer.
There is no need to formally demonstrate. The veracity is clearly not in question. It must be true, due to the existence of the article we are commenting on now.
If you want to argue the general case, you can simply prove the negation false. Since it is incorrect to say that a network trained on a tiny percentage of possible inputs will never misclassify, it is true that a network trained in such a way will eventually misclassify. This is borne out by training any network and seeing that it will always misclassify something.
> Yes they did, I said that.
You didn't say that. You said a misclassification would happen on some inputs. That is different from saying on these specific inputs.
If it was attempted, we wouldn't expect this problem to occur, correct?
We don't have enough information to root-cause the problem.
No, that's not correct.
That makes it sound even worse that they knowingly released it without fixing it.
Are you saying an ML system should never be released if it doesn’t have perfect accuracy?
Tagging black people as monkeys is not a showstopper bug? If so, it makes them look even worse than if it were an overlooked bug.
Quite honestly, in a team that's stressed and running right up against the deadline, checking for one particular misclassification in a model that has hundreds of classification targets can be really tricky.
Say you have 1000 classification targets. You have to check, for each target, the odds of it being classified as one of the other 999.
You have to check, specifically, for "adult male as primate" out of a million potential combinations, and apply secondary business rules or optimizations to prevent that classification.
So yes, it's possible, but it's not cheap, simple, or easy.
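A denylist audit is one way to keep that tractable: rather than eyeballing all ~million pairs, compute a confusion matrix on a labeled validation set and assert that a handful of unshippable (true, predicted) pairs never fire. A rough sketch, with hypothetical label names:

    import numpy as np

    def forbidden_confusions(confusion, class_names, denylist):
        """confusion[i][j] = validation count of true class i predicted
        as class j (e.g. from sklearn.metrics.confusion_matrix).
        Returns any denylisted (true, predicted) pairs that occurred."""
        index = {name: i for i, name in enumerate(class_names)}
        return [(t, p, int(confusion[index[t], index[p]]))
                for t, p in denylist
                if confusion[index[t], index[p]] > 0]

    # e.g. denylist = [("person", "gorilla"), ("person", "chimpanzee")]

The catch is that the audit only sees what the validation set contains; if it lacks the blurry, badly lit photos that trigger the failure, the check passes anyway.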
Facebook just decided to shove the model out the door and not worry about the consequences.
Quality engineering work costs money and time. Facebook didn't spend it.
I agree.
It does seem worse that way.
We don't actually know how to do that, or how rich is "rich enough." It's an open avenue of research to be able to extrapolate how well-tuned a neural net is on data not in its training set.
Not to imply the problem is unsolvable, just that if an institution has zero tolerance for this mistake, the fix you're describing is no guarantee it won't occur.
That’s not quite complete, right? It’s that we don’t know how to do that without sacrificing other things.
This reminds me of a favorite tweet from 2013: "Then Google Maps was like, 'turn right on Malcolm Ten Boulevard' and I knew there were no black engineers working there" -- https://twitter.com/alliebland/status/402990270402543616
Facebook, like a lot of tech companies, has long had problems with diversity in engineering. Here's an article from April that discusses specific incidents and the broader background: https://www.washingtonpost.com/technology/2021/04/06/faceboo...
This isn't a problem with diversity. Everybody knows how to pronounce Malcolm X. And it's not like a Google engineer, just because he was black, would think "oh, let me check whether Malcolm X is pronounced correctly, because he's black and I'm black too." This only happens in white people's brains.
I don't know if I 100% align with how you stated it, but yeah, it's a matter of the training data set. I don't think these companies have published their training data sets. But think back to the issue with Asians and facial recognition in Apple's Face ID. If they just chose 100 people at random, then based on US statistics, 5-6 of those 100 people would have been Asian, reflecting the 5.7 percent of the population that is Asian. We'd probably all agree 5-6 people is not a sufficient data set, but picking 100 people at random would be a pretty easy assumption to make when building one.
So yeah, I think it is an issue of generating a data set without hitting a sufficient number of test cases. In this instance, Asians would be an edge case: a data set built from a random sample trains the algorithm on very few examples of a group with lower representation in the population.
I wonder what the datasets of companies like Xiaomi look like. Face ID always worked for me, so it seems like it works for non-Asian faces.
Maybe they took more care with their data set. I think the only way we'd know is if they published their sets or how they built them. But I was just highlighting one possible way Apple could have generated their training set: just grab 100 people in America at random.
You realize that the person I'm quoting isn't white, right?
Let's separate the general case from the specific. Generally, we know that representation among the people who make things changes what they make. This is obvious and undeniable. For example, look at ASCII vs Unicode. The Chinese invented movable type about 400 years before Gutenberg, so it's not like the idea of printing non-Roman characters was novel. In the age of telegraphy, Europeans developed encodings that included umlauts and accents; by 1851 they were merged into International Morse Code.
So why in 1963 was ASCII codified without any of that? And why did that become the dominant standard for an extended period? Because it was mainly Americans in the rooms where the technology was being created.
Similarly, we know that standard color films were developed by white people to represent white people well: https://www.vox.com/2015/9/18/9348821/photography-race-bias
And we all know how this happens. It's the same reason a lot of open-source software is good for a developer audience, not an end-user one: making things means iterating on them until they're good enough for the people involved.
That's the general case, so let's return to the specific case. If you want to prove that ML systems doing racist stuff has nothing to do with who made it, then you can't just handwave it away. You have to show why that specific project was set up so carefully and so well that it would avoid the natural pitfalls of any technology project. And then despite that it went on to do racist stuff. For reasons that you'd then have to explain.
Considering the adversarial attacks that image recognition systems are vulnerable to, perhaps even a well-trained system could be induced to produce inappropriate results of one sort or another. Perhaps the training set and algorithm for the model should just be publicly available, so that people can scrutinize the data and figure out incrementally how to avoid most biases or gaffes to a generally accepted level.
> This only happens in white people's brains.
'Eleven Jinping': Indian TV fires anchor over blooper.[1]
To play devil's advocate, maybe the station fired her for ignorance of current events.
> maybe the station fired her for ignorance of current events.
That would be a valid reason, but I suspect a more culturally appropriate one: loss of reputation. We are sensitive to that.
My point was this isn't something that only goes on in 'white' brains but more of a cultural issue. Most people in the West are incapable of pronouncing Asian names. I don't see people making a big issue out of it.
In what universe is "Ten" a more common pronunciation of "X" than "X"? You might have an argument for "II" or "III", but I'll be shocked if any street in USA is named after the tenth generation of really unimaginative namers.
Do you think Google is having someone go through the tens of thousands of street names?
Or do you think they had a team (on a completely different project, or perhaps at a different company) write a text-to-speech function that wasn't well suited for directions?
Streets have lots of numbers after all. People frequently have numbers in their name.
I could see that for Google Maps v1.0. I think we're past that point now. There's no reason they should still be using libraries suited to parsing the names of forgotten European monarchs.
They're neither forgotten nor unused, nor is it a nomenclature used exclusively by royals; nor are all the royals who use this fashion dead or out of power.
Oh for Pete's sake, absolute bloody conspiracy level nonsense, NOBODY sat there twirling their villainous mustache and programmed an exception to hardcode pronouncing X as 10, it's simply a matter of the training and sample data having access to some type of corpus that contained a great deal of Roman numerals.
(Leave the software engineering to the software engineers)
>> to some type of corpus that contained a great deal of Roman numerals.
I wager that there is more text online about Louis XIV than about Malcolm X. Certainly there are many more books on that epic corner of French history than on one modern US leader. Then there are all the British kings. Point an AI at the internet and it would likely decide that Roman numerals are more often pronounced as numbers than as letters. Malcolm X would be a rare exception that might need to be hard-coded.
For sure. If we're going with the common pronunciation of Roman numerals in English names, it's "Tenth". E.g., We don't say "Henry Ford III" as "Henry Ford Three" but "Henry Ford the Third".
There’s a Louis XIV Street in New Orleans (and I imagine elsewhere).
You mean Louis 'Ziv', according to Google
Putting Louis XIV into Google Translate, I get the correct "Louis the Fourteenth" and "Louis Quatorze" pronunciations in English and French, respectively. However, it has to be uppercased; otherwise it spells out the letters.
The implication is that a black person would be more likely to recognize the inherent flaw in automatically interpreting "X" as "10", and in all honesty that's probably true. It isn't a matter of testing, it's a matter of having people with a diverse set of cultural perspectives in the room when decisions like that are made to begin with.
Diversity doesn't guarantee you automatically catch or account for edge cases. As a minority, I am disturbed by some of the odd takes people have about diversity. There are thousands upon thousands of roads. Unless you have a QA team test directions to every road in the country, you won't ever catch the issue with a road named Malcolm X. You don't even have to be 'diverse' to know who that is.
It doesn't guarantee it, but it helps.
I personally have gotten bugs fixed at Google. How? Because I, a white man, spotted a bug, cared about it, and talked to white men of my acquaintance at Google who had enough power to get things done. How did I know them? From other tech companies created, run, and majority staffed by other white men.
Why am I in these networks at all? Well, my dad was a software developer and he introduced me early on. How did he get his start? His dad, an insurance company exec, brought him in to deal with this newfangled computer thing they had just gotten. That was in Milwaukee in the mid-1960s. I promise you that although Milwaukee had a significant black population, exactly zero of them were insurance company executives in the mid-1960s.
So what Allie Bland knew when she wrote her tweet was that she did not have any connection to Google where she might be able to get a to-her glaringly obvious pronunciation issue fixed. That in her estimation no black person did. And I see no reason to think she was wrong.
This is a contrarian take that may get me downvoted and unfairly labeled, but I encourage critical thinking instead:
I've struggled with people telling me that these FAANG companies have "diversity problems," as a person of color myself. A majority of software engineers are female and male immigrants from East Asia and South Asia. These population centers are some of the most diverse regions of the world. The engineers who have been hired by preparing for and passing these companies' selective merit-based coding tests had to overcome adverse conditions in their home countries as well, including extreme poverty, starvation, and totalitarian regimes.
Why do they not count toward diversity, to some white and white-adjacent critics? What message are we sending to people who are ethnic minorities from certain groups who earned their spots through merit and have also been targeted in recent newsworthy attacks, just as others have, when we make these kinds of accusations? What does a non problematic ethnic composition look like? What are these companies doing right toward some minority groups and wrong towards others?
There is literally no right answer; the very nature of modern diversity is that it will always be a moving target. That is, until we get over the entire concept of diversity, which is racist / discriminatory at its core.
That's incorrect. The main use of diversity is in an antiracist fashion. I'd suggest you read one of Kendi's books. Stamped from the Beginning has clear and readable descriptions of the difference, but it's a relatively long work, so you might start with one of his shorter books.
Instead of dismissing the argument with a tawdry negated statement and a book suggestion, do you have some thoughts of your own with this matter, or at least some kind of summary?
No.
Long ago I learned that it was rarely worth my time to try to argue people out of their ignorance online. A rando with a throwaway account, a strident tone, and a fair bit of ignorance on the topic is almost a guarantee that there's no point.
If you're interested in knowing something about the topic, you'll do some work. If you aren't, no amount of me spoon-feeding you summaries of serious scholarly works will change that.
If you do end up learning something and have questions, feel free to email me. I'm glad to discuss the topics with people who are serious about it.
What distinction does a throwaway account make on an otherwise-anonymous online forum? No need to take discussion offline. Within the next decade, I am confident the pendulum will swing the other way, and the people who are able to vocalize their opinions in public now will be the ones needing throwaway accounts.
That's another very ignorant question. You can look up the literature on it. (Heck, you could have just followed a link in my bio to get a start.) But again, I'm not spoon-feeding you.
South / East Asia has more than half the world’s population yet doesn’t count towards diversity.
Why not? My point is that it should! What percentage of the US population is from South / East Asia? How does it compare to the representation of others? If it's similar or less, and it still somehow doesn't "count," then we have a diversity problem.
Nobody is saying they don't count toward diversity. What people are saying is that the conspicuous exclusion of less favored racial groups does not get erased because they have some people from other groups.
Put more frankly, the success of recent immigrants does not erase America's long history of brutality and exploitation toward blacks and Latin Americans. The latter is a problem that we have to solve regardless.
And I think it's worth noting that some of the immigrants have brought their own biases with them, such that caste discrimination is now also a problem in Silicon Valley: https://www.washingtonpost.com/technology/2020/10/27/indian-...
> Put more frankly, the success of recent immigrants does not erase America's long history of brutality and exploitation toward blacks and Latin Americans.
But given that America was far more brutal and exploitative towards Chinese immigrants than towards Latin Americans, why are Latinos so prioritized by these initiatives to favor certain racial groups?
> And I think it's worth noting that some of the immigrants have brought their own biases with them, such that caste discrimination is now also a problem in Silicon Valley: https://www.washingtonpost.com/technology/2020/10/27/indian-...
Ironic that in a discussion about diversity, you believe in a prejudiced stereotype about a major ethnic group in Silicon Valley. Casteism is pretty much a nonissue in Silicon Valley, if only for the simple reason that most Indian-Americans tend to be ignorant about the castes of most other Indian-Americans.
Sure, and we are discussing the existence of racial discrimination in engineering hiring at top tech companies, not American history or South Asian culture. Asian immigrants on H1-B conducting coding tests as interviewers at FAANG did not involve themselves in the American Jim Crow South, for example. It's saddening to see America's own past being used to justify discrimination in the present, even against people who aren't originally from the US.
You might not share the beliefs of others that are gainfully rallying behind diversity as a cause to justify penalizing some minority groups for "doing too well" and bolstering others (the literal definition of discrimination), but it IS happening -- and certainly more people than "nobody" are backing it, provoking my original statements. Someone had to put Prop 16 on the ballot, for example (which was thankfully voted against by a large margin of fellow CA Democrats).
The notion that American tech companies are somehow entirely separate from and unrelated to American history is quite a belief to hold. It's not one that stands up to any understanding of the topic, alas. But since that's a hill you've chosen to die on, I'll leave you to it.
The short answer is that tech companies run diversity programs for three reasons: they believe in righting wrongs, they don't want to be sued for biased hiring practices, and they don't want bad PR. All three require under-represented minorities.
Turning it back on you, what should the point of a diversity program be? What's meant to be achieved outside those three goals?
While I certainly understand bad PR (a surprising number of people lack critical thinking skills), what is wrong or biased about hiring for coding positions based on merit-based performance on an objective coding test? Anybody, regardless of background or group membership, who passes will be hired, meaning it is fair and unbiased by definition; that is the diversity program, and if there is some lack of objectivity, that is what needs to be addressed. If that is not the case, then yes, I agree with you, the hiring process would be biased.
You really need to interrogate "merit" and "objective". Nominally objective standards have long been used to advance racial discrimination in the US. For example: https://en.wikipedia.org/wiki/Literacy_test
You should also look up the extensive critiques of meritocracy as a concept. There's a lot of literature there.
Further, I know of no major tech company who uses a nominally "objective coding test" as the only criterion for hiring. And they shouldn't, because being good at taking coding tests is not the job and not what we should be hiring for.
No, coding tests are not the "literacy tests" you have described, and if they were, why would some minorities be performing even better than Caucasians on them?
Coding tests examine the type of work actually required on the job (as coders), and they have been successfully correlated with post-hire performance. Someone who is not familiar with efficient data structures will not write scalable code and will end up creating a burden on their teammates during on-call, for example. Asking someone to solve an engineering problem with a provably correct answer is an objective test for hiring engineers, and I will have a difficult time continuing to engage with anyone who denies this basis of reality and truth.
When I was hired there were three coding test rounds and one interpersonal round. You might argue that the latter is where racial discrimination seeps in, as well as the recruiter outreach step itself, but somehow I am optimistic that a bunch of tolerant Californians have moved past applying a Literacy Test here already by hiring a majority immigrant / minority workforce. In my situation, my recruiter was also an Asian-American minority.
I didn't say coding tests were literacy tests. You also seem a lot like somebody who has not hired people, which would explain your poor understanding of how hiring actually works.
Since my comments here don't seem to be making any sense to you, I'm not seeing the point in trying again.
Why are you assuming that I haven't hired anyone before? And any reasonable observer would agree that you are falsely equating the literacy tests of a bygone century with the modern-day objective hiring practices of FAANG companies.
Actually listening to minorities instead of summoning some kind of sick quota for different ethnicities. Racists are in stark decline and it didn't even take a diversity program or a change in language rules.
The companies are then righting the wrongs on the shoulders of innocents who most likely were never racists to begin with. In short, just committing another mistake.
> These population centers are some of the most diverse regions of the world.
South Asia and SE Asia, maybe. But East Asia (NE China, Korea, Japan) actually has one of the most ethnically "pure" populations in the world.
> South Asia and SE Asia, maybe. But East Asia (NE China, Korea, Japan) actually has one of the most ethnically "pure" populations in the world.
Northeast China–usually defined as the provinces of Liaoning, Jilin, and Heilongjiang–does not belong on your list. According to the 2000 Chinese census, about 10% of the population of Northeast China comes from ethnic minorities – the majority of whom are Manchus, but also including significant numbers of Mongols and Koreans. That is far from being 'one of the most ethnically "pure" populations in the world', especially when compared to Japan or Korea.
Indeed, even though Northeast China was (in 2000) approximately 90% Han, prior to the 19th century Han were a minority in the region, and Manchus were the numerically (and politically) dominant ethnic group.
According to the 2000 Census, the most ethnically homogenous part of China is not the North or Northeast, but rather Eastern China, which is over 99% Han (and, as well as being over 99% Han overall, 4 of its 7 provinces are over 99% Han too.) By contrast, North China is about 94% Han and Northeast China is only around 90% Han.
(There have been two Chinese censuses since, in 2010 and 2020, but I can't find ethnicity figures for them.)
This is so ridiculously ignorant.
Your comment implies black engineers would check that Malcolm X Boulevard is pronounced correctly. That's awfully specious.
Alternatively it just implies white engineers never have their GPS taking them through Harlem.
Yes, all engineers are white </sarcasm>.
Or how about this one: Yes, all black engineers on the maps team live in New York.
Truth is, it is just one example of the thousands of edge cases that exist in these types of complex products, and some of them will look like they have some sinister basis.
Or that Google Maps is primarily developed in Australia for a worldwide market.
Did the geo team get moved from Seattle?
It really doesn't. There are more things, Horatio.
As others noted, just because someone is black doesn't mean that they would have caught this. The whole point of ML is to adapt to what is effectively an unbounded set of inputs, pretty much by definition there will be cases where even a team of 100% black people will train a model that, given the correct input, will fail in ways that particularly affect black people.
> Facebook, like a lot of tech companies, has long had problems with diversity in engineering.
If that is the case, why is it that Google voice nav routinely butchers the names of places and roads in India in spite of having thousands of Indian engineers on staff?
Could we blame the intractability of the problem, or just plain old incompetence, before we blame every single problem in the world on racism and lack of 'diversity'?
Strong agreement here. The impulse to attribute any mishap on anything race-adjacent to racism is one of the most destructive memes at the moment.
It forces a worldview where malice is the default assumption and encourages the "enemies all around us" mindset.
Another example: Apple Maps pronounces “Jai Ho” as “high hoe”. Apparently Apple has too many Latino engineers and not enough Indian engineers?
Maybe, but in the particular case you mentioned there is a specific word, "jai", that is pronounced as "high". See Jai Alai, which has been absorbed into English.
Given that the goal of racism is to structure society, and given how well that succeeded in America, I don't think it's unreasonable to ask whether it's at play in pretty much any situation where we see racially biased outcomes.
But it is an excellent question why Google Maps is still terrible at Indian place names even though they have plenty of people internally who not only could help, but would be delighted to. The answer to that will be essentially sociological. If you think that answer in no way includes structural inequity despite it being pervasive in America since its founding, you will have to explain how you think Google managed to eliminate that in the Maps division and then managed to re-introduce some sort of structure that leaves a wealth of internal knowledge untapped.
> structural inequity despite it being pervasive in America since its founding
America is not unique in this. And African-Americans are not the only people in the world who were enslaved. What is unique is that America and Americans are so good at controlling narratives and sucking oxygen out of rooms that other stories and catastrophes are forced into irrelevance.
America is not unique in this. But America's history is uniquely relevant to the problems in America. Where Facebook is based and the tech industry is centered.
> But America's history is uniquely relevant to the problems in America. Where Facebook is based and the tech industry is centered.
Sure it is. But if Facebook, Google and other American companies want to indulge their Americentric proclivities to the detriment of everyone else, they should voluntarily withdraw from the rest of the world.
> Then Google Maps was like, 'turn right on Malcolm Ten Boulevard' and I knew there were no black engineers working there
Silly Google TTS, the proper pronunciation is obviously "Malcolm the Tenth" there.
Google Maps is made in Australia, and the diversity there is different.
Google Photos solved the problem by simply returning no results for words like gorilla, monkey, primate, etc.
I was just thinking about that. Unfortunately it just makes the bias harder to detect.
Once you search for these:
https://www.google.com/search?q=human+female+face&tbm=isch
https://www.google.com/search?q=human+male+face&tbm=isch
You can see that 'human face' has a bit of post-hoc tuning.
So disappointing. I was legitimately looking for a monkey pic I took years ago, to no avail, because of no searchability. One of the richest companies in the world prefers to just remove the ability rather than solve hard problems. But hey, at least we all get ads.
It’s an inevitable result of angry mobs (like this article and entire HN thread) and risk-intolerant corporations.
It’s impossible to test every image for accuracy and to guarantee it won’t happen again, so they just sidestep it entirely.
But what would you do if you were them? You solved 95% of the problem, you are left with 5% that are extremely hard. Would you throw a large amount of resources to solve that? And, given that you basically deal with probabilities and that the system will never work 100% anyway, and that even one mistake of that kind will cause uproar - is there any other feasible solution?
You can't just "test" a neural network like that. For all you know they tested a thousand pictures of Chimpanzees and Gorillas against the network, but for some reason the NN decided to classify the photo differently because the subject was standing in front of the wrong kind of tree or wearing a funny-colored hat.
There's no super reliable way to prevent this (with current tech) other than forbidding that output entirely.
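For what it's worth, that "forbid the output" fix is a one-liner compared to retraining, which is presumably why Google went that way. A hedged sketch of the idea, assuming the model emits (label, score) pairs:

    SUPPRESSED = {"gorilla", "chimpanzee", "monkey", "primate", "ape"}

    def filter_labels(predictions):
        """Drop any suppressed label before it reaches the UI. Blunt,
        but it guarantees the offensive tag can never be displayed."""
        return [(label, score) for label, score in predictions
                if label.lower() not in SUPPRESSED]

The cost, as the sibling comments point out, is that searches for actual gorillas silently return nothing.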
Is it inexcusable that if I search 'Japan' to look for pics from my trip to Japan, it shows me pictures containing any Asian person at all? If I search Japan today, I get mostly pics of my not Japanese wife. But I guess we don't complain enough for anyone to care.
https://i.ibb.co/Mf6rVdf/Screenshot-20210907-002516-Photos.j...
Nobody who has traveled at all would mistake my wife and child as Japanese. And doing so is especially insidious considering the Bataan death march.
> Which makes it an inexcusable mistake to make in 2021 - how are you not testing for this?
They probably are, but not well enough. These things can be surprisingly hard to detect. Post hoc it is easy to see the bias, but it isn't so easy before you deploy the models.
If we take racial connotations out of it then we could say that the algorithm is doing quite well because it got the larger hierarchical class correct, primate. The algorithm doesn't know the racial connotations, it just knows the data and what metric you were seeking. BUT considering the racial and historical context this is NOT an acceptable answer (not even close).
I've made a few comments in the past about bias and how many machine learning people are deploying models without understanding them. This is what happens when you don't try to understand statistics, and particularly long-tail distributions. gumboshoes mentioned that Google just removed the primate-type labels. That's a solution, but honestly not a great one (technically speaking). But this solution is far easier than technically fixing the problem (I'd wager that even putting a strong loss penalty on misclassifying a black person as an ape, as sketched below, is not enough). If you follow the links from jcims then you might notice that a lot of those faces are white. Would it be all that surprising if Google trained from the FFHQ (Flickr) dataset?[0] A dataset known to have a strong bias towards white faces. We actually saw that when PULSE[1] turned Obama white (do note that if you didn't know the left picture was of a black person, and who they were, the output is a decent (key word) representation). So it is pretty likely that _some_ problems could simply be fixed by better datasets (this was part of the LeCun controversy last year).
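To unpack that "loss penalty" idea: it's roughly a cost matrix layered on top of cross-entropy. A PyTorch-flavored sketch, not anyone's production code (and, per the above, likely insufficient on its own):

    import torch
    import torch.nn.functional as F

    def cost_sensitive_loss(logits, targets, cost_matrix):
        """cost_matrix[i, j] = extra penalty for predicting class j when
        the true class is i (e.g. person -> primate set very high)."""
        ce = F.cross_entropy(logits, targets)
        probs = F.softmax(logits, dim=1)
        # expected misclassification cost under the predicted distribution
        penalty = (probs * cost_matrix[targets]).sum(dim=1).mean()
        return ce + penalty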
Though datasets aren't the only problem here. ML can algorithmically highlight bias in datasets. Often research papers are metric hacking, or going for the highest accuracy that they can get[2]. This leaderboardism undermines some of the usage, and often there's a disconnect between researchers and those in production. With large and complex datasets we might target leaderboard scores until we have sufficient accuracy on a dataset before we start focusing on its bias (or, more often and more sadly, we just move to a more complex dataset and start the whole process over again). There aren't many people working on the bias aspects of ML systems (both data bias and algorithmic bias), but as more people put these tools into production we're running into walls. Many of these people are not thinking about how these models were trained or the bias they contain. They go to the leaderboard, pick the best pre-trained model, and hit go, maybe tuning on their own dataset. Tuning doesn't eliminate the bias from pre-training (it can actually amplify it!). ~~Money~~Scale is NOT all you need, as GAMF often tries to sell. (Nor is augmentation all you need, as others try to sell.)
These problems won't be solved without significant research into both data and algorithmic bias. They won't be solved until those in production also understand these principles and robust testing methods are created to find these biases. Until people understand that a good ImageNet (or even JFT-300M) score doesn't mean your model will generalize well to real world data (though there is a correlation).
So with that in mind, I'll make a prediction: rather than seeing fewer of these mistakes, we're going to see more (I'd actually argue a lot of this is currently happening that you just don't see). The AI hype isn't dying down, and more people are entering the field who don't want to learn the math. "Throw a neural net at it" is not and never will be the answer. Anyone saying that is selling snake oil.
I don't want people to think I'm anti-ML. In fact, I'm an ML researcher. But there's a hard reality we need to face in our field. We've made a lot of progress in the last decade, which is very exciting, but we've got a long way to go as well. We can't just have everyone focusing on leaderboard scores and expect to solve our problems.
[0] https://github.com/NVlabs/ffhq-dataset
[1] https://twitter.com/Chicken3gg/status/1274314622447820801
[2] https://twitter.com/emilymbender/status/1434874728682901507
>how are you not testing for this?
I wonder how testing for that looks and sounds in a corporate environment. It may well be an area similar to patents: you pretend that you never heard of it, never discussed it, and God forbid there's any mention of it in corporate email/chat/etc., or a click on a link from inside the corporate network...
Why are you so sure they aren't testing for it? Bias finds a way.
Curious if anyone on HN has built a testing framework to catch this kind of issue.
I've been trying to avoid controversy lately, but hey, here's one to downvote.
Have we considered that AI and ML as a general brain replacement is a failed idea? That we humans feel we are so smart we can recreate or exceed millions of years of evolution of the human brain?
I'd never call AI a waste, it's not. But getting it to do human things just may be.
Even a child can tell the difference between a human of any color and an ape. How many billions have been spent trying, and failing, to exceed the bar of the thoughts of a human child?
No it isn't a failed idea at all. The products out today are remarkably useful even if not perfect. I have tested out the google lens thing on google photos and it is astounding.
I took a photo of the water pump from a car windscreen wiper and google was able to correctly identify what it was. I took a photo of a generic PCB which showed the back of a driver board for an LCD and google was able to bring up the exact type of board it was with the names of the ICs on it.
In these examples, google photos ai has far exceeded what the average human can achieve. We just have to keep in mind that these systems are not perfect and only a best guess which should be verified by a person later.
The problem here is not that the mistake was very costly or disruptive to the function of the feature, but that the mistake was highly offensive which is something very hard to avoid.
The problems it solved for you are immensely useful to you, but not remarkable IMO.
The problem it's solving is that it can do things that somebody with zero experience cannot. If you had an auto parts pro, or an EE, they probably could have done the same for you.
So, in general, AI is helpful because it has a much larger breadth of knowledge. Granted.
But I want examples of it doing depth, too.
My wife uses Lens when we fish. It's way, way worse than a fisherman with any experience at all.
Regardless, it is still far beyond a "failed idea" since it provides genuine value and achieves at a higher level than the average human for many topics. It isn't as you say better than an expert but the fact that for free I get something that works well is remarkable to me considering this is cutting edge technology.
Please read my premise again. I only called it a failed idea in the sense it could act as well as a human brain.
I said it's not a waste. Not at all, I use it in a lot of the ways you describe.
> Have we considered AI and ML as a general brain replacement is a failed idea?
Yes. It is currently known to fail at this prospect. It is an open research question as to whether current methods can be merely "scaled up" using more compute to achieve "general brain replacement". I personally am skeptical about that considering basic problems such as concept drift (but I am by no means an expert).
You define what constitutes value as arbitrarily difficult or inconceivable with current methods (because it's an area of open research), and then say we should divert course merely because we don't know it's possible?
> never call AI a waste, it's not. But getting it to do human things just may be.
It already can do things previously thought to be exclusively "human" (such as beating top players at Go). Recently it also helped make significant advances in protein folding, which are sure to yield benefits to medical science at least indirectly. I believe this statement is either incorrect, or you're expecting people to have some strange definition of "exclusively human", which is of course also open research and unanswered.
Couldn’t you apply the same way of thinking to finding a cure for AIDS, or doing interstellar travel, or P = NP, or pretty much any problem that we haven’t solved yet? Just because we can’t solve a problem within our lifetime doesn’t mean it’s not solvable at all. This is one of the most basic principles by which knowledge, and therefore, technology, progresses.
Not at all. If a child could solve the AIDS issue and science couldn't, then maybe.
Humans and machines are so different today. Of course machines beat us at number calculations and such. But we have organs that computers don't and can't have. And our brains are much more in tune with using those than with power-of-2 bit twiddling.
As we ourselves don't understand how it works, how can we ever write a machine that does?
Well how do you know that the image recognition error is a fault of the ML algorithm (because we can’t capture how organic minds learn, as you are suggesting) and not of the learning sample?
Given the complexity of the solutions employed in this space and the task we’re trying to get them to solve (or perhaps, the solutions we’re looking for problems to) I’m not that surprised.
Taken to the extreme, AI code is essentially something like:

    add(M, N) { return M + N + rand(); }

In addition, it is tested with a (in relation to the complete set) very small set of input data.
Is that a result of a skewed training set, or are people really hard to tell apart from gorillas if there are no obvious tells like a large difference in brightness between different areas of the face?
Deep learning, for all its recent glories, still suffers from relatively crude, slow-converging training algorithms compared to other areas of ML and statistics.
Maybe to your typical SGD-type algorithm, working off a dataset filled with mostly light-skinned people, skin tone just looks like a really solid first-order way to distinguish humans and primates, and picking up the black people / primate distinction seems much more marginal and second-order in terms of impact on the cost function.
If most of the people in the dataset were black, I predict you wouldn't see this.
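One way to check for exactly this failure mode is to slice error rates by subgroup instead of reporting a single aggregate accuracy; the aggregate can look fine while one group's accuracy is terrible. A sketch, assuming (hypothetically) that evaluation records carry subgroup annotations:

    def accuracy_by_group(records):
        """records: iterable of (group, true_label, predicted_label).
        Returns per-group accuracy so disparities are visible."""
        totals, correct = {}, {}
        for group, truth, predicted in records:
            totals[group] = totals.get(group, 0) + 1
            correct[group] = correct.get(group, 0) + (truth == predicted)
        return {g: correct[g] / totals[g] for g in totals}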
Consider too what they are likely using for inputs: photos with associated comments.
I don't know Facebook's TOS sufficiently to know whether they are using private groups as source material, but if you're utilizing bigoted content to train pattern recognition, you will replicate bigoted content.
I wonder why this was downvoted. It's an interesting hypothesis.
My guess is that the poster was making an assumption that a large part of Facebook's images are bigoted content. I am neither agreeing nor disagreeing. But apparently some people got a little emotional about the platform being associated with maybe having a heightened amount of bigoted content.
Not necessarily a large part, simply enough to identify as its own pattern.
In my experience there are a lot of bigoted things on Facebook. If these are serving as source data, and are sufficiently distinguished from other training material, it may well be user behavior the ML system would replicate.
Gorillas have black and human-like faces. I think they're just quite similar so AI can more easily confuse them.
I’m fairly certain if you showed pictures of both groups to a toddler they’d be able to sort them correctly. It’s really not hard for a human to tell the difference. Which tells me that FB’s AI isn’t really that great.
I saw another article which included a screenshot of the mistake on Facebook. The photo is blurry and contains a side view of the person, and iirc the backdrop is in nature. The main indicator that it was a person was the fact they were wearing clothing. Clearly the AI is not good enough, but I will admit that the data it was working with makes perfect accuracy tricky to achieve.
The article I saw with the screenshot was from NYTimes:
https://www.nytimes.com/2021/09/03/technology/facebook-ai-ra...
Human-like really depends on your interpretation. That's a generous reading of what's going on. If you google Gorilla faces, I don't think you would be confused.
The AI is not that smart and these examples show it.
Us humans are super good at distinguishing faces. So what's obviously different to us might not be so clear to an AI or another species.
>Us humans are super good at distinguishing faces.
It would be interesting to test a bunch of midwesterners at their ability to tell Asians apart or to be able to distinguish various Asian ethnicities. My guess is that a lot of the distinguishing features that they look for are altered or missing.
And while that is probably true for most of us and gorilla faces, even those midwesterners would easily distinguish an Asian person from a Gorilla.
It's true that we're good at recognizing faces (even where there are none), and distinguishing on a basic level (type of animal) but specific faces are mostly cultural.
Us humans can tell the difference between a human eye and a gorilla eye 100% of the time because it's incredibly easy.
What a silly claim. If there was a way we could easily put money on that, I'd have no trouble finding an example to prove you wrong.
Is it even a confusion?
Humans are primates. It's weird that it selected such a broad label, but it didn't select an incorrect label.
"Human supremacist" attitudes are incredibly common. Look at any discussion of animal intelligence and you'll see the most vehement denials of any possibility that our cognition and emotions aren't unique in the world.
Now you've made me wonder what it thinks about albino gorillas.
e: I assume something similar has been done before by training a model on brown/black bears then throwing polar bears at it. Anyone know the outcome?
I would be interested to know if young children would do much better in many cases.
When I was quite young, I referred to some firefighters as robots.
It reminds me of some of the explorers' tales of people who were half-human, half some other animal, or of people covered in hair; the first may have originated from seeing people riding animals, and the others from encounters with various (actual) primates. If humans can make such mistakes, certainly Facebook's AI can be excused for its confusion.
Apparently we are not sure whether the original term Gorillai referred to people or gorillas.
Are you sure those were honest mistakes and not stories for the sake of storytelling? And no, prehistoric ignorance does not justify this system making it to production.
the biggest tell is probably that, unless the person is in a zoo, I haven't seen many gorillas at Starbucks
which says a lot about the state of our alleged human outperforming AI
I don't know. I think I would not have a hard time picking out a gorilla at Starbucks.
And I'd like to see a gorilla in any pose that's really hard (for a human) to differentiate from a person.
The truth is: the recognition algorithm is not very sophisticated after all.
There is evidence that CNNs use texture features more than shape features, i.e., have a texture bias. It's hard to tell in this case without access to the data/model, but it's very possible colour is being overvalued by the classifier and causing the errors.
Isn’t the latter a possible consequence of the former?
Skewed training set, very few dark skinned human faces.
Got a source for that?
I’m not an expert in this field. What are the likely root causes for this happening?
The most obvious answer that no one wants to mention is that there just genuinely is some similarity between the two categories which is stronger than the similarity between others.
While I don't have a source, it seems clear that with sufficient training data they could do better at avoiding this mistake. This should not be hard.
Perfect accuracy in image classification is an unsolved problem. So yea, it's hard.
If Google and Facebook have both failed with the billions they have spent on the problem, I think we can say yes, it is that hard.
Obviously the people at google had thought of having more training data.
The training set had more primates than dark skinned humans? Unlikely.
The video features white and black men. It seems like concluding the algorithm is calling black men primates is the same kind of error people are accusing the algorithm/Facebook of. i.e. The reason you think it's racist is because you assume it's talking about black people specifically suggesting you think the word is more apt to describe black people.
Primates and humans are similar labels. This was almost certainly not intentional. Video classifiers are going to make mistakes - sometimes crude or offensive ones. I don't get outrage over labeling errors like this. Facebook should fix the issue - but they shouldn't apologize. It only encourages grievance seekers.
We’re assuming it because that’s exactly what has happened with other products in the past. It’s an issue the field has struggled with, so it seems likely.
Maybe I'm not aware of what you're referring to, but I don't think so. I think, like this incident, companies apologize for stuff like this because they lack the courage to say the truth, which is that it's an unfortunate labeling error but not a big deal. Instead, they judge it to be more political to beg forgiveness. Of course, the people who get offended by labeling errors are only encouraged by apologies and use them as evidence of wrongdoing.
Intentional or not, the outcome is all that matters.
In every aspect of your life
I don't know if this is necessarily true. We have separate charges for murder and manslaughter for example.
The end outcome and impact on everyone (in your case, the deceased, the family) depends on intent.
Doesn’t change the original statement I made one bit
Speak for yourself. Intent is one of the key factors in crime investigations. Even in 'every aspect of life', intent plays a critical role in smoothing society's frictions. It helps us understand each other better. Did you accidentally bump me, or did you try to push me out of the bus?
Intent can and will influence the outcome.
Does not change anything about my original statement.
The end outcome of “oops I bumped into you” is basically nothing. The end outcome of an intentional shove is quite different (maybe lack of trust develops, etc)
The end outcome of punching someone in the face while drunk and saying “hey brah I didn’t mean to” is still gonna feel shitty no matter the intent
> The reason you think it's racist is because you assume it's talking about black people specifically suggesting you think the word is more apt to describe black people.
No, I think it's racist because racists have a long history of calling black people primates, and because an automated system doesn't get to escape scrutiny and critique just because someone didn't specifically put in a line of code that emulates the actions of racists.
This happens because there are no black people of consequence in the ML pipeline. In my previous company, every time we built a new model, a bunch of us would test it. Being the only black person in the company, I often found some very odd things, and we would correct them before shipping.
I understand that FB is at a much bigger scale, but that's all the more reason to have a much more diverse set of eyes test their models before they go live.
If you want to avoid this, hire more black people, seriously.
"Hire more black people" - isn't that what FAANG desperately been trying to do for some time now?
I guess first step might be to "hire more black QA people".
They already get hired, when they meet the hiring bar.
It's not obvious to me what black people would have done to fix this specific problem. Would they have said "oh we should make sure to test the algorithm on blurry images of people in a forest and make sure it doesn't get confused"?
"I uploaded my picture and it says I'm a monkey"
"Oh, maybe we should look into that"
What I don't get is that people seriously think FB/Google don't test with images of black people? Especially after one such mislabeling scandal has already happened? Very highly doubtful.
I worked for another computer vision company, Clarifai, that had the same issue. One of the employees noticed it and we retrained the model before it became public.
This is what amazes me. Given this exact thing has happened in the past and resulted in public humiliation of the companies involved, how did they not notice this? Why didn’t they check for it?
What’s being reported is that there is a single video which is mislabeled. For all we know, they did test for this, and believed there was no issue.
AI models are deterministic in a purely technical sense, but practically speaking, they are non-deterministic black boxes. It’s not as if you can write a unit test which generates all possible videos of black people and makes sure it never outputs “gorilla”.
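That said, a finite regression suite over curated, known-sensitive inputs is possible, and would at least catch repeats of failures that have already made the news. A hypothetical pytest-style sketch (the model API and image fixture are invented for illustration):

    DENYLIST = {"gorilla", "chimpanzee", "monkey", "primate", "ape"}

    def test_no_denylisted_labels_on_people(model, curated_people_images):
        """curated_people_images: photos of people spanning skin tones,
        lighting, blur, and poses from previously reported failures."""
        failures = [img for img in curated_people_images
                    if DENYLIST & {lbl.lower()
                                   for lbl, _ in model.predict(img)}]
        assert not failures, f"denylisted labels on {len(failures)} images"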
Checking would require checking every possible video against the classifier.
I think the negative reaction is reasonable. Clearly, if a human did this it would be a problem, so why should it be acceptable for an automated system to do the same thing? The fact that it is unintentional doesn't negate the fact that it's an embarrassing mistake.
On the other hand, imagine a world where these labels were applied by a massive team of humans instead of a deep learning algorithm. At Facebook's scale, would the photos end up with more or less racist labels on average over time? My guess is that the model does a better job, but this is just another example of why we should be wary about trusting ML systems with important work.
> Clearly, if a human did this it would be a problem, so why should it be acceptable for an automated system to do the same thing?
One worries that the corporate overlords are preparing the legal system so that manufacturers of self-driving cars can act with complete impunity. "Sorry your child is dead; the car did it, so there's no one to sue or convict."
That raises the question: is it embarrassing, or an expected mistake to be learned from? Many things are mislabeled and many things are labelled properly, but we never say the AI must feel pride at a good labeling job. Why would we give emotions to an emotionless system?
> is it embarrassing, or an expected mistake to be learned from
I would say it's both. It's embarrassing for Facebook because it looks racist even though it really isn't. The system might be emotionless but the people who interact with it aren't, and we don't expect them to be.
> it looks racist even though it really isn't
It absolutely is racist. Racist outcomes are still racist regardless of whether there's a guy in Klansmen robes at the steering wheel or not.
How is it a racist outcome? This has nothing to do with the belief that one "race" is inherently better than another. It's a simple categorization error due to insufficient training data.
It's a racist outcome because racists have a long history of comparing black people to primates, and because this results in a service that's actively worse and less useful for black people than those of other ethnicities.
Also, a 'simple error' by a company with absurd amounts of money, and several extremely public examples from its peer companies of what not to do, is at that point more negligence than anything.
> Racist outcomes are still racist regardless of whether there's a guy in Klansmen robes at the steering wheel or not.
Yes, I understand what systemic racism and implicit bias are, your condescending snark is appreciated.
Anyway, it's not racist because the result is not the product of implicit bias or systemic racism, it's a software bug that would have been possible no matter who was working on this software. As I wrote in another comment: the whole point of ML is to adapt to what is effectively an unbounded set of inputs, pretty much by definition there will be cases where even a team of 100% black people will train a model that, given the correct input, will fail in ways that particularly affect black people.
AI is not the problem here. AI just notices stuff. It's the lack of even amateur hour emotional intelligence in the product managers who deploy systems like this IMO.
I don't like these stories. They always trend towards the most inflammatory arguments, those being inherent bias and unconscious racism put upon our technology. Real issues in those topics aside, are articles like this doing anything but feeding flames and generating ad revenue?
Instead, I want to talk about pareidolia. Humans are social creatures. We have evolved to identify others of our kind and read their expressions. This was important to us, as we evolved alongside gorilla analogues as well, and the few of us that couldn't discern one face from another didn't usually last long.
I think we're trying to place too much of a human expectation onto these machines. I think that human features and primate features are strikingly similar, and it's our specialized brains that let us so easily discern them. Yes, with enough data and training we could have more accurate models, but we can't cry foul every time an algorithm doesn't behave like a human does.
Reference: https://www.reddit.com/r/Pareidolia/
The thing is that humans are generally excellent at these sorts of pattern recognition, but these networks aren't nearly as good. Even in rigorously trained networks that operate surprisingly well, mistakes will appear that if made by a human would be treated as due to carelessness or stupidity.
So, this is going to happen.
> The thing is that humans are generally excellent at these sorts of pattern recognition
Humans with a lot of experience are. Would kids be? I once referred to firefighters as robots as a kid.
Our ability for pattern recognition w/ human faces developed over the course of many thousands of years. People in modern fire fighting gear weren't present through that process. A kid thinking a firefighter is a robot is not the same class of problem. Even kids are good at the type of tasks we're talking about.
People have difficulty distinguishing the individual faces of other races.
https://en.wikipedia.org/wiki/Cross-race_effect
So at some level it breaks down for us too.
Yes, and I think it's fair to say that a firefighter in full gear is a reasonable place for it to break down, and that this does not otherwise indicate a failing in human capability in this area that should put our failure modes at anywhere near the same level or rate as computer models'.
The cross-race effect is an instance of the in-group advantage. However, that can't be extended to say that people will classify black people as primates.
I would say that most kids would not make the same sort of mistake as this network is reported to have made.
Something grotesque was put forth by a technology built by a comically monied entity. A similar mistake was made by another monied entity only months ago, so Facebook should have dedicated considerable effort to preventing such things. This is how it works: we hold those with resources to higher expectations.
Please do not use handwavy justifications to trivialize acts that can cut people this deeply. Facebook should have known better, and done better.
By the sound of it, this is a problem that neither the entities nor their moniedness can solve. I'm sure these companies watch each other, and when one steps on a metaphorical rake the others take notes on how to avoid it in their own future attempts. And yet rakes are still being stepped on.
When an automated system behaves irregularly on a given input, we call that a bug. Bugs exist in all software: not always the same ones, but always present. This software is no different from any other; it will have errors, and because the software is categorizing faces, its errors will take the form of miscategorized faces. The only relevant questions are how frequent these errors are and how disparate they are across racial lines.
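Measuring both of those is straightforward if you have a labeled evaluation set with a demographic column. A rough sketch in Python (the record format is hypothetical, not anything Facebook has published):

    # Sketch: per-group error rates on a labeled eval set.
    # The (group, true_label, predicted_label) record format is made up.
    from collections import Counter

    def error_rates_by_group(records):
        """records: iterable of (group, true_label, predicted_label)."""
        totals, errors = Counter(), Counter()
        for group, truth, pred in records:
            totals[group] += 1
            if pred != truth:
                errors[group] += 1
        return {g: errors[g] / totals[g] for g in totals}

Comparing the per-group rates against the overall rate answers the "disparate across racial lines" question with one number per group.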
Another reference: this one is a Tool-Assisted Speedrun of a game that relies on basic image recognition software. While not entirely related, it does show how error-prone these algorithms can be. It's also fun to watch. https://youtu.be/mSFHKAvTGNk
> I don't like these stories.
Nobody likes the stories. No reasonable person is celebrating them. You’re not in disagreement with anyone.
You put a strong focus on how we evolved to care deeply about small differences in facial expressions and features so that we can identify and interact with individuals.
These stories are about how we also care deeply about labels and categorization. Aren't we just watching the natural selection (making them not "last long") of these far-too-rough AIs that step on boundaries which are pretty important to a lot of people?
Ha! I like that! Yes, I guess in a way we are. These models are always being evolved through their own version of 'natural' selection: they go through tens of thousands of mutations before finding one that guesses well enough to be pushed to production. This is just another stage of that algorithm's life cycle, I suppose. If you want to take an optimistic view, it's just another part of the tuning process. The AI can train for as long as it likes, but the real thing it's being weighed against is public outcry.
>I don't like these stories. They always trend towards the most inflammatory arguments, those being inherent bias and unconscious racism put upon our technology.
Oh well, it's the times we live in.
If people simply laughed at the results and fixed the problems they'd miss all the endorphin rush of outrage.
I think your comment is a bit dismissive. Facebook is not the first to encounter this; it happened six or seven years ago, and they should have known better. Secondly, if the data scientists working on this had all been black, this would not have happened, just like the automatic soap dispensers in bathrooms.
Why would that be true? I don't understand the argument that black people would have avoided this.
From what I can tell, the only fix here is a hardcoded workaround outside the net, or a substantially more powerful architecture.
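Concretely, that workaround tends to look like a denylist applied after inference rather than anything inside the model. A sketch, with invented label names and an invented threshold:

    # Sketch: suppress risky labels whenever a person might plausibly be
    # in frame. Label names and threshold are hypothetical.
    SUPPRESSED_LABELS = {"primate", "gorilla", "chimpanzee"}
    PERSON_SAFETY_THRESHOLD = 0.01  # act even on 1% person probability

    def safe_labels(scores):
        """scores: dict mapping label -> model confidence in [0, 1]."""
        if scores.get("person", 0.0) >= PERSON_SAFETY_THRESHOLD:
            return {k: v for k, v in scores.items()
                    if k not in SUPPRESSED_LABELS}
        return scores

Which is presumably roughly what Google's "censor the word gorilla" fix amounted to.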
What’s this about soap dispensers?
FB had a soap dispenser that didn't recognize black people.
https://gizmodo.com/why-cant-this-soap-dispenser-identify-da...
When the video comes up, facebook displays a message that says it is "false information".
(And when you click "why" you get a picture of Arabic text, which can't be copy/pasted into translation software)
Shame on you for distributing false information, it's a good thing we have facebook protecting The Truth. /s
The link says that it's partly false because it's an infrared sensor, which doesn't detect any skin color and isn't biased, by virtue of not really making any decision: it just dispenses soap when the infrared sensor gets triggered. The problem is that, according to the article, black skin does not reflect infrared radiation very well (no idea if that's true, but that's the claim here), meaning it's more of a physical limitation than a "defect" as can be argued in the case of AI models.
But the article also says a counterargument could be that the existence of machines ill-suited to a big part of the population can be seen as proof of some latent racism (to be more accurate, discrimination is closer to the word used in the article), whether intentional or not.
That’s all interesting and you make good points but…
I think the conversation can be made a lot simpler.
AI isn’t ready for anything important. Done. That’s it. If one of the pioneers in the field can’t distinguish black people from primates, it isn’t ready for driving or war or legal matters or really anything of importance.
I think we (colloquial) made something kinda cool and jumped the gun on when and where to use it.
Humans are primates. The AI is correct. Does it classify white men and Asians as primates too? If not, that's a bug.
I wonder if AIs are good at distinguishing individual gorillas, etc. I'd never really thought about the problem of classification being harder (perhaps) than identification if you see what I mean.
Google: hold my beer
ooof, thats uncomfortable
nothing important in the world should RELY on an AI/NN/ML system.
This feature was not of any importance. It just asks the viewer if they want to see more or less of a certain kind of video.
yes, and now imagine if the feature were important: sorting job applications, or medical diagnosis, or (dramatic pause) driving. Lots of organizations are looking at ways of completely removing humans from their decision-making processes. 99% certainty rates would be fantastic, unless you are in the 1% false positive/negative group.
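back-of-the-envelope, with a made-up daily volume, just to show the scale:

    # At "99% certainty", errors still pile up fast at platform scale.
    # The daily decision volume below is invented for illustration.
    accuracy = 0.99
    decisions_per_day = 1_000_000_000
    errors_per_day = decisions_per_day * (1 - accuracy)
    print(f"{errors_per_day:,.0f} wrong calls per day")  # 10,000,000

ten million wrong calls a day, every day, and each one lands on a person.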
Wow, what else? Did it also label them as "Homo Sapiens"?
I feel that 0-failure-rate expectations from technology will keep us from progressing as a species.
Facebook disabled Thai-to-English translation back in April because it translated the queen as “slut” and it’s been disabled since.
Maybe we should learn to accept non-fatal errors from applications instead of forcing things to stop entirely.
I find it ridiculous that my Photos app suggests I change monkey to “lemur” while I have plenty of photos of monkeys and zero of lemurs.
Who takes the fall when an AI screws up?
If you shine enough light on it, apparently the brand does. If a human were to do this, the company would immediately fire the employee and cut all ties with them. But as the article points out, 'fixing' an AI mistake isn't really a fix at all:
> [Google] said it was "appalled and genuinely sorry", though its fix, Wired reported in 2018, was simply to censor photo searches and tags for the word "gorilla".
If a human did it with intent to offend, they would. But if a black person genuinely looked like a primate to human eyes, it would be pretty shitty to fire the poor worker, who had no way of knowing. Here, the AI isn't trying to offend, so maybe there should be no consequences, and people should stop demanding severe punishment for minor accidental insults.
The AI is very honest and innocent, it doesn't know what political correctness is. I've heard stories of parents whose kids would also mislabel a black human as a gorilla.
People need to learn to recognize the difference between innocent mistakes and expressions of genuine contempt.
An 'innocent mistake' made by a megacorporation with bajillions of dollars, of exactly the kind that has already publicly embarrassed other megacorporations with bajillions of dollars... at that point it's not a 'mistake', it's just the people in charge not caring.
Unfortunately, this is the sort of article that just causes all the SJWs to come crawling out and destroy any attempts at logical reasoning.
AI doesn’t build itself in a vacuum. The very people who train it are likely to be biased in one direction or another, and if left unchecked their masterpiece will be biased as well. Now, if these algos were staying in a lab or something it wouldn’t be a problem, but as soon as they hit the real world they should be held to some standards, don’t you think?
I don't really think the world needs AI right now. One can argue that the AI is making an innocent mistake and that calling an AI or ML model (or its improper training, however that works) "racist" is overblown rhetoric, as people here do, but I think all of that sidesteps the actual issue. The problem is that AI and ML are primarily used for decision making, as in recommendation engines. These little gadgets that provide recommendations may be fairly low-stakes, but they are effectively proofs of concept for future applications like policing, fighting terrorism, or combating human trafficking. Get it wrong there and the consequences are devastating. If people don't raise the flag about how wildly wrong the AI is now, there will inevitably be false confidence in using it for those applications (and there are plenty of examples of how this has already happened).
Maybe the algo or the training set or something else was racist, maybe it wasn't. But if you code something that labels people with slurs, you've messed something up. You need to be 99.999999% sure you're not throwing out slurs, or your whole project is failing spectacularly. And even then you have to apologize to the remaining 0.000001%, which is still around 40 people if half the planet uses your site. How do you get there? I don't know. I guess it would help if you could be that sure you weren't looking at a human face before using another label. Like, bias towards humans in a big, big way. Heck, if you're facebook, the pre-test probability that your algo is looking at a person is probably much higher than the one implied by your training set. Or maybe you drop primates from your training set; in that case you'll misidentify some primates as people, which is technically the flipside of the same problem, but oh so much more acceptable.
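That "bias towards humans in a big, big way" idea can be made concrete by reweighting the model's scores with a deployment prior (straight Bayes); all numbers below are invented for illustration:

    # Sketch: adjust model posteriors from the training prior to a
    # deployment prior, so "person" needs far less evidence to win.
    TRAIN_PRIOR  = {"person": 0.30, "gorilla": 0.05, "other": 0.65}
    DEPLOY_PRIOR = {"person": 0.98, "gorilla": 0.001, "other": 0.019}

    def reweight(scores):
        """scores: model posteriors under the training prior."""
        adj = {k: v * DEPLOY_PRIOR[k] / TRAIN_PRIOR[k]
               for k, v in scores.items()}
        total = sum(adj.values())
        return {k: v / total for k, v in adj.items()}

    # A 60/40 person-vs-gorilla call becomes an overwhelming person call:
    print(reweight({"person": 0.6, "gorilla": 0.4, "other": 0.0}))

With these made-up priors, a 60/40 split comes out at roughly 99.6% person, which is the "big, big way" in action.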
This isn't the kind of slur that you can just run a dictionary search for. There are totally valid contexts for tagging a gorilla in a picture that contains gorillas. I'm sure there are other labels it could mistakenly apply that would be insulting by accident but aren't slurs (tagging an athlete as a statue, for instance, as in "that quarterback is a statue in the pocket"). This tech isn't perfect, so you either need a human editor or you have to learn to live with mistakes. IMO the fact that this was unintentional and an AI mistake makes me think the outrage is more performative than genuine.
Oh boy. People who know basic ML think "Oh, it was unintentional, just a basic misclassification, it happens." Guess what? You're still calling people slurs on your website, even if you did it accidentally.
I'm not saying it's not bad, but I think as adults we should all be able to differentiate between an accidental insult and an intentional one.