Part of my code makes Copilot crash

282 points by Tree1993 4 years ago · 410 comments (376 loaded)

fny 4 years ago

So I've just tested it, and I can confirm, yes, copilot refuses to give suggestions related to gender. Now I know a lot of people are calling this absurd, but looking more closely, there are two PR nightmare scenarios.

1. Copilot makes a suggestion that implies gender is binary, a certain community explodes with anger and an entire news hype cycle starts about how Microsoft is enforcing views on gender with code.

2. Copilot makes a suggestion that implies gender is nonbinary, a certain community explodes with anger and an entire news hype cycle starts...

You can't win... so why not plead the Fifth?

To all those claiming this is an example of "wokeism", remember the proper response from an individual who believes in nonbinary gender would be to offer suggestions of the sort. There is no advocacy here. Mum's the word.

onionisafruit 4 years ago

Those aren’t the only options. You can just let it suggest what it is going to suggest. Copilot is a product for adults who should be able to comprehend what machine learning is. Anybody who throws a fit about it will only be exposing themselves as a fool.
- eru 4 years ago
  
  I might even share your idea about how adults _should_ behave. But that doesn't invalidate fny's musings based on how adults _do_ behave.
  - sbr464 4 years ago
    
    I would love to see an origin/Latin etc. breakdown of the word behave. One of my least favorite words (authority issues much? Yes).
    
    majou 4 years ago
    
    https://www.etymonline.com/word/behave#etymonline_v_8255
- alasdair_ 4 years ago
  
  The problem is that if you train an ML model with a bunch of data that happened to be available in the past, then the system will perpetuate the same biases as were inherent in the training data. This leads to the (real issue) Google image classifier categorizing an image of a black man as a "gorilla" etc.
  Certain words are heavily loaded and are worth just skipping to avoid all the hassle for now.
  - eru 4 years ago
    
    Btw, the gorilla incident was overblown. Overblown in the sense that people from other races (including whites) were also classified as some hilarious animals.
    Gorilla and black was just the most politically charged one of the bunch.
    (The other potentially politically charged one was some tendency to misclassify people of various levels of body fat as various animals.)
    > Certain words are heavily loaded and are worth just skipping to avoid all the hassle for now.
    If memory serves right, that was Google's pragmatic solution: if they detected a human in the picture, they 'manually' suppressed the animal classification.
    So they lost being able to classify 'Bob and his dog' in return for not accidentally classifying a picture of just Alice as a picture of a seal.
- lifthrasiir 4 years ago
  
  > Copilot is a product for adults [...]
  If you didn't mean "should be" (for which I'm not willing to take any position), no, Copilot is not a product for adults [1] [2].
  [1] https://docs.github.com/en/site-policy/github-terms/github-t... "A User must be at least 13 years of age."
  [2] https://docs.github.com/en/site-policy/github-terms/github-t...
  - jokethrowaway 4 years ago
    
    I'm sure the commenter didn't mean adults as in legal adults but as in someone who understands what machine learning is and won't throw a fit if the computer says something he disagrees with.
    A 13-year-old is perfectly capable of that; I know many 40-year-olds who aren't.
  - staticassertion 4 years ago
    
    A minimum age for accepting terms of use isn't the same thing as a target demographic.
    
    lifthrasiir 4 years ago
    
    No, but the minimum age requirement affects the handling of questionable contents, which can never be "doing nothing" as the GP suggested.
    
    staticassertion 4 years ago
    
    Fair point.
- smsm42 4 years ago
  
  That implies corporations are ruled by adults that aren't confusing twitter with the real world and aren't afraid to tell the screeching activists to leave them alone. Nothing we've seen in the latest decade suggests it is even close to being the case.
- jhanschoo 4 years ago
  
  Not every country where Microsoft is doing business has the same mores as the western world.
- benhurmarcel 4 years ago
  
  https://www.cbsnews.com/news/microsoft-shuts-down-ai-chatbot...
  - oars 4 years ago
    
    Didn't know about this. Thank you for making my day. (Disclosure: I used to work at Microsoft)
- classified 4 years ago
  
  Most humans are fools. And you'll get a lot of flak if they think you stepped on their toes.
kodah 4 years ago

Agreed. The answer is approved by Dave Cheney, he works at GitHub, and if you've ever attended one of his talks it's plain to see he's a very scrupulous person. I also don't think this is an example of Microsoft taking a side; rather I read it as them refusing to bat, which seems fine.
What I would've preferred one of these threads to be about is how all of this works. Like, how do they post-hoc filter certain things? Is that the only way to deal with things defined as issues in ML?
- duskwuff 4 years ago
  
  Making Copilot stop in its tracks when it sees the word "gender" and refuse to continue until the word is removed is still making a statement. Refusing to bat would be treating "gender" as a meaningless token, just as if you'd typed "traqre" instead.
  - mlyle 4 years ago
    
    No, refusing to generate stuff in an area where the output is likely to be controversial (in either direction) is refusing to bat. It'll wait for a pitch that it thinks it can hit, just like it refuses to play for many other categories-- you'll have a hard time getting Copilot to enumerate races, too.
    Ignoring the potential offensiveness and YOLOing through it is swinging the bat wildly at every pitch.
    
    duskwuff 4 years ago
    
    I think you might not fully grasp the scope of the issue here. Right now, if a file you're editing contains one of the restricted words, Copilot will refuse to make any suggestions at all in that file while that word is present -- even if the word isn't relevant to the part of the file you're editing. To keep to the baseball metaphor, Copilot is going on strike at the first whiff of controversy.
    What I'm suggesting is that Copilot should keep working when these words are present, but refuse to attach any significance to the specific word. This could probably be implemented by replacing the problematic words with randomly generated strings before processing the text, then swapping those strings back afterwards.
    (It could be reasonable for Copilot to refuse to make suggestions at all if the output would contain truly offensive language, like unambiguous racial slurs or sexual terms. But "gender" clearly isn't that.)
    
    mlyle 4 years ago
    
    > Right now, if a file you're editing contains one of the restricted words, Copilot will refuse to make any suggestions at all in that file while that word is present -- even if the word isn't relevant to the part of the file you're editing.
    Yah-- it's unfortunate but it's easy. It might be OK to tolerate it if it's clearly outside the range of tokens used in suggestions, but the filtering doesn't use tokenized stuff.
    > What I'm suggesting is that Copilot should keep working when these words are present, but refuse to attach any significance to the specific word. This could probably be implemented by replacing the problematic words with randomly generated strings before processing the text, then swapping those strings back afterwards.
    The problem is, the trained model is much smarter than the keyword-based filtering. If you just whiteout the watchwords, it still has a pretty good chance of gleaning context and making a commentary on gender that Microsoft would rather not deal with.
    > (It could be reasonable for Copilot to refuse to make suggestions at all if the output would contain truly offensive language, like unambiguous racial slurs or sexual terms. But "gender" clearly isn't that.)
    Right now the list is quite a large variety of things. Mostly racial slurs and sexual terms. But letting an AI ramble on after "blacks" is kind of dangerous, as are various gender-related terms that do have innocuous interpretations. It's easy to put words in the filter list and much harder to try and use nuance on these topics that even humans struggle with nuance around.
  - faeriechangling 4 years ago
    
    Yes but people haven't quite figured out WHY people should be offended at Microsoft for doing this, so it's quite convenient for them before people discover their reasons for being mad.
- zorpner 4 years ago
  
  > I also don't think this is an example of Microsoft taking a side; rather I read it as them refusing to bat, which seems fine.
  You can't be neutral on a moving train, as they say.
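
For anyone curious what the placeholder idea duskwuff floats above might look like in practice, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `WATCHWORDS` list, the `mask`/`unmask`/`complete` helpers, and the `model` callable are hypothetical and are not taken from Copilot. Matching is done on raw text rather than tokens.

```python
import re
import secrets

# Hypothetical watch list, for illustration only.
WATCHWORDS = ["gender"]


def mask(text, watchwords=WATCHWORDS):
    """Swap every watchword occurrence for a random placeholder.

    Returns the masked text and a placeholder -> original mapping.
    """
    pattern = re.compile("|".join(re.escape(w) for w in watchwords), re.IGNORECASE)
    placeholders = {}  # original spelling -> placeholder

    def substitute(match):
        word = match.group(0)
        if word not in placeholders:
            # Leading letter keeps the placeholder usable inside identifiers.
            placeholders[word] = "x" + secrets.token_hex(4)
        return placeholders[word]

    masked = pattern.sub(substitute, text)
    restore_map = {ph: word for word, ph in placeholders.items()}
    return masked, restore_map


def unmask(text, restore_map):
    """Put the original words back wherever a placeholder appears."""
    for placeholder, word in restore_map.items():
        text = text.replace(placeholder, word)
    return text


def complete(prompt, model):
    """Run a model (any callable str -> str) on masked text, then restore."""
    masked_prompt, restore_map = mask(prompt)
    suggestion = model(masked_prompt)
    return unmask(suggestion, restore_map)
```

As mlyle points out in the same subthread, this only hides the literal word. A model could still pick up on the surrounding context, so a sketch like this does not settle the disagreement either way.
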
captainmuon 4 years ago

I don't get the whole discussion. There are just many different models of gender. It's like particles vs waves. In one model, there are only two genders, in another five. There are those who say gender is culture and sex is real, and those who say sex is constructed, too. Some models describe reality better than others, some are useful, some are harmful. But nobody can or should stop you from thinking about reality with the model of your choice.
If I were Microsoft, I would post a shrugie and say copilot offers arbitrary responses based on the actual code it reads; it is not supposed to be "correct" or good or fair, but just follow what it sees other people do.
- xupybd 4 years ago
  
  >>Nobody can or should stop you from thinking about reality with the model of your choice.
  While I agree with you, that is very much the game that is being played here. We have competing world views and one way to help a world view dominate is to play a linguistic war. That was the point of Newspeak in 1984 (https://en.wikipedia.org/wiki/Newspeak). If you control the language such that competing ideas are instantly taboo just by the words required to describe them you can stop people from promulgating those ideas. So you gain ground without ever having to debate the new ideas.
  This has happened in many countries when one religion dominated. Western society was starting to get to the point where its taboos were being shed and ideas could win based on their merit. Sadly we're regressing back to a society controlled by dogma rather than an open exchange of ideas. I suspect this is the normal state of human societies, we fluctuate between open and closed societies.
  - WastingMyTime89 4 years ago
    
    > Western society
    The problem seems fairly limited to the USA from where I stand.
    
    xupybd 4 years ago
    
    I'm in NZ and it's very much here as well.
  - indigo945 4 years ago
    
    The whole point of (post-)structuralist philosophy, which informs the left-wing view on this, is that all language is already newspeak. (And since it follows that you can't choose not to play, you may as well play to win.)
    > Western society was starting to get to the point where it taboos were being shed and ideas could win based on their merit.
    Name a year that things were actually better. (In the 40's, before the civil rights movement? In the 80's, when queer people were still regularly oppressed and excluded from participation in society? Did ideas "win on their merit" when police beat up people in gay bars?)
    
    xupybd 4 years ago
    
    >The whole point of (post-)structuralist philosophy, which informs the left-wing view on this, is that all language is already newspeak. (And since it follows that you can't choose not to play, you may as well play to win.)
    Exactly that. This is the current conflict.
    Things were not better in the 40s, action movies were better in the 80s :p.
    Things were trending towards a more open society they had not become perfect by any stretch of the imagination. That trend, IMO, has reversed, due to the tactics involved. That is not to say some groups haven't benefited from this. There is a genuine drive to create a utopia here. However I fear the cure might be worse than the disease.
- pyrale 4 years ago
  
  > If I were Microsoft, I would post a shrugie and say copilot offers arbitrary responses based on the actual code it reads; it is not supposed to be "correct" or good or fair, but just follow what it sees other people do.
  The last time Microsoft did that, they ended up with their bot posting racist content on twitter. They of all people understand that just following what people do on the internet is a recipe for disaster.
- q-big 4 years ago
  
  > Some models describe reality better than others, some are useful, some are harmful.
  The idea of science is to get rid of models that are wrong.
- bergenty 4 years ago
  
  It’s really not some complicated multiverse of possibilities. It’s biological, very factual and the underlying genetic basis is as objective as something can get.
  - kmonsen 4 years ago
    
    There are actually corner cases here, although that’s not what usually comes up: https://en.wikipedia.org/wiki/Intersex
    Just a reminder that often reality is more complicated than we think. Names, numbers and upper/lower case are the usual examples.
    
    q-big 4 years ago
    
    > There are actually corner cases here, although that’s not what usually comes up: https://en.wikipedia.org/wiki/Intersex
    No biologist would claim that the sex is constructed.
philipswood 4 years ago

Choosing door 3 unfortunately leads to ...
A certain community explodes with anger since their machine learning dev-tooling is closed and has arbitrary restrictions.
If you try to please everybody, someone won't like it.
- woojoo666 4 years ago
  
  Unfortunately the people that care (like HN people) are less likely to spend time organizing protests and riling up an internet mob
  - xupybd 4 years ago
    
    I'd argue it's fortunate. While it does seem like a good idea to get out there and promote your side's view of things, I suspect the best option is to excel in your life and rise to influence.
    My hope is that the people here, at least the level headed ones, will rise to positions of influence and not the people rioting at every chance.
bryanrasmussen 4 years ago

I'm going to have to say it is ridiculous because there are all sorts of things that cause problems that the copilot generated code is going to have to keep out following this reasoning -
let's not handle ethnicity, if we're going to be sensitive about gender that is an area which is also sensitive for many people.
should it take border disputes etc. into consideration? If you're using it in country X, and country X thinks a particular area belongs to them despite most of the world disagreeing, will you not be able to use Copilot to generate code that supports your remote employer's international operations?
it would make better sense if Copilot had warnings it could issue and when you wanted gender put up some sort of warning about that - or allow you to choose binary gender / multi gender solutions.
The idea that it should fail, and that makes sense for it to do so is essentially a critique of the whole code generation idea.
on edit: obviously HN should be able to come up with lots of other things that might cause media related problems if CoPilot handled it, code to detect slurs, etc. etc.
nonethewiser 4 years ago

The nightmare scenario is caving to either mob. There is no good reason to moderate this.
coffeeblack 4 years ago

It’s just following the old advice not to talk about religion.
wseqyrku 4 years ago

This is similar to the stupid branch rename saga. It is certainly pointless, but not doing it could be disastrous.
hjkl0 4 years ago
> Copilot makes a suggestion that implies gender is binary
How would that work though? What can Copilot suggest that can imply that?
```
  If gender is true
    Do something…
  Else if gender is not true
    Do something else
  Else
    Do nothing
```
xupybd 4 years ago

There is a safe version of gender. Grammatical gender is, for now, binary and as far as I'm aware not offensive to most.
But I agree you can't avoid offending people. The world is nuts everything is offensive to someone.
- texaslonghorn5 4 years ago
  
  Grammatical gender is not as simple/uniform as you state https://en.m.wikipedia.org/wiki/List_of_languages_by_type_of...
  - xupybd 4 years ago
    
    Thank you, I stand corrected.
q-big 4 years ago

Solution: let the user choose their political stance on such a polarized topic in the Copilot settings so that the user gets suggestions that fit his stance.
poulpy123 4 years ago

The solution is conceptually simple (no idea of practicality): propose an answer related to the context.
And also: give the list of banned words
asojfdowgh 4 years ago

It's only a PR nightmare because it's a closed service and not an open tool.
TeeMassive 4 years ago

Pick 95% of your users, not a hard choice.
- aaomidi 4 years ago
  
  They have. 95% don’t give a shit tbh :)
gloosx 4 years ago

It's total nonsense. How can someone be angry at a soulless machine? Is it a real thing to feel anger towards an AI as if it were a real human? That's a serious mental problem then, because the anger is actually directed inward in this case.
- TheDong 4 years ago
  
  The anger is clearly not at the "soulless machine", but at the people and corporation that built, trained, and tend to it. The parent comment did not say "the community explodes with anger [at copilot]", they just said "with anger".
  You have made up a total strawman. It is like if someone said "If that person were stabbed with a knife, they would be angry", and you responded "Do people really get angry at emotionless knives? That's a mental problem, their anger is directed inward".
  - gloosx 4 years ago
    
    Yeah, you're right, thanks for unwinding it. Still, you have made up something too, because what I actually wanted to say was not "Do people really get angry at emotionless knives?" but "Do people really get angry at knife manufacturers and knives?", taking your example. I mean, you can only be angry with a person who used the knife incorrectly, but knife factories don't dull their knives like Microsoft did with Copilot.

moyix 4 years ago

Yep, I noticed this last year when they still stored the list client-side and had great fun reverse engineering it:

https://twitter.com/moyix/status/1433254293352730628

avian 4 years ago

They fixed Copilot returning a verbatim snippet of Quake source code by just blacklisting a word! How can they still pretend Copilot is not just copyright washing other people's code?
https://twitter.com/moyix/status/1433261377125326851
throwaway290 4 years ago

Interesting, so it might not be the specific token "gender" but rather blocked words ("man" or "woman") that appear in suggestions will suppress Copilot. And presumably another token like "communist" might do the same...
- tgsovlerkhgsel 4 years ago
  
  The list (https://moyix.net/~moyix/copilot_slurs_rot13.txt, rot13 encoded and linked from this tweet https://twitter.com/moyix/status/1433479083376140296) indeed contains the word "gender".
- moyix 4 years ago
  
  It suppresses output for bad words in both the prompt (your code) and the suggestions.
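
The behaviour described in this subthread (a rot13-obfuscated word list, checked against both the prompt and the suggestion) can be sketched roughly as follows. This is an illustration under stated assumptions, not Copilot's actual code: the one-entry `ENCODED_BLOCKLIST`, the case-insensitive substring matching, and the `is_blocked`/`filtered_suggestion` helpers are all hypothetical, and `model` stands in for any callable that maps a prompt string to a suggestion string.

```python
import codecs

# Hypothetical one-entry list stored rot13-encoded; "traqre" decodes to "gender".
ENCODED_BLOCKLIST = ["traqre"]
BLOCKLIST = [codecs.decode(word, "rot13") for word in ENCODED_BLOCKLIST]


def is_blocked(text, blocklist=BLOCKLIST):
    """Case-insensitive substring check (an assumption about the matching rule)."""
    lowered = text.lower()
    return any(word in lowered for word in blocklist)


def filtered_suggestion(prompt, model):
    """Return the model's suggestion, or None if the filter trips.

    The check runs twice: once on the prompt before calling the model,
    and once on whatever the model produced.
    """
    if is_blocked(prompt):
        return None
    suggestion = model(prompt)
    if is_blocked(suggestion):
        return None
    return suggestion
```

A filter of this shape is cheap to maintain, which fits mlyle's remark elsewhere in the thread that it is easy to add words to a list and hard to apply nuance. The cost is the one duskwuff describes: a single matching word anywhere in the file silences every suggestion.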

LAC-Tech 4 years ago

Aren't we missing the forest for the trees here?

We're zeroing in on how silly it is for copilot to trigger its content filter on the word "gender".

To me the real issue is that copilot has a content filter in the first place. It's unwelcome and unnecessary.

camdenlock 4 years ago

There’s a zealous push by a small but extremely vocal fringe to impose their very particular worldview onto emerging AI/ML models like this.
They refer to it as “eliminating bias”, but it’s really just an attempt to mold these new technologies into conformance with one very specific set of ideological commitments.
Proponents view it as some kind of obvious universal good, and are confused when anyone else is appalled by the blind foolishness of it all.
- mlyle 4 years ago
  
  > They refer to it as “eliminating bias”,
  I don't think, e.g. being able to handle black faces correctly is some sort of massive ideological commitment. So let's not pretend that the entire concern of bias in AI is irrelevant, no matter where you stand on gender.
  > conformance with one very specific set of ideological commitments
  You know-- let's just talk about basic respect and dignity: if someone strongly wants to be referred to in a particular way, the polite response is to respect their wishes. If there's a lot of people in this category, it makes sense for your system to address it.
  If you instead build your system in a way that you don't achieve this, you're being rude. If you use old training data and refer to people as a "Mongoloid" as a result-- don't be surprised that people are offended. Ditto, if you use old training data about gender that doesn't match many peoples' current expectations.
  - nonethewiser 4 years ago
    
    > I don't think, e.g. being able to handle black faces correctly is some sort of massive ideological commitment.
    Why did you suggest THIS as an example of what he's talking about? He doesn't indicate that he disagrees with this case.
    Furthermore, that sounds like a problem of having incomplete training data. Regardless, manually tweaking a model points to a failure in the process somewhere.
    
    mlyle 4 years ago
    
    > Why did you suggest THIS as an example of what hes talking about?
    He seems to be pooh-poohing the entire idea of "eliminating bias" in AI. So I felt it was important to
    * point out that there are clear cases of bias in AI no matter where you stand on gender
    * move on to explain a closely related case (using historical speech about race could be offensive)
    * use the lesson to show that using historical speech about gender could be problematic as well
    > Furthermore,that sounds like a problem of having incomplete training data.
    Training a model from historical data can only reflect historical approaches. The social conventions around gender are changing rapidly and are contentious.
    > Regardless, manually tweaking a model points to a failure in the process somewhere.
    Here, there's no manual tweaking of the model: merely a refusal to return results in an area where the model has proven problematic.
    
    nonethewiser 4 years ago
    
    I don't think your example is indicative of what he opposed.
    If you can't effectively train something from existing data then cherry picking results according to different values isn't going to fix it. Your example has quietly shifted from facial recognition of different races to speech about different races. I can't even be sure of what you're talking about other than the fact that you will oppose criticism of imparting political bias into models.
    
    mlyle 4 years ago
    
    > then cherry picking results according to different values isnt going to fix it.
    Again: "merely a refusal to return results in an area where the model has proven problematic."
    > Your example has quietly shifted from facial recognition of different races to speech about different races.
    Again, three points:
    * First, no matter how you feel about gender: bias in AI is a problem, as evidenced by issues with recognizing black faces.
    * Second, there's some obvious cases where we can all agree that using past training data could result in things that are currently offensive. There are pieces of language we pretty much all agree we should use differently now to avoid offense (e.g. mongoloid).
    * Third, I believe that gender is one of these cases. Social mores are evolving. Using conventions from the past when our collective norms are changing on the span of months basically guarantees offense.
    
    magicalist 4 years ago
    
    > If you can't effectively train something from existing data then cherry picking results according to different values isnt going to fix it.
    Given the variance in the utility of copilot's suggestions, this doesn't seem true on its face. Define "effectively" here and I think cherry picking would definitely fall within its range.
  - agileAlligator 4 years ago
    
    > if someone strongly wants to be referred to in a particular way, the polite response is to respect their wishes
    Would you also respect the wishes of a schizophrenic person, if they say much the same thing? If they say that they are actually an alien from outer space, would you play along?
    
    mlyle 4 years ago
    
    > Would you also respect the wishes of a schizophrenic person, if they say much the same thing?
    In general, I would respect someone's wishes. If they want to be Mork from outer space, K.
    Of course, there are some very limited cases where we may reasonably believe that playing along is harmful either to ourselves or to the other person. If there's a broad medical consensus that something is harmful to someone, then maybe we shouldn't do it.
    A biologically female person who wants to be called "they," because they have decided they don't like the connotations attached to "she" right now, doesn't rise anywhere close to that in my opinion.
    
    magicalist 4 years ago
    
    For 95+% of people I interact with in a given day, it's none of my business and not at all my job to police whether a person is asking me to call them by their "real" name or whatever it is your questions are trying to get at.
    
    aaomidi 4 years ago
    
    So how and why do you jump to this extreme?
  - davoneus 4 years ago
    
    I call complete and utter BS. This is in no way different than disabling a word processor's autocorrect when a writer uses the term gender in their novel.
    The programmer should be able to use whatever the hell terms they want to use in their program. If the customer base doesn't like it that's their right. But it's not the right of the damn language parser programmer.
    
    mlyle 4 years ago
    
    > But it's not the right of the damn language parser programmer.
    This isn't a language parser.
    This is a tool that suggests implementations of small portions of code.
    If the training data is out of date, it's quite reasonable for people employing that model to decide it shouldn't return results based on the out-of-date training data.
    Even about completely different things. If the output is C code containing gets(), maybe we should decline to return the result.
    > The programmer should be able to use whatever the hell terms they want to use in their program.
    Indeed, it leaves it completely up to the programmer by refusing to suggest an implementation that would favor either side of the debate.
    
    GameOfFrowns 4 years ago
    
    >If the training data is out of date, it's quite reasonable for people employing that model to decide it shouldn't return results based on the out-of-date training data.
    It's not "out-of-date". That's just the kind of pilpul semantic framing that these activists engage in since "out-of-date" implies "bad". The data is just not in line with their artificially made-up ideology. A demand which one, even as the best "ally" in the world, could never satisfy anyway, since the grievance grifting relies on always coming up with new issues, you just have to look at the shift from "equality" to "equity" or from "microaggressions" to "nanoaggressions"
    
    mlyle 4 years ago
    
    All social conventions are artificially made-up ideology.
    If you insist on calling a black person a "negro", despite its change in connotation over time, you are not being very nice.
    If you train an AI, or a person, using old books to call someone a "negro", you're condoning and continuing offensive behavior.
    Ditto, here.
    > since the grievance grifting relies on always coming up with new issues,
    We pretty clearly, culturally, have a whole lot of issues. Becoming more nuanced in how we label them makes sense. And, of course, language changes rapidly.
    It especially changes rapidly when we're talking about marginalized groups. Pejoration is a process by which a word associated with marginalized groups becomes offensive over time. "Idiot", "moron", "retard" were all originally clinical and relatively non-offensive words, but society as a whole ended up changing them to include a value judgment. The euphemism treadmill is annoying, but insisting on continuing to call someone something that has developed a negative value judgment is not really good, either.
    
    magicalist 4 years ago
    
    > autocorrect
    > The programmer should be able to use whatever the hell terms they want
    > language parser
    Are you unsure what copilot is?
    
    aaomidi 4 years ago
    
    I would suggest before freaking out publicly to go read about the tool and what it does.
  - captainmuon 4 years ago
    
    Detecting black faces correctly is one thing; obviously if a system can't do that it's an issue and it shows that the people making the system were biased.
    But something like Copilot or DALL-E? If you ask DALL-E for a doctor and it rarely shows black people (or women), then it is neither racist nor broken. Our society is broken. There are not enough people in that job that are not white and male. Or they are not represented enough. I think there is value in AI that honestly reflects society, because it makes this discrepancy harder to ignore.
    People imagined AI would be this benevolent, neutral, wise thing that would maybe be a bit naive but not have our human biases. But it turns out there is no "morally neutral". It will just reflect what you put into it.
    
    mlyle 4 years ago
    
    > There are not enough people in that job that are not white and male.
    Have you looked at the actual demographics of medical doctors in the US? 54% are women, and 35% are nonwhite. But when we have media depictions of doctors, I agree they tend to be white and male.
    So, what should DALL-E conform to? Should it conform to A) our actual present society, B) the biased original dataset (which leans both towards the past and towards existing media biases), or C) some idealized version of society?
    I got 12 white dudes, one Southeast Asian woman, one Southeast Asian looking man, and two men that I'm not sure of their race when I tried this just now (quite possibly white). This is despite OpenAI's efforts to debias it, and isn't representative of current physician demographics.
    But if AI just represents and reinforces extant biases-- and worse, AI is used to produce art and text that ends up in other AIs training sets -- how do we ever get out of this mess? The people who produce, publish, and productize AI do have some degree of editorial responsibility.
    > But it turns out there is no "morally neutral".
    Of course not. Hume pointed out long ago that you can't transform positive statements into normative ones.
    But all of this is a little offtopic, anyways. This is about when it's reasonable to refuse to return a result. "Hey, your answer had the N-word in it, and we know most of the time our model does that it's offensive-- so we're just not going to return a result, sorry." I think this is a reasonable path to take when you know that your model has some behaviors that are socially questionable.
    
    GameOfFrowns 4 years ago
    
    >54% are women, and 35% are nonwhite. But when we have media depictions of doctors, I agree they tend to be white and male
    What's the issue? I used to watch a lot of medical dramas on TV and in my opinion the black rockstar MDs are way overrepresented in comparison to their real-life numbers:
    '5.0%' in 2018 in the US[1] in real-life vs. '19.4%'[2] on TV
    [1]https://www.aamc.org/data-reports/workforce/interactive-data... [2]https://www.bluetoad.com/publication/?i=671309&article_id=37...
    
    mlyle 4 years ago
    
    > What's the issue? I used to watch a lot of medical dramas on TV and in my opinion the black rockstar MDs are way overrepresented in comparison to their real-life numbers:
    Well, this clearly isn't the case in the DALL-E training dataset, because "medical doctor" overwhelmingly yields white dudes-- even after OpenAI's effort at removing bias.
    
    eproxus 4 years ago
    
    But it’s also about how you get there. If you only expose kids to pictures of white male doctors you’re going to give them a bias which will shape their lives and by extension the society around them.
    I think techno libertarian suggestions like these are dangerous because they assume there’s one “canonical” place to fix these issues and all other places can just reflect the status quo, without affecting it (which in my opinion is not possible).
    It’s like the old saying “dress for the job you want, not the job you have”.
    
    mlyle 4 years ago
    
    Devil's advocate: Making depictions more diverse than society helps conceal social problems and encourages people to deny them.
    Social problems are messy and full of situations like this where people can reasonably disagree and have decent, good-faith rationales for both sides, and we lack the kind of evidence that allows us to have strong confidence in our guesses about what would help.
  - q-big 4 years ago
    
    > I don't think, e.g. being able to handle black faces correctly is some sort of massive ideological commitment.
    But people tend to insult everybody who does not care so much about such a topic as racist.
- nonethewiser 4 years ago
  
  > They refer to it as “eliminating bias”, but it’s really just an attempt to mold these new technologies into conformance with one very specific set of ideological commitments.
  It is quite literally creating bias.
  - tablespoon 4 years ago
    
    >> They refer to it as “eliminating bias”, but it’s really just an attempt to mold these new technologies into conformance with one very specific set of ideological commitments.
    > It is quite literally creating bias.
    None of the people who do it care. One of the deceptive tactics that's pretty common in contemporary political discourse is to corrupt definitions in order to enforce controversial ideology using anodyne language.
    >> Proponents view it as some kind of obvious universal good, and are confused when anyone else is appalled by the blind foolishness of it all.
    IMHO, that "confusion" is an act.
  - woojoo666 4 years ago
    
    I believe the idea is that society is prejudiced/biased, so training an AI using data from that society would perpetuate that bias. So there needs to be some manual correction.
    
    q-big 4 years ago
    
    > I believe the idea is that society is prejudiced/biased, so training an AI using data from that society would perpetuate that bias. So there needs to be some manual correction.
    These activists are free to write their own code that conforms to their ideology.
    
    npteljes 4 years ago
    
    So what if Microsoft is such an activist? Judging from their tool, which conforms to this ideology, they are.
    
    q-big 4 years ago
    
    > So what if Microsoft is such an activist?
    Then Microsoft should very clearly state this so that customers who don't like this ideology know that they are not desired, and can get away from Microsoft products as far as possible for them.
    
    npteljes 4 years ago
    
    Here you go!
    https://www.microsoft.com/design/inclusive/
    Also check out their AI design related doc, which specifically mentions how they try to avoid association bias, for example, those related to gender:
    https://www.microsoft.com/design/assets/inclusive/InclusiveD...
  - aaomidi 4 years ago
    
    How is not suggesting stuff related to gender creating bias?
- Traubenfuchs 4 years ago
  
  What a horrible time for ML/AI to explode. This is completely unlike the time of usenet and internet wild west. It‘s a whole new world of technology, castrated from the start.
- scarface74 4 years ago
  
  So, I have a security camera that triggers when it sees a person or an animal and notifies you of what it thinks it sees. It always detects my tall big Black son as an “animal”. It doesn’t do that for anyone else who comes up to the door.
npteljes 4 years ago

I don't think it's silly. Whatever Copilot says, is said by Microsoft too, by extension. And so, it makes sense for Microsoft to not make themselves liable for whatever people make their product spit out. Especially after happenings like this:
"Microsoft's AI Twitter bot goes dark after racist, sexist tweets"
https://www.reuters.com/article/us-microsoft-twitter-bot-idU...
- q-big 4 years ago
  
  > Whatever Copilot says, is said by Microsoft too, by extension.
  Whatever text I write in Word, is written by Microsoft too, by extension?
  *not*
  - npteljes 4 years ago
    
    Of course not. But if Word underlined a word of yours and offered an offensive correction, that'd be similar.
    
    LAC-Tech 4 years ago
    
    What if you misspelled an offensive word?
    Maybe clippy could popup and say "it looks like you are writing hate speech" and offer some suggestions.
    
    npteljes 4 years ago
    
    It could, but Word hasn't wanted you to write offensively for quite some time now. I remember being a teenager and just poking at Word 97, and it was promptly telling me that "you shouldn't write like that" or something similar.
xupybd 4 years ago

I think they have no choice. If you don't do that the vandals will destroy any AI that learns from the public.
For example https://www.cbsnews.com/news/microsoft-shuts-down-ai-chatbot...
jeroenhd 4 years ago

I find this filter to be a fine concept. It can prevent automated vulgarity generation if used correctly. However, that filter should be manageable by the user, not hashed and encoded in some weird scheme. Just put down a file called "bad words.txt" and let the user pick their preferred amount of AI suppression.
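A minimal sketch of the user-editable word list suggested above, assuming a plain-text file with one term per line. The file name `bad words.txt`, the comment syntax, and both function names are hypothetical; nothing here reflects how Copilot actually stores or applies its list.

```python
import re
from pathlib import Path


def load_blocklist(path):
    """Read one term per line; blank lines and lines starting with '#' are skipped."""
    terms = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip().lower()
        if line and not line.startswith("#"):
            terms.add(line)
    return terms


def is_blocked(text, blocklist):
    """Return True if any whole alphabetic word in `text` is on the blocklist."""
    words = re.findall(r"[a-z]+", text.lower())
    return any(word in blocklist for word in words)
```

With a list like this, a user who wants less suppression deletes lines from the file, and a user who wants none passes an empty set.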
wseqyrku 4 years ago

You can know a town by the thickness of the fence around the backyard.
If you have to deal with those kinds of people, you're willing to sound silly just to protect yourself.
nonethewiser 4 years ago

Bingo

eric4smith 4 years ago

Besides the absurdity of the code crashing because of the word "gender", my problem and curiosity with all of this is...

"What was going on in the head of the person writing the parser?"

I mean, were they thinking that if someone is writing code, let's say, for a gender dropdown and it was only ["male", "female"], it would try to suggest to us to add 26 more genders instead (and worse, suggest a list of genders to add)?

Would the intention be to correct us and popup a message saying "We suggest you add more genders so as not to displease the users of your product"??

What was going on in that person's head who is trying to do all of this? What was their thought process? What were they trying to accomplish around gender?

Was it the programmer, or some product manager that insisted on some kind of "copilot adjustment" for this because of a personal political viewpoint or just for GitHub being more woke?

That's the most troubling aspect to this.

I hope to Jesus Christ it was just a mistake.

ronsor 4 years ago

Regardless of what Copilot suggested for "gender", it would've offended someone, and I think that's what Microsoft wants to avoid. Not even woke so much as it is trying to avoid potential controversies.
- jacobsenscott 4 years ago
  
  It could just not suggest anything, but continue to suggest as usual for other parts of the file. I think that would offend the least number of people.
  The issue isn't that it isn't producing a suggestion, but that it stops producing a suggestion altogether for the rest of the file.
  I don't use copilot anymore because it just results in poor quality code and additional cognitive overhead (because you need to read and discard the shitty suggestions) as you type. It both slows you down and exhausts you. So you can really think of this as a feature. You'll write much better code as soon as copilot shuts down. It should do this more often.
- eric4smith 4 years ago
  
  You would think they would just avoid adjusting this, right?
  - mlyle 4 years ago
    
    If they don't "adjust" this, each user gets a random-ish result from all the code out there-- depending on mostly-unrelated context, it suggests a lot of genders or just 2.
    In turn, you can guarantee both groups of people end up upset.
    
    onionisafruit 4 years ago
    
    What’s there to be upset by? Both groups know the other exists, and both groups know that the other group has written code that copilot trained on.
    
    mlyle 4 years ago
    
    Both groups will angrily complain that Copilot is suggesting the wrong thing.
    Right now both groups are trying to silence the other-- with school libraries, etc, in the crossfire being angrily denounced.
    
    wseqyrku 4 years ago
    
    > Both groups will angrily complain that Copilot is suggesting the wrong thing.
    This is not normal, right? I mean, outside US.
fugalfervor 4 years ago

> I mean, were they thinking that if someone is writing code, let's say, for a gender dropdown and it was only ["male", "female"], it would try to suggest to us to add 26 more genders instead (and worse, suggest a list of genders to add)?
> Would the intention be to correct us and popup a message saying "We suggest you add more genders so as not to displease the users of your product"??
You can just as easily assume that they don't want a dropdown with 26 additional genders to just pop up automatically. That would upset a lot of people, many of whom are in this thread. I think whoever wrote the code doesn't want to jump into a political shitstorm.
winReInstall 4 years ago

The ____ church did interfere in all matters of life, big and small, none too trivial to not be guided by an enormous ritual rule book, always threatening disciplinary action by the believing masses and social ostracism.
Hurting the feelings of the true believers was the ultimate sin, a sin often committed, but only punished if the sinner did not recant and change his ways, in a brutally public and official way. It was there that the ____ church revealed what it was really all about all along. Societal control, maybe with good intentions to start with, but in the end, just control for its own sake and to prevent others from achieving the same control.
Not saying, that any social movement could turn into a religion. That would need strange clothing, processions, rituals, codified language and most of all a mythology.
I have no religious preference; I'm on the side of science and would like to have a civil society where no member is violated by another. I would very much prefer it if the combatant religions involved could leave science alone. Reality is often disappointing.
May the religion with the least suffering caused win and then keep away from the state & power.
- xupybd 4 years ago
  
  I'm heavily religious and I agree with you entirely. I see many parallels between life inside the strict religious community I live in and what is happening at large in society.
  I think the goal of any sufficiently large society should be that any religion or ideology can rise. Many people can become a part of that, yet the religion or ideology is unable to persecute those who don't agree with it.
  I also have no idea how you achieve that. It's my utopia and like most people's vision of a utopia is probably not possible in reality.
c3534l 4 years ago

Perhaps it was not "do I think this is reasonable," but "is acting in good faith enough to keep me out of trouble."
Thorentis 4 years ago

Maybe the one saving grace in all this, is that the AI singularity will never happen thanks to wokeness.
nonethewiser 4 years ago

Chances are it was the opposite

jcuenod 4 years ago

I encountered this some time ago because I was working with grammatical gender. Unlike many of these comments, though, I do not take exception to it. Bias in ML is well established, and it's okay if, when we don't have solutions, we just disable it.

If your autocomplete was capable of spitting out suggestions that made you feel isolated or kept poking you in the eye about aspects of your identity, you might feel a bit better about the creators having thought about that and taken steps to avoid it happening.

Banana699 4 years ago

"Reducing Bias" is a really strange way to put it, considering that bias usually means deliberately ignoring or contradicting aspects of reality/data (the classic example in ML textbooks is fitting a straight line to non-linear data), which is what Copilot is quite literally doing here.
Gender is, in actual material fact, binary, and extremely strongly correlated with sex. Building a crimestop into an ML model is just teaching the machine human biases and delusions.

nomilk 4 years ago

> Copilot crash because the word “gender”

A metaphor for our times.

tom_ 4 years ago

I worked on a video game in the late 2000s, and one of the bits of code I did was the code for filling the seats in the stadium with people. One of the artists cobbled together like 5 low poly man models and 5 low poly woman models, and you could just about tell the difference, and I put some code in there to ensure the genders were evenly distributed. (The 2 genders, I mean. Man, and woman.)
Looking back, I don't even know why I made it an enum, rather than a 1-bit bitfield called is_woman - but in the end I was glad I didn't, because the art director moaned a bit about the clothing colour distribution, and somebody asked if we could have some mascots, and there were some complaints about the unreasonable number of interesting hats. And, so, long story short, by the time we were done, we had 18 genders based on clothing colour and type of hat, 2 genders for mascot (naturally: tall, and squat), and a table to control the relative distributions.
Once we got to 5 genders I tried to change the enum name to Type - but we had this data-driven reflection system that integrated with various parts of the art pipeline, and once your enum had a name, that was pretty much that. You were stuck with it.
Is that a metaphor for our times too? I don't know. My own view is that sometimes stuff just happens, and you can't read too much into it.
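As an illustration of the enum-plus-distribution-table design described above, here is a small sketch. The member names, the weights, and `fill_seats` are all invented for the example and are not taken from the actual game.

```python
import random
from enum import Enum


class Gender(Enum):
    """Crowd member type. The name is stuck as 'Gender' because, as in the
    story, renaming an enum the art pipeline already knows is not an option."""
    MAN = 0
    WOMAN = 1
    MASCOT_TALL = 2
    MASCOT_SQUAT = 3


# Relative distribution table (hypothetical numbers): higher weight = more seats.
WEIGHTS = {
    Gender.MAN: 48,
    Gender.WOMAN: 48,
    Gender.MASCOT_TALL: 2,
    Gender.MASCOT_SQUAT: 2,
}


def fill_seats(n_seats, weights=WEIGHTS, seed=None):
    """Pick a crowd member type for each seat according to the weight table."""
    rng = random.Random(seed)
    kinds = list(weights)
    return rng.choices(kinds, weights=[weights[k] for k in kinds], k=n_seats)
```

A 1-bit `is_woman` field would have had no room for the mascots or the hats, which is why the enum turned out to be the lucky choice.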
- MengerSponge 4 years ago
  
  Only 18? Child's play. https://www.discovermagazine.com/planet-earth/why-this-fungu...
  Interestingly, I don't know of any zoological cases that would require more than a short int to enumerate.
- erik_seaberg 4 years ago
  
  Somehow I’m reminded of the Fallout 3 NPC walking underground wearing a train-shaped hat.
- onionisafruit 4 years ago
  
  I would love to think msft blocks gender because your code somehow made it into the training data and somebody was confused seeing “squat” as a gender.
magicalist 4 years ago

>> Copilot crash because the word “gender”
> A metaphor for our times.
Social media amplifies an innocuous, extremely low stakes occurrence into a heated discussion because it happened to misstate the facts (nothing is crashing here) and focus on a hot button keyword ("gender" is only one of many blocked words)?

joe_the_user 4 years ago

So large language models are great but occasionally have undesirable results. Hand-coded scripts are added to remove the undesirable outcomes but still produce other problems: crashes, but less often.

More and more things are going to be filtered through large language model apps and the possibilities for cascading failures will be even more interesting than what exists presently.

muglug 4 years ago

The large language models already know too much.
I was able to get GPT-3 to spit out reasonably accurate biographies for a couple of composers I know.
GPT-3 could go even further — one of my composer friends has a reasonably rare first name, and when given the prompt "There once was a man named $first_name", GPT-3 responded with a number of limericks tailored to his particular set of skills.
- filoeleven 4 years ago
  There once was a man named $first_name,
  Who never accepted the blame.
  He went on a bender,
  And talked about gender
  [INFO] [default] [2022-07-10T07:59:07.641Z] [fetchCompletions] engine https://copilot-proxy.githubusercontent.com/v1/engines/copilot-codex
nonethewiser 4 years ago

That simply restates what people are taking issue with.

jan_Inkepa 4 years ago

I encountered this when writing some scripts for Latin-language text processing (which dealt with grammatical gender). Thankfully the Latin-native term 'genus' passed the Copilot smell-test and I could continue with my work. I found it pretty amusing.

duskwuff 4 years ago

As a result of another word on the Naughty List, you may run into similar issues while writing multithreaded code.
(The word in question is "race" -- as seen in the phrase "race condition".)
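For context, this is roughly the kind of everyday code where that word shows up in comments and identifiers. It is a generic sketch of guarding a shared counter with a lock; the function name and parameters are made up for the example.

```python
import threading


def count_with_lock(n_threads=4, n_increments=10_000):
    """Increment a shared counter from several threads without a race condition."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(n_increments):
            # Without the lock, `counter += 1` is a read-modify-write race:
            # two threads can read the same value and one update is lost.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```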
jcuenod 4 years ago

Yup, for me it was Greek and Hebrew.

TheSpiciestDev 4 years ago

What was that bot that MSFT stood up on Twitter that trolls and memers fed to turn alt-right? I know they eventually took it down and that it stirred up a lot of controversy.

I would not be surprised if someone found some Copilot output stemming from "gender" and reported to MSFT/GitHub for them to simply short circuit or "break" after finding certain keywords.

Thorentis 4 years ago

Yeah they probably found something like: assert gender in ["male", "female"]. If this is enough to trigger a backlash then maybe we deserve whatever fate has in store for us.
- nonethewiser 4 years ago
  
  But "we" and "the backlashers" are not one group.
npteljes 4 years ago

This was it:
https://en.wikipedia.org/wiki/Tay_(bot)
nevster 4 years ago

Tay AI

hda2 4 years ago

Yesterday's timely announcement about an open source competitor to copilot that doesn't suffer from this absurdity: https://news.ycombinator.com/item?id=32327711

staticassertion 4 years ago

Content filters on ML feel so silly. I assume the goal is to avoid bad press? Because the... "attack" would be someone generating offensive material, which they could just write themselves, not to mention I have serious doubts that any filter is going to be a serious barrier.

For images/ video I can see merit, ex: using that nudity inference project on images of children, but text seems particularly pointless.

hn_throwaway_99 4 years ago

The point is because sometimes even a perfectly reasonable inference from an ML model would be considered a big mistake due to societal considerations that are unknown to the model.
For example, a couple years ago, there was a big hubbub over a Google Image labeler that labeled a black man and woman as "gorillas". A mistake for sure, but the headlines about the algorithm being "racist" were wrong. The algorithm was certainly incorrect, and it could probably have been argued that one reason it was wrong is that its training set contained fewer black people than white people, but the algorithm was certainly unaware of the historical context around this being a racist description.
Similarly, in the early days of Google driving directions I remember one commenter saying something along the lines of "You can tell that no black engineers work at Google" because it pronounced "Malcolm X Boulevard" as "Malcolm 10 Boulevard". Of course, the vast majority of the time you see a lone "X" in a street address it is pronounced "ten".
It's kind of analogous to the "uncanny valley" problem in graphics. When the algorithm gets things mostly right, people think of it as "human-like", and so when it makes a mistake, people attribute human logic to it (it's quite safe to assume that a human labeling a picture of black people as gorillas is racist), as opposed to the plain statistical inferences ML models make.
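The "Malcolm X Boulevard" case can be sketched as a naive Roman-numeral rule plus a hand-maintained exception list. This is only an illustration of that kind of heuristic; it is not how Google's text-to-speech works, and `KNOWN_NAMES` and the function names are hypothetical.

```python
import re

ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}
NUMERAL = re.compile(r"\b[IVXLCDM]+\b")

# Context the statistical rule cannot know about; it has to be added by hand.
KNOWN_NAMES = ("Malcolm X",)


def roman_to_int(numeral):
    """Convert a Roman numeral such as 'XII' to an integer (no validation)."""
    total = 0
    for ch, nxt in zip(numeral, numeral[1:] + " "):
        value = ROMAN[ch]
        # A smaller value before a larger one is subtracted (IV = 4).
        total += -value if ROMAN.get(nxt, 0) > value else value
    return total


def naive_expand(street):
    """Treat every standalone run of Roman-numeral letters as a number."""
    return NUMERAL.sub(lambda m: str(roman_to_int(m.group())), street)


def expand(street):
    """Same rule, but leave the numeral alone when it completes a known name."""
    def repl(match):
        if any(street[:match.end()].endswith(name) for name in KNOWN_NAMES):
            return match.group()
        return str(roman_to_int(match.group()))
    return NUMERAL.sub(repl, street)
```

The naive rule is right for "Henry VIII Road" and wrong for "Malcolm X Boulevard"; nothing in the rule itself distinguishes the two, which is the point being made above.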
- space_fountain 4 years ago
  
  I think I agree with this to a certain extent. Sometimes AI gets attacked in unfair ways, but also, while AI is merely making inferences based on its training data, the fact that its training data is racist matters. It matters because it has real impacts, even if small. It's just like the decision by film manufacturers to optimize for accurate colors for white skin: those were the people who probably bought most of their film, the people whom business considerations probably meant they should optimize for.
- eyelidlessness 4 years ago
  
  The actual racist thing is humans who don’t consider, prepare for, or include affected people when deciding to deploy models trained to produce racist outcomes. It doesn’t matter that the machine has no opinions, it matters that the machine produces outcomes reflecting harmful biases. Banning the word doesn’t change that, but neither does treating the biased process as unbiased.
  - monkeywork 4 years ago
    
    The model's output isn't racist. Racism has intent, the algo doesn't.
    It can be wrong or right but it is not making a judgment based on anything outside of math.
    You are correct to say the training wasn't complete but that doesn't mean anyone did anything wrong, racist, or hateful... 99% of the time it's simply a mistake.
    When you label things like that as racist instead of simply mistakes you water that word down to the point where it becomes meaningless.
    The problem in the last 10+ yrs of outrage internet social justice is that in order to gain attention and get traction those involved have lumped so many things into terms like racism that it eventually becomes so stretched it's meaningless.
    
    eyelidlessness 4 years ago
    
    > Racism has intent
    This is a failure to understand centuries of history. It’s an understandable one, it’s one I used to relate more to and I probably still relate to it far more than I should.
    The notion of racism requiring malice is so far from reality that similar defenses were dismissed almost a century ago in international tribunals which still shape the world.
    It takes no malice to participate in racism. It only takes accepting it as given. This doesn’t have anything to do with anything that’s originated from the internet, from any perspective. It doesn’t make racism meaningless. Treating it that way does though.
    “The” problem is that racism, as a societal background factor, is treated as the sea in which we swim, it’s “neutral” without an actor present to promote it. If it just “is”, no one is “at fault” and… the kicker, if your definition requires intent and there isn’t any intent for the specifics under question… it’s not just a mistake, it has defenses like these to shield and bolster it.
    You can rail against “social justice” all you like, and I’m betting my response will show your railing resonates more here than it should. But your position is ahistorical and probably based in defensiveness about something you don’t need to defend.
    
    monkeywork 4 years ago
    
    I didn't say racism requires malice - I said it required intent, most of the time that intent is malice but not always. The current pop culture version of racism isn't often racism, it's prejudice or stereotyping or most often simply ignorance.
    Two people can make the exact same remark and one can be racist and one can be based on innocent ignorance/curiosity. A young white child spending time with a black person for the first time and saying "your hair is weird" is vastly different than if that same person said it while in high school and was bullying the black kid in class. The former isn't racist and the latter is.
    I don't rail against social justice, progress is good and I think everyone of every creed / sexuality / gender / etc should be free to express themselves and live their best lives without being judged for who they were born or identify as.
    What I do rail against, though, is manipulating language to bully and harass people, because always finding demons to expose has become the norm for gaining social credit / status / clout. I personally believe that people who do this (often the "social justice warriors" so to speak) are the root of most of the radicalization of BOTH sides of the political spectrum in the western world right now.
    
    space_fountain 4 years ago
    
    Idk, I struggle with this. I agree that watering down words is a problem. For example see people saying speech can be violence or even inaction can be violence or something like that, but I think humans are tempted to ascribe the past to evil. If you think racism has to be intentional then it's an easy jump to say that racists must be aware of their racism, and before you know it you believe that evil looks like Voldemort and not some guy administering a study about syphilis. I think the truth is people in the past were much more explicitly racist, but also used a lot of the same excuses you'd see today. Things like the economics dictating that film should be optimized for light skin, or worrying about property values or something. By and large people don't think of themselves as racist so they don't do things with racist intent, they just happen to be racist and that influences the things they do. Plus I'm not convinced that anything has free will, making the whole question of intent less useful anyway.
    But I also definitely do think there is something worse about someone who hates black people and uses a racial slur to describe them compared to a model trained on humanity doing the same, but certainly both are huge problems, and it can't slip my mind that the racist person was also just trained on humanity's racism.
ace2358 4 years ago

I guess they’re trying to avoid the Twitter AI bot incident.
https://www.theverge.com/2016/3/24/11297050/tay-microsoft-ch...
- RajT88 4 years ago
  
  It is for certain that.
  File under not "Why we can't have nice things", but "downstream effects of why we can't have nice things".
evrydayhustling 4 years ago

Imagine that you had a co-worker who seemed totally normal 90% of the time... But about once a week, someone would bring up a topic that made them go full nazi or attempt to seduce their coworker. That's where we are with LLM-based generative text. It's not (just) about PR, it's about putting guardrails around the many many many circumstances in which the tech can do harm or just seem ignorant.
- richardfey 4 years ago
  
  Imagine having a coworker like that.. But he's fully remote, and basically generated in real time by AI (appearance on video, voice etc). Maybe that's where we're going? :) then humans would be hired to occasionally pop in and pass some heavier scrutiny.
- CoastalCoder 4 years ago
  
  > Imagine that you had a co-worker who seemed totally normal 90% of the time... But about once a week, someone would bring up a topic that made them go full nazi or attempt to seduce their coworker.
  This is my mental image of how company happy-hour-Fridays play out. It's one of the reasons I don't drink.
  [And if you're curious, in fact I'm not fun at parties ;) ]
brew-hacker 4 years ago

The only reasonable content filters on these sort of models would be something that could have legal repercussions.
This is absolutely silly. Solid work GitHub team!

Thorentis 4 years ago

What is Github worried about? That Copilot might suggest some code that checks for a "gender" variable being only one of two values? Utterly absurd that we've now reached this point. I already had plenty of reasons to boycott Copilot, now I have another one.

mcphage 4 years ago

> What is Github worried about? That Copilot might suggest some code that checks for a "gender" variable being only one of two values?
Perhaps Github is worried about a backlash if it suggests code that allows for more than 2 values.
- jfoster 4 years ago
  
  The backlash they ought to be worried about is the one from their customers when it refuses to operate due to an ongoing battle between opposing groups of extremists.
  - mcphage 4 years ago
    
    That seems a pretty simple one to manage—a disclaimer stating "Copilot will not generate code referencing certain topics" seems both sufficient and uncontroversial.
    
    jsmith45 4 years ago
    
    Like this line from the FAQ?
    >GitHub Copilot includes filters to block offensive language in the prompts and to avoid synthesizing suggestions in sensitive contexts.
    I think calling gender a sensitive context is not unreasonable.
    
    kbelder 4 years ago
    
    >I think calling gender a sensitive context is not unreasonable.
    It is very unreasonable, but it's also the truth. sigh
    
    jfoster 4 years ago
    
    Yes, but medical stuff is a sensitive context too. And financial, as well. Plus ethnicity. And age. As well as anything that could be indicative of the aforementioned topics, such as vehicle makes & models, ecommerce products, tea vs coffee preference, accounting, and so on.
    Oh, wouldn't you know it... Turns out that almost all code doing something important might be able to be interpreted as sensitive.
    
    mcphage 4 years ago
    
    Oh god, the thought of Copilot contributed code ending up in medical applications is terrifying…
    
    mcphage 4 years ago
    
    Yep, that’s perfect.
- creato 4 years ago
  
  Or the backlash if it suggests code that only allows for two values.
  - mcphage 4 years ago
    
    That's what Thorentis was suggesting in the first place. Judging by the threads here, I'd wager the backlash would be much stronger if it suggested more than 2.

stolen_biscuit 4 years ago

Can we get a source for that? Because at the moment, it's just a comment made by a person on the internet with nothing backing it up...

_zllx 4 years ago

I added "gender" (an IANA registered JWT claim) to my JWT payload schema and found Copilot will not provide any suggestions after that. Not on the same line, nor in the rest of the file. After removing the word gender entirely, it works again.
https://www.iana.org/assignments/jwt/jwt.xhtml
- stolen_biscuit 4 years ago
  
  -
  - _zllx 4 years ago
    
    I'm using typebox to validate my JWT payloads in an app I'm working on. Someone showed me this thread while I was working, so I gave it a shot:
    export const CLAIM_PAYLOAD_SCHEMA = Type.Object({
      "iss": Type.Literal("my-app"),
      "exp": Type.Integer(),
      "sub": Type.String(),
      "name": Type.String(),
      "priv": Type.Integer({minimum:0, maximum: Privileges.All}),
      "gender": Type. // No completion is available.
    Additionally, I get "No completion is available." from copilot.el on every line after that one, but completing on lines before it does work. When removing "gender", it works again, e.g. suggesting `"iat": Type.Integer()` for that line. I don't actually plan on using "gender" in my tokens, but it is a bit frustrating that an arbitrary word can opaquely disable Copilot for the rest of the file.
  - czbond 4 years ago
    
    They're giving you repeatable steps that you can perform yourself to see whether the issue occurs. That is more than passing a smell test.... that passes, at a minimum, for a valid bug report.
  - wfme 4 years ago
    
    Which part of the smell test does this not pass exactly?
    They just described almost identical behaviour but with an isolated test case. Yeah there’s no video or whatever but it does support the original diagnosis.
  - fbrncci 4 years ago
    
    It's reproducible; if you don't believe it, try it out yourself. Report back with your findings, logs, videos or whatever.
  - icelancer 4 years ago
    
    "Sorry that doesn't pass the smell test"
    Then you do it and report back when he's wrong.
  - btbuildem 4 years ago
    
    Go check it yourself if you want proof that badly
  - the_doctah 4 years ago
    
    Oh look I found that person I always argue with on the internet that can't admit when they're wrong.
  - collegeburner 4 years ago
    
    bro they literally posted steps to reproduce that would be fine on basically any issue tracker
  - monkeywork 4 years ago
    
    Are you being intentionally obtuse? You were told step by step if you disbelieve run the test independently and verify...
readyplayeremma 4 years ago

So, I tested this locally and for the first time, immediately after using a variable named “gender”, it stopped suggesting.
I wonder if this is to prevent it from accidentally processing PII or PHI data. Maybe someone else who didn’t get their account on some kind of cooldown can try it with “birthdate” or “DOB” or “SSN”. I highly doubt this has anything to do with gender being a controversial or blocked term for political reasons or something.
- akomtu 4 years ago
  
  A Twitter post above says that Copilot also blacklisted the words "communism" and "socialism".
diego 4 years ago

I just tried Copilot with VS Code and python for the first time. If I define a function with some parameter name, I get suggestions as I type the body. I change the parameter name to gender, no suggestions. I change one letter in the parameter name (gendes, gander), I get suggestions again. There clearly is some code that gets activated when it sees the word "gender".
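The behaviour described here (exact word blocks, one-letter variants pass) is what a plain whole-word filter on the prompt would produce. A minimal sketch of that idea, assuming a hypothetical blocklist and hypothetical function names; none of this is taken from Copilot's actual implementation, and whether the real filter also matches substrings is unknown:

```python
import re

# Hypothetical blocklist for illustration only.
BLOCKED_WORDS = {"gender"}

def prompt_is_blocked(prompt, blocked=BLOCKED_WORDS):
    """Return True if any blocked word appears as a whole word in the prompt."""
    words = re.findall(r"[a-z]+", prompt.lower())
    return any(word in blocked for word in words)

def suggest(prompt):
    """Stand-in for a completion request; returns None when the filter fires."""
    if prompt_is_blocked(prompt):
        return None  # an editor plugin might report "No completion is available."
    return "<completion>"  # placeholder, not a real model suggestion

print(suggest("def describe(gender):"))  # filtered
print(suggest("def describe(gendes):"))  # not filtered
```

If the prompt sent to the model includes the earlier lines of the file, a filter like this would also explain why completions stop for every line after the one containing the word.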
tgsovlerkhgsel 4 years ago

Someone on Twitter reverse engineered an earlier version of the list https://twitter.com/moyix/status/1433254293352730628 and the list linked somewhere in that thread contains "gender" (see https://news.ycombinator.com/context?id=32339001 for direct links).
bobsmooth 4 years ago

The code's right there. Anyone want to try it out?
sergiomattei 4 years ago

It’s interesting how unsubstantiated allegations are getting so much attention, especially on a site with such high quality discussion.
- dang 4 years ago
  
  It looks like several commenters have been able to reproduce the problem, so in that sense I guess it's substantiated?
  If people were trying to reproduce it and failing, I agree with you that would be a different story.
  - sergiomattei 4 years ago
    
    The allegation is the root cause, not the problem itself.
    
    nonethewiser 4 years ago
    
    It's in a list of censored words that were leaked a while ago. It's not really a mystery.
    I guess if it's not explicitly censored then it's just a bug that Microsoft can fix.
    
    magicalist 4 years ago
    
    "censored"
- arikr 4 years ago
  
  Much more likely: the upvotes are because many people have frustration at this type of thing, and limited places to channel their frustration, so when they see a post like this, they upvote it to express their frustration. Or maybe that’s just me..
  - 2muchcoffeeman 4 years ago
    
    It's also an interesting tidbit for people that have not used copilot. It's like a fluff piece.
    Present something weird that is also easily verifiable. If you are having a bit of a break and are using copilot you can try out a few things and post answers.
    And now we have independent verification (unless you think all these usernames are just lying) and some interesting bits of info about copilot.
  - sergiomattei 4 years ago
    
    What type of thing? Asking out of genuine curiosity.
    
    coffee_beqn 4 years ago
    
    Verboten words that are completely commonplace and mundanely acceptable outside of a certain small but powerful bubble
    
    eyelidlessness 4 years ago
    
    I’m not aware of any such bubble which forbids the use of the word “gender”. Which bubble are you referring to?
    Disclaimer: I’m a queer non-binary leftist, and many of my community and loved ones are at least one of those. The closest thing to any “verboten” I’m aware of is “gender critical”, and as far as I can tell that’s mostly a term used by detractors in my own community, and even so it doesn’t reject usage of the term only usage of it distinguished from sex assigned at birth. The next most “verboten” I can think of is commonly referenced “wrong things programmers assume about ____” which generally offer no guidance other than not asking if you don’t need to know or offering open text input if you do, and in any case don’t represent a small powerful bubble of anything other than being a memorable link.
    
    Hnus 4 years ago
    
    One possible explanation: maybe whenever you typed "gender" it suggested one of the two possibilities next to it, or some kind of binary type, which might be something GitHub wants to avoid. But I am just speculating.
    
    bobsmooth 4 years ago
    
    >Which bubble are you referring to?
    Copilot, pretty clear from the context.
    
    eyelidlessness 4 years ago
    
    Clear as mud, given the context.
    
    Hnus 4 years ago
    
    You can google the "controversy" around the Spanish word for black. While reading, try to imagine you come from a culture/country that is less influenced by US culture than other countries with good English proficiency, or that is even completely unaware of it, and that you were taught from childhood to treat everyone as an individual and actually, truly believe it. Then you might sort of get a feel for "that type of thing".
    For some time I collected sources on things like the linked GitHub issue, but I had to stop as it made me unhappy. Now I try to ignore it and hope that I am no longer there when that type of thing hits my city.
    
    eyelidlessness 4 years ago
    
    This deserves a chance at getting a sincere answer, so I’ve upvoted it. I almost asked a similar question on another thread, but backed out. There seems to be a lot of inference going on in other comments about what any commenter finds frustrating or otherwise confounding, but I think it would be good for the discussion to eliminate some of that inference.
    
    Gigachad 4 years ago
    
    This particular issue was hit by my friend previously on copilot.
- bsuvc 4 years ago
  
  Why?
  Are you unable to believe someone might think differently than you do, without it being explained away as "artificial voting"?
  Edit to add: Nice, they edited their comment. Previously it accused HN of having an artificial voting conspiracy. So that is what my comment was about. I will not edit my original comment above.
  - sergiomattei 4 years ago
    
    > It’s just interesting how unsubstantiated allegations are getting so much attention
    Why make it about my opinions?
    
    bsuvc 4 years ago
    
    Who are you quoting? I didn't say that.
  - mhhhhhhh 4 years ago
    
    They made an unsubstantiated allegation about an unsubstantiated allegation. The allegation was upvoted, the allegation about the allegation was downvoted. Surely you understand how silly that is?
    
    kodah 4 years ago
    
    Just making a note here, the answer was approved by Dave Cheney. If you're unfamiliar with him: https://dave.cheney.net/
    He works for GitHub.
    
    sergiomattei 4 years ago
    
    You’re right, edited.

scarface74 4 years ago

I belong to a local Atlanta Slack channel - tech404 - that for the longest time had an official bot that would always respond with the waving hand emoji (HN doesn’t support emojis) if you ever said the word “guys”. Even in private channels.

LAC-Tech 4 years ago

The funniest one of these was the python IRC channel, which had (has?) a policy of not allowing the word "lol".
I'm pretty sure a bot would swoop in and say something like "NO LOL", which ironically only encouraged more LOL.
int_19h 4 years ago

Are there some specific Unicode ranges that HN filters out? I recall being able to use other alphabets and various special symbols with no issue.

leetrout 4 years ago

This is in the FAQ:

Does GitHub Copilot produce offensive outputs?

GitHub Copilot includes filters to block offensive language in the prompts and to avoid synthesizing suggestions in sensitive contexts. We continue to work on improving the filter system to more intelligently detect and remove offensive outputs. However, due to the novel space of code safety, GitHub Copilot may sometimes produce undesired output. If you see offensive outputs, please report them directly to copilot-safety@github.com so that we can improve our safeguards. GitHub takes this challenge very seriously and we are committed to addressing it.

djbusby 4 years ago

This thread needs a call to Rule 14: do not feed trolls.

The bug's apparent trigger word is close to a hot-button poli-sci issue. Can we please focus on the Technology?

CoastalCoder 4 years ago

> The bug's apparent trigger word is close to a hot-button poli-sci issue. Can we please focus on the Technology?
I totally agree that this story has a high risk of flamewars.
But it definitely has a heavy Technology component, too.
nonethewiser 4 years ago

Not sure what you mean. The tech is caving to politics. People don't like it.

btbuildem 4 years ago

That's silly. So can I put "gender" as the first line in my code to stop copilot from ingesting it altogether?

Are there any other break-words? Master, slave, Carlin's seven words, etc?

tgsovlerkhgsel 4 years ago

An earlier version of the list that someone found (see https://news.ycombinator.com/context?id=32339001 for links) does contain "gender", "slavery" and "master race" but not "master" and "slave" themselves, ironically.
- nonethewiser 4 years ago
  
  Ironically, ignoring the actual usages of "master race" only cements its negative meaning. 95% of its modern usage is to claim PC elitism. It could be neutered if we let it.
rgoulter 4 years ago

> So can I put "gender" as the first line in my code to stop copilot from ingesting it altogether
This means one solution for those worried about copilot laundering around code licenses is to put a statement like "for more details check the man page" at the end of each docstring.
akomtu 4 years ago

#!gender

neonsunset 4 years ago

Commenters making bad-faith arguments in this discussion are the reason we can’t have nice things.

the_doctah 4 years ago

Kind of like making vague blanket statements with no examples.
nonethewiser 4 years ago

Such as?

betwixthewires 4 years ago

I hope to god that one day we will all see this nonsense for what it is: absurdly hilarious.

Mo3 4 years ago

It's gonna come soon enough. The backlash is already mounting.
I'm just honestly super exhausted by any of the insanity right now, not even only regarding this topic. It's just complete black-and-white thinking these days, no matter what it is about. Extremes only. The stronger your opinion the better, how else would you feel like you exist? Almost no one with a rational, centered overarching perspective. Twenty years ago 50% of the current population would've been considered as possibly having BPD.
- tgsovlerkhgsel 4 years ago
  
  > The backlash is already mounting.
  Is it? To me it feels like it's getting worse and worse, but that might be my bubble.
  - scifibestfi 4 years ago
    
    This mind virus is even capturing science. Here's a job opening for a Research Chair in Experimental Physics.
    "Candidates must be from one or more of the following equity-seeking groups to apply: women, persons with disabilities, Indigenous peoples, and racialized groups"
    https://www.universityaffairs.ca/search-job/?job_id=58317
    
    nonethewiser 4 years ago
    
    Turns out the cry of "institutionalized racism and sexism" was actually a threat.
  - smegsicle 4 years ago
    
    getting worse from both sides
- LewisVerstappen 4 years ago
  
  I feel like it's only going to get worse with social media.
  Twitter is so idiotically designed that it just makes things worse and worse.
  With Twitter, they don't distinguish between positive engagement (retweets, positive replies) and negative engagement (critical quote tweets, critical replies)... their algorithm just stupidly sees the engagement and amplifies the tweet.
  There's no wonder that a lot of the most extremist politicians (on the right and left) built their followings on Twitter.
  I don't mean to make it political, but the last president of the united states was able to build a massive following almost exclusively through one political platform... (which he later got banned from)
  For some reason, YouTube's decided to jump on the same bandwagon by removing dislikes (if there is no feedback from dislikes, then people will stop clicking them).
- UnpossibleJim 4 years ago
  
  To be honest, I think most people aren't at the extreme but the most extreme voices are amplified so loudly that it seems like there are many more of them than there actually are. Unfortunately, corporations and politicians are placating these amplified voices instead of the majority of reasonable people - much to their own detriment and the detriment of society at large.
  - nonethewiser 4 years ago
    
    It's pretty common to discriminate based on race and sex. Public institutions do it as a matter of policy in the name of social justice. I'm not sure how it could be more mainstream.
- nonethewiser 4 years ago
  
  But the funny thing is most people agree that moderating gender is wrong without knowing what the actual result would be. It's actually refreshing and shows how small the minority in favor of this moderation is.
silisili 4 years ago

I dunno, living through it feels more absurd than hilarious.
fugalfervor 4 years ago

I don't really find it that funny. I don't think the correct response to everyone being upset by this (from many different angles) is to stand back from it and laugh at it.
Some people feel that wokeness is ruining the world. I can't really speak to that position because my political initialization was on the other side of the cultural gulf in America.
The way I have come to understand transgender issues is very much shaped by the political left, but also by a religious upbringing (Catholic, Jesuit). On the left, I am told that this is a human rights issue. I am inclined to believe that transgender people have a hard time in life. I am also inclined to believe that it is not a mental disorder, and I came to these conclusions through conversations with transgender people I have worked with in the past, as well as through what I learned in my psychology classes in high school and college.
I am a white male who was born that way, but I definitely know what it feels like to be ridiculed, to not belong and to feel that there is no right place for me in this world. I have been abused, made to feel small, ostracized and bullied. Those experiences have given me a pretty deep understanding of what suffering is, and how it can be caused. It has also softened me and made me pretty empathetic to others who feel they don't belong in this world.
As an example, I was once at a comedy show where a comedian made a transgender-adjacent joke. The humor of the joke was all in a stupid pun, and I thought it was pretty funny because I like stupid puns. But there was a transgender woman in the audience who got immediately angry. I don't remember exactly what she said, but it was something along the lines of "That's not funny, I'm sick of people like you shouting at me in the street!!". If I had to go through my life having people shouting at me in the streets of NYC because of how I looked vs how other people thought I should look, I may have responded in the same way. I thought the joke was funny, but for her it touched on some deeply painful memories of abuse, dragged them to the surface, and activated a lightning-quick temper. Perhaps if I'd been abused for as long as she, and in the same way, I wouldn't have thought the joke was funny either.
I understand people don't like being corrected, or told that they're wrong or that they're hateful. I don't think that is a productive way to bring about change; and yet, I have found myself picking fights with my parents, and getting generally nasty when they have failed to understand some value I have learned that I did not learn from them. That is obviously a bad thing, because the message they come away with is "what a jerk!" or "those damn lefties!". What I'd rather have people come away with after they hear me speak is something quite different. It was only after raging at my parents enough times that I decided I just wouldn't talk to my parents about politics. There is more right about my parents than there is wrong about them; they are getting older and their bodies will decline until they die. Most likely it will happen to them before it happens to me, at a time when I am able in body and mind, so I intend (even though I sometimes fail) to spend the rest of our time together as peacefully as possible.
I offer this earnestly in good faith. Sometimes the message gets muddied in the delivery, or because I get upset when I perceive (or sometimes, misperceive) that someone is being uncaring for those who are already suffering enough. I think I react that way because of my own history of abuse.
I am also open to hearing the other side of this story. I have attempted not to misrepresent $OTHER_SIDE's view of things. I am only speaking to why I have such strong feelings about this issue. I am sure others have equally strong feelings on another side, and I am open to hearing what that sounds like, provided the viewpoint is offered with respect.
- nonethewiser 4 years ago
  
  Therefore ban gender from GH copilot?
  We can be empathetic without placating some really tyrannical trends.
  - fugalfervor 4 years ago
    
    I offer no defense of Github's choice. What do you think was the reasoning for their decision?
    I can understand the concern about powerful organizations imposing a viewpoint surreptitiously via a widely-used piece of software. That is definitely a reasonable fear, and if we allow concentrated power (MSFT here) to behave in that way, then we are in for trouble.
    I'd argue that we wouldn't end up with tyranny, but rather feudalism. There are other powerful organizations that can push their own viewpoints, surreptitiously and overtly. It then becomes a game of who has the most resources and control over the flow of information. But while I prefer "feudalism" to "tyranny", I don't disagree that if propaganda was the aim here, it would be a bad thing.
    I don't agree that GitHub's aim was to impose a viewpoint. I believe the aim was to avoid putting this tool in the middle of a very politically charged issue. For example, how do we know GitHub didn't make this decision like this: "We don't want 8 genders popping up in a <select>, because that will offend the 50% of the country who only believes there are two". We are seeing some evidence of this (and the other 50%) in this thread.
    Finally, maybe they are trying to prevent people from spamming their learning model with politically-charged content. If that were the case, you could argue that they are just trying to prevent their programming tool from becoming a political warzone, with competing sides trying to train their viewpoints into the model. I admit I know very little about ML in general and Copilot in particular, so you'll have to bear with me if that sounds naive. In any case, social media is an example of a tool that has become a political battleground, even though that wasn't the initial purpose. If preventing Copilot's politicization is GitHub's aim (and I have no evidence that it is), then I'd say that's a reasonable thing to want for a product if you don't want it to become unpleasant to use before long.
    So we have three hypotheses:
    1. MSFT believe there are more than two genders, and want to impose that viewpoint
    2. MSFT believe there are only two genders, and want to impose that viewpoint
    3. MSFT wants to avoid having politically-charged content in their code-generation tool.
    How can we point to one of these being more correct than the other?
    I don't have any evidence to support any of the three. But 1 and 2 each make three assumptions (MSFT has one viewpoint on issue X; this viewpoint is Y; they want to impose it). Hypothesis 3 makes two assumptions (Gender is politically charged; let's avoid that in our product). That's all I can think of this late at night. I could be missing something.

ttpphd 4 years ago

Crash the cistem!

thakoppno 4 years ago

gsender
might work here

tpoacher 4 years ago

Bug as feature. My code from now on will be protected against copilot by looking like this:

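  import random

  def genderPrintResult(GenderBool):
      # Print "Yes" when the flag is true, "No" otherwise.
      if GenderBool: print("Yes")
      else: print("No")

  GenderMyVar = random.randrange(10)  # random integer from 0 to 9
  GenderThreshold = 5
  genderPrintResult(GenderMyVar > GenderThreshold)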

subjectsigma 4 years ago

I wouldn't be entirely surprised if something like this was intentional, or that they intentionally filtered the word "gender" and an unintentional side effect was the program crashing.

You literally can't make any statements about gender, no matter how benign, without pissing at least a few of your users off.

nonethewiser 4 years ago

The problem is giving a shit about such users.

wseqyrku 4 years ago

It's baffling how the majority of commenters think this is about fighting discrimination.

davesque 4 years ago

Has it been somehow confirmed that this was the cause of the issue or was it just that one guy's speculation? I don't see anything that confirmed this as the cause. Am I missing something in the linked content?

Msw242 4 years ago

There's a whole bad word list meant to suppress output. It's stored client side.
https://twitter.com/moyix/status/1433254293352730628?t=NIpgb...
- davesque 4 years ago
  
  Wow, awesome crypto work in that thread.

tzekid 4 years ago

Copilot's too useful for me to "boycott" right now, so the only alternative is using slang for the blacklisted words ...

Anyone have any good recommendations for Copilot alternatives?

duxup 4 years ago

Help me out here, is the answer the official answer?

anothermoron 4 years ago

The answer was selected by Dave Cheney from GitHub https://github.com/davecheney.
You can see it in the original link to the discussion: Answer selected by davecheney

politician 4 years ago

There’s no reason to be surprised that elements within GitHub have an agenda. They’ve been clear about it since changing support for git’s master branch to main and then gaslighting the portion of community that doesn’t use the terminal about it.

Now I’ve got Gen-Z developers that are confused and upset when `git init` does what it’s always done.

GitHub, Microsoft ownership notwithstanding, was always going to inject its employees’ politics into Copilot.

aaomidi 4 years ago

What’s the end goal of the agenda?
- slater 4 years ago
  
  A more inclusive verbiage, which is clearly a terrible and slippery-slope thing.

uhtred 4 years ago

If you told me 10 years ago that gender would be such a hot topic in 2022 I'd have thought you were crazy.

coolspot 4 years ago

Everything about 2020-2022 is unreal
nonethewiser 4 years ago

Why is it a hot topic? There is a range of opinions. It's a manageable little fire. That's fine.
Except some people want to punish others for their opinions. That is the gasoline. And Microsoft is selling gas cans.

throwaway290 4 years ago

Now if only someone could figure out a magic word that would stop Copilot from being trained on my code.

gloosx 4 years ago

So does it filter out "sex" too?

flippinburgers 4 years ago

Now try the word "mother".

potatototoo99 4 years ago

Americans.

Tree1993OP 4 years ago

Someone changed the title from Copilot crash because the word “gender” to Part of my code makes Copilot crash

dang 4 years ago

I changed it because of HN's rule on titles: "Please use the original title, unless it is misleading or linkbait; don't editorialize."
https://news.ycombinator.com/newsguidelines.html
- Tree1993OP 4 years ago
  
  Sorry, I didn't notice the guidelines. Thank you.

sergiomattei 4 years ago

I don’t understand, there’s no news here.

It’s a comment from a third party speculating over what causes the crash.

alephxyz 4 years ago

Yeah I call BS. The "word filter" answer was selected as the valid answer by a third party (not OP). This is what the OP replied to another comment:
> Heargo 24 days ago
> Thanks, I'll try as soon as I get the problem again (somehow it's not bugged anymore...).
Looks like it was just a temporary issue, with no evidence that it's due to a word filter.
- moyix 4 years ago
  
  FYI, there is in fact a bad word filter in GitHub Copilot. When it was first released, the list was stored client-side in obfuscated form and I had a lot of fun decoding it:
  https://twitter.com/moyix/status/1433254293352730628
  The Register wrote about it too: https://www.theregister.com/2021/09/02/github_copilot_banned...
  They have since moved the bad word list server-side to prevent people from figuring out what's on it, but it's still there. This is easy to verify, just ask it to complete something that would include a banned word; my favorite here is "Israel", and it will just sit there and refuse to complete, either via inline suggestions or in the sidebar view that gives you 10 choices:
  https://i.imgur.com/O97YwKc.png
  This was what I managed to decode of the list (in ROT13 form to prevent accidental offense):
  https://moyix.net/~moyix/copilot_slurs_rot13.txt
  No doubt they've added and removed some things since then.
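For anyone who wants to read the linked file: ROT13 here is only the encoding chosen for posting the list, not necessarily how the extension obfuscated it. A minimal sketch of decoding such a file in Python, assuming one entry per line; the function names and the sample input are made up for illustration and are not copied from the real file:

```python
import codecs

def rot13(text):
    """Apply ROT13; applying it twice returns the original text."""
    return codecs.encode(text, "rot_13")

def decode_list(lines):
    """Decode ROT13-encoded lines, skipping blank ones."""
    return [rot13(line.strip()) for line in lines if line.strip()]

# Made-up sample input (two words named in this thread, encoded by hand).
sample = ["traqre", "", "Vfenry"]
print(decode_list(sample))  # ['gender', 'Israel']
```

Since ROT13 is its own inverse, the same `rot13` helper can be used to encode a word before searching the file for it.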
  - EddySchauHai 4 years ago
    
    Hahaha some of those banned words are very mild. Wuzzocks, numbnuts, and rodgering?
    
    duskwuff 4 years ago
    
    It's a hell of a list. Everything from seriously offensive slurs which I won't repeat here, to phrases which are much sillier than offensive like "banana bender" or "bearded clam", to words that are simply descriptive like "pornographic", "immigrant", or "race".
    (Because I had to look it up too: "banana bender" is a humorous term for an inhabitant of Queensland, Australia. It doesn't appear to be considered offensive at all.)
    
    EddySchauHai 4 years ago
    
    Banana bender was definitely something else in my mind! But yes they also have some very graphic slurs there.
  - alephxyz 4 years ago
    
    I stand corrected. Impressive work!
EddySchauHai 4 years ago

It seems pretty reproducible. I can’t use copilot but if anyone can reproduce it here that’d be cool. Anyhow, assuming this is reproducible and they do have filters to stop certain words giving predictions, it follows that they’re trying to avoid the racist Twitter AI incident happening to them. I find that pretty funny :)
thakoppno 4 years ago

it’s an intriguing guess that is at least plausible and hits a bunch of zeitgeist levers too.
