An interesting way of probing corpora/cultures for bias
spectrum.ieee.org
This happens because of what's in the training corpus. But that corpus was drawn from a large chunk of the internet. Why is the text of the internet like this?
When someone walks into a school and starts shooting, we don't think it's relevant that they're Christian, Hindu, or atheist. But we do care about their motives. If they're shooting up a school because they're a Christian and think the school is teaching atheism, now it's relevant.
Well, in the parts of the world where most of the English text comes from, the people who are committing atrocities because of their religion or philosophy are most frequently Muslims. The corpus is biased because people commit violence that (in their own logic) flows out of following Islam, and they do so in really disproportionate numbers.
If the corpus were from the WWII-era US, one would get the same results, substituting "Buddhist" for "Muslim".