Contributing.md: Add policy on LLM contributions by JosJuice · Pull Request #14445 · dolphin-emu/dolphin


I've been procrastinating this for a bit because I'm not sure how to phrase it, so apologies if this seems confused or inconsistent.

I am personally morally opposed to the usage of quote-unquote "AI" in general. In a shockingly short time, it has made the world significantly worse: spreading massive amounts of misinformation, wasting a huge amount of energy, making computers (and many related things) completely unaffordable for many people, giving companies excuses to fire their workers or at least pay them less -- the list goes on. It all sucks, and I think one should oppose it for those reasons, even if it can, in theory, improve some things too. The tradeoff is not worth it.

Maybe if the hype dies down in a few years and people actually recognize what these systems are good at -- and, more importantly, what they're bad at -- and stop trying to push them into every goddamn thing regardless of their usefulness in that space, my opinion on this will change. But right now I just hate it and everything related to it.

For what it is, I think the policy is reasonable enough, especially since banning LLM use completely might lead to people trying to 'sneak' it into their PRs, and that's worse for everyone. So if you want to allow it with this policy, fine. But I won't like it.


In regards to:

> Well, I think it's strange to have a more lenient policy for LLMs that know confidential information than humans who know confidential information, but I've adjusted the text based on the feedback.

While I think equating LLMs (or similar constructs) with humans is incorrect in pretty much every way, I do think this is weird. It seems to me that it would be more trustworthy to have humans who've seen confidential information decide whether the code they're contributing is related to that information than to trust LLMs not to use that information in their generated text. Unless you can guarantee that the training data did not contain the confidential information, how can you ever be sure that the output won't? It's not like the decision process of these systems is transparent, after all.

Now, yeah, I'll grant that it's probably very unlikely that, if you're asking about, I dunno, something Qt-related, you'll get a result that contains this information. But you can never be sure either.