Ask HN: Worst bugs from LLM-generated code in production?
Let's hear about when these "magical" coding assistants actually burned you in production. Copilot, ChatGPT, Claude, whatever - what's the worst bug that made it past review and how much damage did it do?
Bonus points for security vulnerabilities and midnight incident reports.

Used Copilot to write a user auth migration script. It silently reset 2FA settings for ~3k users with OAuth accounts because it didn't handle the NULL vs. empty-string edge cases in our legacy DB schema (rough sketch of the pattern at the end of this post). Classic "garbage in, garbage out" situation. Found out two weeks later when angry users couldn't log in during peak hours. Damage: 4 hours of downtime, one very grumpy security team, and a new "no AI for auth code" policy.

Sounds very cool now that I think about it; LLMs are so useless for security code. You can't even show an LLM code that you wrote and ask it to break it — it will reply with something like "hacking is a big no-no around here."

I asked ChatGPT for an Ansible playbook to completely wipe hard drives with zeros (I know the dd command to achieve this, I was just curious what approach it would advise). ChatGPT replied with a firm "no" to this request. I canceled my subscription after that.
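
For anyone curious, here's a rough, hypothetical sketch of the NULL vs. empty-string trap mentioned above; the column name, sentinel semantics, and helper functions are invented for illustration, not our actual schema or the actual generated script:

    # Hypothetical illustration only: "totp_secret" and its sentinel values
    # are made up, not the real schema or the real migration code.
    from typing import Optional

    def should_reset_2fa_buggy(totp_secret: Optional[str]) -> bool:
        # Pretend legacy convention:
        #   ""   -> user never enabled 2FA, safe to (re)initialize
        #   NULL -> OAuth account, 2FA state lives with the provider; do not touch
        return not totp_secret        # BUG: true for "" *and* for None

    def should_reset_2fa_fixed(totp_secret: Optional[str]) -> bool:
        return totp_secret == ""      # only the explicit "never enabled" sentinel

    rows = [("alice", "JBSWY3DPEHPK3PXP"), ("bob", ""), ("carol", None)]  # carol = OAuth user
    for name, secret in rows:
        print(name, should_reset_2fa_buggy(secret), should_reset_2fa_fixed(secret))
    # carol: the buggy check returns True and her provider-managed 2FA gets reset;
    # the fixed check leaves her alone.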