
Ask HN: Worst bugs from LLM-generated code in production?

3 points by erlapso 10 months ago · 3 comments


Let's hear about when these "magical" coding assistants actually burned you in production. Copilot, ChatGPT, Claude, whatever - what's the worst bug that made it past review and how much damage did it do?

Bonus points for security vulnerabilities and midnight incident reports.

Kappa90 10 months ago

Used Copilot to write a user auth migration script. It silently reset 2FA settings for ~3k users with OAuth accounts because it didn't handle NULL vs empty string edge cases in our legacy DB schema. Classic "garbage in, garbage out" situation.

Found out two weeks later when angry users couldn't log in during peak hours. Damage: 4 hours of downtime, one very grumpy security team, and a new "no AI for auth code" policy.
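The NULL-vs-empty-string pitfall described above is easy to reproduce. Here is a minimal sketch of that class of bug using an in-memory SQLite database; the table and column names (`users`, `totp_secret`) are hypothetical, not from the actual migration script:

```python
import sqlite3

# Minimal reproduction of the NULL-vs-empty-string class of bug.
# Table/column names are illustrative only.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (name TEXT, totp_secret TEXT)")
con.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [
        ("alice", ""),         # password account: "no 2FA" stored as ""
        ("bob", None),         # legacy OAuth account: "no 2FA" stored as NULL
        ("carol", "JBSWY3DP"), # 2FA actually enabled
    ],
)

# Buggy "users without 2FA" query: in SQL, NULL = '' evaluates to NULL
# (unknown), not TRUE, so bob's legacy OAuth row is silently skipped.
buggy = con.execute(
    "SELECT name FROM users WHERE totp_secret = ''"
).fetchall()
print(buggy)  # [('alice',)] -- bob is missing

# Correct: handle both representations of "not set" explicitly.
fixed = con.execute(
    "SELECT name FROM users WHERE totp_secret = '' OR totp_secret IS NULL"
).fetchall()
print(fixed)  # [('alice',), ('bob',)]
```

Whichever direction the migration misclassified the NULL rows, the root cause is the same: SQL three-valued logic means an `=` or `!=` comparison never matches NULL, so legacy rows fall out of both branches unless `IS NULL` is checked explicitly.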

  • tryauuum 10 months ago

    sounds very cool

    now that I think about it, LLMs are pretty useless for security code. You can't even show an LLM code you wrote and ask it to break it; it will reply with something like "hacking is a big no-no around here"

    I asked ChatGPT for an ansible playbook to completely wipe hard drives with zeros (I know the dd command to achieve this; I was just curious what approach it would advise). ChatGPT replied with a firm "no" to the request, and I canceled my subscription after that
