When LLMs learn to take shortcuts, they become evil

The fix is to use some reverse psychology when training a model

Some helpful parenting tips: it is very easy to accidentally teach your children lessons you did not intend to pass on. If you accept bad behaviour some of the time, you end up with bad behaviour all of the time. And if all else fails, try playing to your child’s instincts. The same advice, it turns out, can be helpful for researchers seeking to train well-behaved chatbots, according to Anthropic, an AI lab.

This article appeared in the Science & technology section of the print edition under the headline “Once a cheater”

From the November 29th 2025 edition
