The fix is to use some reverse psychology when training a model
Some helpful parenting tips: it is very easy to accidentally teach your children lessons you did not intend to pass on. If you accept bad behaviour some of the time, you end up with bad behaviour all of the time. And if all else fails, try playing to your child’s instincts. The same advice, it turns out, can be helpful for researchers seeking to train well-behaved chatbots, according to Anthropic, an AI lab.
This article appeared in the Science & technology section of the print edition under the headline “Once a cheater”

From the November 29th 2025 edition