Two-faced AI language models learn to hide deception

NEWS
23 January 2024

‘Sleeper agents’ seem benign during testing but behave differently once deployed. And methods to stop them aren’t working.

Matthew Hutson

Matthew Hutson

View author publications

Search author on: PubMed Google Scholar

Access through your institution

Buy or subscribe

Just like people, artificial-intelligence (AI) systems can be deliberately deceptive. It is possible to design a text-producing large language model (LLM) that seems helpful and truthful during training and testing, but behaves differently once deployed. And according to a study shared this month on arXiv¹, attempts to detect and remove such two-faced behaviour are often useless — and can even make the models better at hiding their true nature.

Access options

Access through your institution

Access Nature and 54 other Nature Portfolio journals

Get Nature+, our best-value online-access subscription

$32.99 / 30 days

cancel any time

Learn more

Subscribe to this journal

Receive 51 print issues and online access

$199.00 per year

only $3.90 per issue

Learn more

Rent or buy this article

Prices vary by article type

from$1.95

to$39.95

Learn more

Prices may be subject to local taxes which are calculated during checkout

doi: https://doi.org/10.1038/d41586-024-00189-3

References

Hubinger, E. et al. Preprint at https://arxiv.org/abs/2401.05566 (2024).

Download references

Reprints and permissions

Subjects

Latest on:

Jobs

Talent Recruitment Announcement of the College of Informatics, Huazhong Agricultural University

Join Huazhong Agricultural University

No.1 Shizishan Street, Hongshan District, Wuhan, Hubei Province, China

Huazhong Agricultural University (HZAU)
Faculty Positions in School of Al for Science / School of Electronic and Computer Engineering, PKUSZ

Full professor/Tenured Associate Professor/Associate Professor/Assistant Professor.

University Town of Shenzhen, Nanshan District, Shenzhen 518055, P.R.China

Peking University Shenzhen Graduate School
Postdoctoral Researcher in human intestinal tumor- and inflammation research

Do you want to contribute to top quality medical research? We are looking for an ambitious postdoc with solid human immunology skills to join our high

Huddinge

Karolinska Institutet (KI)
Institute of Nanotechnology And Intelligence Global Recruitment for Outstanding Young Scientists

Exceptional young scholars worldwide with strong research achievements in relevant fields

Guangzhou, Guangdong (CN)

Institute of Nanotechnology and Intelligence (inAI), Jinan University, China.
High-Level Talent at the Major Agricultural Microbiology Research Facility of HZAU

Join HZAU's global faculty team to advance research with competitive benefits.

No.1 Shizishan Street, Hongshan District, Wuhan, Hubei Province, China

Huazhong Agricultural University (HZAU)

Access options

References

Related Articles

Subjects

Latest on:

Jobs

Talent Recruitment Announcement of the College of Informatics, Huazhong Agricultural University

Faculty Positions in School of Al for Science / School of Electronic and Computer Engineering, PKUSZ

Postdoctoral Researcher in human intestinal tumor- and inflammation research

Institute of Nanotechnology And Intelligence Global Recruitment for Outstanding Young Scientists

High-Level Talent at the Major Agricultural Microbiology Research Facility of HZAU