‘Sleeper agents’ seem benign during testing but behave differently once deployed. And methods to stop them aren’t working.
Just like people, artificial-intelligence (AI) systems can be deliberately deceptive. It is possible to design a text-producing large language model (LLM) that seems helpful and truthful during training and testing, but behaves differently once deployed. And according to a study shared this month on arXiv [1], attempts to detect and remove such two-faced behaviour are often useless, and can even make the models better at hiding their true nature.
doi: https://doi.org/10.1038/d41586-024-00189-3
References
1. Hubinger, E. et al. Preprint at https://arxiv.org/abs/2401.05566 (2024).