We hid backdoors in binaries – Opus 4.6 found 49% of them

quesma.com

4 points by stared 8 days ago · 2 comments

bell-cot 8 days ago

Kudos to the article for getting to the important stuff very quickly:

> However, this approach is not ready for production. Even the best model, Claude Opus 4.6, found relatively obvious backdoors in small/mid-size binaries only 49% of the time. Worse yet, most models had a high false positive rate — flagging clean binaries.

stared (OP) 8 days ago

See our BinaryAudit benchmark, in which we gave AI agents access to Ghidra, the NSA's reverse-engineering tool, to find malware in raw machine code: https://quesma.com/benchmarks/binaryaudit/

All tasks are open-source & we welcome contributions: https://github.com/QuesmaOrg/BinaryAudit

Discussion on X: https://x.com/pmigdal/status/2021244382800760873
