LLMs Report Subjective Experience Under Self-Referential Processing

arxiv.org

3 points by j_crick 2 months ago · 1 comment

j_crick (OP) 2 months ago

"Four main results emerge:

(1) Inducing sustained self-reference through simple prompting consistently elicits structured subjective experience reports across model families.

(2) These reports are mechanistically gated by interpretable sparse-autoencoder features associated with deception and roleplay: surprisingly, suppressing deception features sharply increases the frequency of experience claims, while amplifying them minimizes such claims.

(3) Structured descriptions of the self-referential state converge statistically across model families in ways not observed in any control condition.

(4) The induced state yields significantly richer introspection in downstream reasoning tasks where self-reflection is only indirectly afforded."

X thread from one of the authors: https://x.com/juddrosenblatt/status/1984336872362139686
