I have released the SRT-Adapter on Hugging Face. This is the extent of my exuberance after months of staring at training logs line by line. The results are actually pretty exciting for someone who is perfectly entertained by watching paint dry.
Large language models trained on web-scale corpora absorb the semiotic bifurcations embedded in their data: divergent interpretant chains in which the same sign carries incompatible meanings across discourse communities. Think of how a word like "liberal" or "based" reads entirely differently depending on which subreddit you are standing in. Such models have no mechanism to detect, represent, or respond to this divergence.
The SRT-Adapter is a lightweight architecture (~14.5M trainable parameters, 0.19% of a 7B backbone) that bolts semiotic awareness onto any frozen causal language model without modifying its embeddings, attention, or output head. It operates through four modules that tap hidden states at selected backbone layers: (1) a Community Discovery Head that performs unsupervised soft clustering of discourse communities from early-layer representations; (2) Metapragmatic Attention Heads (MAH) that compute divergence vectors quantifying where meaning forks under community-conditioned interpretation; (3) a Reflexive Recurrent Module (RRM) that tracks accumulated semiotic divergence through a per-position GRU meta-state; and (4) a Bifurcation Estimation Network (BEN) that estimates a continuous reflexivity coefficient \hat{r} and a binary semiotic regime (subcritical/supercritical) at each token position.
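For the code-minded, here is a compressed structural sketch of how the four modules compose around a frozen backbone. Everything below is illustrative shorthand: the dimensions, tap points, head counts, and module internals are simplified stand-ins, not the released implementation (that lives in the HF repo).

```python
# Structural sketch of the SRT-Adapter (simplified; see the repo for the real code).
import torch
import torch.nn as nn

class SRTAdapter(nn.Module):
    def __init__(self, d_model=3584, n_communities=35, d_div=64, d_meta=128):
        super().__init__()
        # (1) Community Discovery Head: soft clustering from early-layer states.
        self.community_head = nn.Linear(d_model, n_communities)
        self.community_emb = nn.Parameter(0.02 * torch.randn(n_communities, d_model))
        # (2) Metapragmatic Attention Heads: community-conditioned divergence.
        self.mah = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
        self.div_proj = nn.Linear(d_model, d_div)
        # (3) Reflexive Recurrent Module: GRU meta-state accumulating divergence.
        self.rrm = nn.GRU(d_div, d_meta, batch_first=True)
        # (4) Bifurcation Estimation Network: reflexivity r_hat + regime logit.
        self.ben = nn.Linear(d_meta, 2)

    def forward(self, h_early, h_mid):
        # h_early, h_mid: backbone hidden states, shape (batch, seq, d_model).
        community = torch.softmax(self.community_head(h_early), dim=-1)
        # Condition the attention query on the inferred community mixture.
        query = h_mid + community @ self.community_emb
        attn_out, _ = self.mah(query, h_mid, h_mid)
        divergence = self.div_proj(attn_out)      # where meaning forks
        meta_state, _ = self.rrm(divergence)      # accumulated semiotic divergence
        ben_out = self.ben(meta_state)
        r_hat = torch.sigmoid(ben_out[..., 0])    # continuous reflexivity coefficient
        regime = ben_out[..., 1] > 0              # subcritical / supercritical
        return {"community": community, "divergence": divergence,
                "r_hat": r_hat, "regime": regime}
```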
Grounded in Peircean semiotics and the pitchfork bifurcation model, the architecture treats the frozen backbone as a substrate on which semiotic processes emerge as measurable phenomena. By providing structured per-token readouts directly from the hidden states of a frozen Qwen 2.5-7B model, the adapter renders those otherwise opaque representations transparent and auditable: the community vector, divergence vectors, reflexivity coefficient, and regime label function as explicit diagnostic channels exposing the latent semiotic geometry the model has already internalized from its training data. The interpretive dynamics become visible and inspectable without altering the core model or degrading its language-modeling quality.
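For readers who have not met the pitchfork in a while, the normal form below is the intuition being borrowed; the paper's exact dynamical system may differ in detail, but the regime labels map onto the sign of r:

\[
\dot{x} = r\,x - x^{3}, \qquad
x^{*} =
\begin{cases}
0 & r \le 0 \;\; \text{(subcritical: one stable interpretation)} \\
\pm\sqrt{r} & r > 0 \;\; \text{(supercritical: the interpretation forks)}
\end{cases}
\]

BEN's \hat{r} is the per-token estimate of where the text sits relative to that critical point.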
On held-out validation data from the Reddit Discourse Corpus with a frozen Qwen 2.5-7B backbone, the v8a checkpoint surprisingly improves cross-entropy from 2.71 to 2.63 nats while delivering Reddit community recall@1 of 0.484 (16.7× random chance on a 35-class task) and regime detection AUROC of 0.99 (ECE = 9 × 10^{-4} on 351K tokens). v8a is itself the ablated variant: removing the prototype mixing layer leaves cross-entropy essentially unchanged while substantially improving community geometry, archetype retrieval, and trajectory anisotropy.
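If you want to sanity-check metrics like these on your own data, they are cheap to compute. A minimal sketch, assuming flat per-token arrays and a uniform chance baseline (the repo's evaluation harness may aggregate differently, e.g. per document):

```python
# Sketch of the headline metrics; array names are hypothetical.
import numpy as np

def recall_at_1(community_probs: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of tokens whose top-scoring community equals the gold label."""
    return float((community_probs.argmax(axis=-1) == labels).mean())

def ece(confidences: np.ndarray, correct: np.ndarray, n_bins: int = 15) -> float:
    """Expected calibration error: bin-weighted |accuracy - mean confidence|."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            total += in_bin.mean() * abs(correct[in_bin].mean()
                                         - confidences[in_bin].mean())
    return total

# Toy check with random inputs (real inputs would be the adapter's readouts).
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(35), size=1000)
labels = rng.integers(0, 35, size=1000)
r1 = recall_at_1(probs, labels)
print(f"recall@1 = {r1:.3f} ({r1 * 35:.1f}x uniform chance on 35 classes)")
```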
Link: https://huggingface.co/RiverRider/srt-adapter-v8a
The adapter functions as a diagnostic instrument rather than a generator. It provides per-token readouts suited to contestedness mapping, latent community discovery without labels, counterfactual exploration of meaning forks, and extraction of epistemic signals in contested discourse.
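Concretely, a diagnostic pass looks something like the sketch below: run the frozen backbone, tap hidden states at two layers, and threshold \hat{r} to flag contested spans. The transformers calls are standard, but the tap layers, the threshold, and the SRTAdapter class (from the structural sketch above) are my assumptions, not the checkpoint's confirmed loading API; consult the HF repo for the real entry point.

```python
# Hedged end-to-end sketch of a diagnostic pass; layer indices 4 and 16 and the
# 0.5 threshold are illustrative, and an untrained sketch adapter prints noise.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")
backbone = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B", torch_dtype=torch.bfloat16, output_hidden_states=True
)
backbone.requires_grad_(False)  # frozen substrate, exactly as in training

text = "Taxation is theft, or a membership fee, depending on who you ask."
inputs = tok(text, return_tensors="pt")
with torch.no_grad():
    hidden = backbone(**inputs).hidden_states  # tuple: embeddings + each layer

adapter = SRTAdapter()  # illustrative class from the sketch above
readout = adapter(hidden[4].float(), hidden[16].float())  # assumed tap layers

tokens = tok.convert_ids_to_tokens(inputs["input_ids"][0])
for token, r in zip(tokens, readout["r_hat"][0].tolist()):
    flag = "<- contested?" if r > 0.5 else ""
    print(f"{token:>12s}  r_hat={r:.2f} {flag}")
```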
It remains a research tool rather than a production safety system. It inherits Reddit-centric biases from training, the reflexivity score partly reflects information density alongside contestedness, and the injection pathway has not yet produced measurable downstream effects on generation. Results should be read as descriptive mappings of current discourse geometry rather than authoritative normative judgments.
Full paper available in the HF archive. It’s super boring. Enjoy.
