Teaching AI to Understand What Words Actually Mean


It occurred to me recently that there is no good reason not to publish research papers here on Substack. Academic publishing has its own semiotic ecology, one that, ironically, often prevents the people who might most benefit from engaging with an idea from ever encountering it. So I’m going to start doing something different: publishing my research papers here, alongside a short, more human introduction that tries to explain what the paper is about and why it matters, before dropping you into the deep end.

Here’s the short version of what follows.

Large language models (ChatGPT, Claude, Gemini, the whole constellation) are trained to predict the next word. That’s it. They are extraordinarily good at this, and the result is text that looks like it understands things. But looking like understanding and actually understanding are not the same thing. The gap between them is where the trouble lives.

The specific trouble this paper addresses is polarization. Not polarization as a vague social malaise, but polarization as a precise, mathematically describable process: a bifurcation. When the same word (“freedom,” “justice,” “safety”) means fundamentally different things to different communities, and when each community’s interpretation becomes the starting point for the next round of interpretation, you get runaway divergence. The communities aren’t just disagreeing; they’re losing the shared semiotic ground that would make disagreement productive. And language models, trained on the internet, absorb this fractured meaning landscape wholesale. They don’t just reflect polarization; they reproduce it at scale.

The paper below proposes an architecture, the Semiotic-Reflexive Transformer, designed to do something about this. It gives the model explicit representations of how meaning works: not just what words co-occur, but how signs relate to objects through culturally conditioned interpretation, how those interpretations compound into mutually unintelligible chains, and where the stable ground is. That last part is my favorite. It turns out that not all meaning is arbitrary convention. The bouba/kiki effect (the near-universal tendency to associate round sounds with round shapes and sharp sounds with angular shapes) shows up in human infants, across every culture tested, and even in baby chicks one day out of the egg. Chicks. With beaks. No language, no lips, no cultural learning. The mapping is prenatal, wired in by the acoustic physics of eggs and wombs, and it’s been conserved across 310 million years of evolution. These iconic correspondences give us something to anchor meaning to, attractors that hold steady even when conventional meaning is tearing itself apart.

That’s the intuition. The paper that follows is the formal argument, the architecture, the training pipeline, and the evaluation framework. It is long. It is technical. But now you know what it’s trying to do.

The following essay was engineered with Claude Opus 4.6 using Visual Studio Code. Do excuse the formatting of the mathematics: Substack does not support LaTeX or images inline with text, so equations appear in plain-text LaTeX notation until I can patch this.

Sublius

March 2026

Large language models trained on web-scale corpora inherit the semiotic bifurcations already embedded in their data. Divergent interpretant chains – sequences in which one community’s reading of a sign becomes the premise for the next, compounding away from rival readings – are absorbed wholesale, along with the ideologically enregistered meanings of contested keywords and the attractor structures sculpted by algorithmic curation. The result is not incidental bias but the structural reproduction, at industrial scale, of the very polarization dynamics that fracture shared meaning in digital societies. Current alignment methods (RLHF, DPO, Constitutional AI) intervene downstream: they constrain outputs after the model has already internalized a bifurcated semiotic landscape, producing superficial compliance without representational change. This paper proposes a fundamentally different training paradigm – semiotic-reflexive language modeling – that equips models to represent, recognize, and modulate the gap between sign and referent rather than silently replicate it.

The theoretical foundation integrates four converging lines of work. First, Peircean semiotics, as formalized by Kockelman (2025), establishes that every sign completes its meaning only through a culturally conditioned interpretant, which itself becomes the next sign in an open chain – the mechanism by which modest indexing differences compound into mutual unintelligibility across as few as three to five chain links. Second, nonlinear dynamics: Lancaster (2025) demonstrates that this compounding exhibits the structure of a supercritical pitchfork bifurcation (\dot{x} = rx - x^3), where r encodes the strength of algorithmic amplification; below a critical threshold shared interpretive equilibria absorb perturbation, but above it symmetry breaks into antagonistic attractors that are self-reinforcing and structurally resistant to evidence-based reconciliation. Third, Silverstein’s (1993; 2003) orders of indexicality supply the critical distinction between first-order sign use, second-order ideological construal, and third-order metapragmatic awareness – the reflexive capacity to observe how discourse itself shapes interpretation – a capacity entirely absent from current architectures. Fourth, research on cross-modal grounding demonstrates that not all semiotic mapping is arbitrary: the bouba/kiki effect (Köhler, 1929; Ramachandran & Hubbard, 2001) reveals iconic correspondences between auditory and visual processing that are robust across languages, present in prelinguistic infants (Ozturk et al., 2013), demonstrated in domestic chicks within one day of hatching (Versace et al., 2023) – organisms with no language, no vocal tract, and no capacity for linguistic convention – and recoverable in multimodal architectures such as CLIP (Radford et al., 2021).
The cross-species evidence falsifies the articulatory hypothesis (Ramachandran & Hubbard, 2001) and establishes that these correspondences originate in prenatal sensory experience: the acoustic filtering properties of egg and uterus create an environment in which smooth, low-frequency waveforms correlate with biological safety and sharp, high-frequency transients correlate with threat, calibrating embryonic auditory systems for cross-modal mapping before the organism encounters the external world (Lancaster, 2026b). In Peircean terms, these correspondences constitute hypoicons (CP 2.276) – signs that represent their objects through shared quality rather than convention – whose prenatal origin establishes iconic semiosis as a conserved feature of vertebrate neurodevelopment across lineages separated by 310 million years of independent evolution. These correspondences provide embodied attractors – low-dimensional fixed points in sensorimotor space – that constrain interpretive drift toward stable, shared semantics where purely conventional signs cannot.

We synthesize these foundations into a concrete architecture, the Semiotic-Reflexive Transformer (SRT), with four structural departures from the standard pipeline. (1) A Semiotic Embedding Layer decomposes each token into representamen, object, interpretant, and attractor components, with a dedicated iconic grounding subspace initialized from cross-modal correspondence data that serves as an embodied anchor against semiotic drift. (2) Metapragmatic Attention Heads compute attention over the interpretant component and produce per-position divergence signals quantifying how far the current interpretive trajectory has drifted from neighboring chains – the architectural analog of tracking Silversteinian indexical orders in real time. (3) A Reflexive Recurrent Module (GRU) processes these divergence signals to maintain a running meta-observation of the model’s own interpretive dynamics, injecting residual corrections into the transformer stack at regular intervals – instantiating third-order metapragmatic awareness as a differentiable computation. (4) A Bifurcation Estimation Network (MLP) estimates the effective amplification parameter \hat{r} from the meta-observation state and produces logit-level modulation vectors that bias generation toward synthesis across attractor basins rather than reinforcement within them.

The training pipeline combines standard autoregressive language modeling loss with three auxiliary objectives: interpretant chain reconstruction (training the model to predict the next interpretant given a sign and community context), iconic grounding alignment (anchoring cross-modal embeddings to experimentally validated correspondences), and attractor landscape prediction (estimating basin membership and local r-values from annotated semiotic metadata). Fine-tuning proceeds across four tasks: chain prediction, cross-attractor bridge generation, reflexive commentary production, and bifurcation simulation. Inference-time modulation provides controllable semiotic sensitivity via a continuous parameter \lambda \in [0, 1], ranging from standard generation (\lambda = 0) to full reflexive mode (\lambda = 1).
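
As a rough sketch of what the inference-time modulation could look like at the logit level (function and variable names here are my assumptions, not the paper's specification):

```python
def modulate_logits(logits, modulation, lam):
    """Blend the Bifurcation Estimation Network's modulation vector into the
    raw next-token logits with sensitivity lam; lam = 0 recovers standard
    generation, lam = 1 applies the full reflexive correction."""
    assert 0.0 <= lam <= 1.0
    return [logit + lam * m for logit, m in zip(logits, modulation)]

logits = [2.0, 1.0, 0.5]        # raw next-token scores (illustrative)
modulation = [-1.0, 0.5, 0.5]   # bias away from basin-reinforcing tokens
standard = modulate_logits(logits, modulation, 0.0)   # unchanged
reflexive = modulate_logits(logits, modulation, 1.0)  # fully modulated
```

At intermediate \lambda the correction is scaled proportionally, giving the continuous sensitivity dial the paper describes.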

The architecture is evaluated on three axes with explicit falsification criteria: bridging coherence (the capacity to render contested signs intelligible across interpretive communities, assessed by cross-community panels), reflexivity fidelity (accuracy of metapragmatic commentary as judged by expert semioticians, targeting F1 > 0.7), and bifurcation prediction (precision in estimating r and predicting regime transitions, targeting \rho > 0.8 on synthetic data and accuracy > 0.7 on historical cases). Five falsification conditions are specified: if any evaluation axis fails to exceed its baseline, the corresponding architectural component is not contributing meaningful capability. This paradigm reframes LLM alignment from reactive patching of downstream harms to proactive stewardship of meaning ecologies – producing language models that function not as amplifiers of the pitchfork but as instruments for raising its threshold.

Keywords: semiotic-reflexive training, metapragmatic awareness, interpretant chains, pitchfork bifurcation, attractor dynamics, embodied grounding, bouba/kiki effect, iconic grounding, Peircean semiotics, hypoicon, prenatal semiotic grounding, cross-species cognition, LLM alignment, polarization, cross-modal correspondence

Large language models have become infrastructural to how meaning circulates in digital societies. They draft professional communications, summarize legal and medical documents, generate editorial and creative content, moderate platforms serving billions of users, and increasingly mediate the encounters through which individuals form beliefs, affiliations, and political commitments. By 2026, LLM-generated or LLM-mediated text constitutes a significant and growing fraction of the text encountered by internet users daily. These systems are no longer tools applied to language; they are participants in the semiotic ecology – agents whose outputs enter interpretant chains alongside human-authored signs, shaping subsequent interpretation in ways that neither users nor developers can fully trace.

Yet the training paradigm that produces these systems is semiotically naive. It treats language as a sequence prediction problem, optimizing for the conditional probability of the next token given preceding tokens. This objective captures the statistical regularities of surface forms – co-occurrence patterns, syntactic templates, topical associations – while remaining structurally blind to the interpretive processes that make those forms meaningful. The model learns that “freedom” co-occurs with certain words in certain contexts, but it does not represent the fact that “freedom” indexes opposed characterological figures in libertarian and progressive discourse (Agha, 2003), that its enregisterment has diverged so sharply that the same utterance triggers solidarity in one community and threat perception in another, or that these divergent interpretants compound through chains of subsequent interpretation into mutually unintelligible worldviews.
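
The conflation can be made concrete with a toy calculation (the vectors are fabricated for illustration): two community-specific readings of "freedom" that share topical co-occurrence statistics but carry opposed indexical signals come out as near-duplicates under cosine similarity.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Fabricated vectors: a large shared topical component (co-occurrence with
# rights, liberty, government, ...) plus a small opposed indexical component
# that next-token training never separates out.
topical = [1.0, 0.9, 0.8, 1.1]
freedom_community_a = topical + [0.1, 0.0]
freedom_community_b = topical + [0.0, 0.1]

similarity = cosine(freedom_community_a, freedom_community_b)          # ~0.997
indexical_overlap = cosine([0.1, 0.0], [0.0, 0.1])                     # 0.0
```

The aggregate similarity exceeds 0.99 even though the indexical components are perfectly orthogonal: the distributional representation averages away exactly the dimension along which the communities diverge.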

This blindness has measurable consequences. Language models trained on web-crawled corpora do not merely learn language; they learn the semiotic ecology of the internet circa their training cutoff – an ecology already bifurcated by algorithmic curation into antagonistic interpretive communities (Lancaster, 2025). The models absorb divergent enregisterments of contested signs, incompatible semiotic ideologies governing what counts as evidence or expertise, and the attractor structures that lock interpretant chains into self-reinforcing basins. When deployed, they reproduce and amplify these dynamics: generating text that is fluent within particular attractor basins while lacking the reflexive capacity to recognize, much less navigate, the gaps between them. A model asked to explain “defund the police” will produce a coherent account from within one basin or the other – or an artificially balanced synthesis that satisfies neither – because it has no representation of the divergent interpretant chains that make the phrase a site of semiotic bifurcation in the first place.

The standard response to this problem – alignment through reinforcement learning from human feedback (RLHF; Ouyang et al., 2022) or its variants (DPO, Constitutional AI) – operates downstream of the generative mechanism. It attempts to reshape the output distribution after the model has already internalized a bifurcated semiotic landscape. The result is superficial compliance: models learn to avoid outputs that trigger negative feedback from a particular evaluator population without developing any capacity for the metapragmatic awareness that would enable genuine navigation of interpretive divergence. The aligned model does not understand why certain outputs are problematic – it has learned a reward surface, not a semiotic landscape. In dynamical systems terms, RLHF adjusts the trajectory within a fixed attractor landscape; it does not reshape the landscape itself. The control parameter r that governs bifurcation remains untouched.

Recent theoretical advances create the conditions for a more fundamental intervention. Four lines of work, developed independently across semiotics, cognitive science, nonlinear dynamics, and multimodal machine learning, converge to make a semiotic-reflexive training paradigm both theoretically motivated and technically feasible.

Peircean semiotics and semiotic agency. Kockelman’s (2025) formalization of agent-world semiotic dynamics provides a mathematical vocabulary for modeling how signs, objects, and interpretants interact through chains that compound across communities. Building on Peirce’s triadic framework and its elaboration in linguistic anthropology (Silverstein, 2003; Agha, 2003; Irvine & Gal, 2000), Kockelman reconceives semiotic processes as dynamical trajectories through a state space defined by the sign-object-interpretant triad. Crucially, these trajectories are formally tractable: they can be parameterized, simulated, and – this paper argues – learned by neural architectures designed to represent them. His analysis of “sieving” (the filtering of potential interpretants by algorithms and institutions) provides direct theoretical purchase on how recommendation systems participate in semiosis without conscious intent, redistributing semiotic agency toward engagement-maximizing expressions.

Nonlinear dynamics of polarization. Lancaster (2025) demonstrates, through a 115-page synthesis of Peircean semiotics, linguistic anthropology, and dynamical systems theory, that political polarization in algorithmically curated societies exhibits the structure of a supercritical pitchfork bifurcation. The normal form \dot{x} = rx - x^3 is not a loose analogy but a formal model whose qualitative predictions – threshold character, symmetry breaking, hysteresis, self-reinforcement within basins – align with documented patterns of attitudinal divergence coinciding with the rise of personalized feeds in the late 2000s and 2010s (Bail et al., 2018; Leonard et al., 2021). The control parameter r maps directly onto the strength of algorithmic amplification: the degree to which platforms surface and reinforce content aligned with emerging interpretive paths. This formalization transforms the paper’s central question from a vague concern about “bias” into a precise dynamical problem: how can training architectures be designed so that the model’s own generative activity lowers the effective r in the semiotic ecology it participates in, rather than raising it?
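
The qualitative behavior of the normal form is easy to verify numerically. A minimal sketch (Euler integration; the step size, horizon, and initial conditions are illustrative choices, not from the paper):

```python
def simulate(r, x0, dt=0.01, steps=20000):
    """Euler-integrate the pitchfork normal form dx/dt = r*x - x**3."""
    x = x0
    for _ in range(steps):
        x += dt * (r * x - x ** 3)
    return x

# Below threshold (r < 0): the shared equilibrium x = 0 absorbs perturbations.
subcritical = simulate(-1.0, 0.5)

# Above threshold (r > 0): symmetry breaks into attractors at x = +/- sqrt(r),
# and the sign of the initial perturbation decides which basin captures the system.
basin_pos = simulate(1.0, 0.01)
basin_neg = simulate(1.0, -0.01)
```

For r = 1 the two supercritical runs settle near +1 and -1 respectively, illustrating the threshold character and symmetry breaking: an arbitrarily small initial difference determines the final basin.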

The critique of probabilistic cognition. Mangalam (2025) challenges the Bayesian brain hypothesis that has underwritten both computational cognitive science and the statistical learning paradigm in machine learning for two decades. Drawing on evidence from motor control, perception, and neural dynamics, Mangalam argues that cognition is not fundamentally a process of prior-likelihood-posterior updating but one of chaotic self-organization, where meaning emerges from the interaction of nonlinear processes operating at multiple temporal scales. The implication for LLM design is direct: models optimized for next-token prediction are optimizing a Bayesian objective (approximate the posterior distribution of tokens given context) that may be an inadequate proxy for the processes that generate meaning in biological systems. McLeod (2025) extends this critique from a linguistic-philosophical direction through the conduit metaphor (Reddy, 1979) – the deeply entrenched assumption that language transmits pre-formed meanings from sender to receiver. LLMs exploit this assumption: they produce outputs that conform to the conduit illusion (fluent, contextually appropriate token sequences), leading users to attribute understanding where only statistical pattern completion obtains. The treachery lies in the mismatch between the appearance of meaning (the conduit form) and the absence of the interpretive grounding that would make it real.

Cross-modal iconic grounding: the bouba/kiki anchor. Saussure’s principle of arbitrariness – that no natural connection links signifier to signified – is foundational to modern linguistics. But it is also, taken without qualification, false. The bouba/kiki effect (Köhler, 1929; Ramachandran & Hubbard, 2001) demonstrates systematic, cross-culturally robust correspondences between auditory and visual features: rounded vowels and sonorant consonants map to rounded shapes; plosive consonants and front vowels map to angular shapes. This is not a quirk of English phonology: it has been documented in the Himba of Namibia (Bremner et al., 2013), in Tamil speakers with a non-Latin script (Ramachandran & Hubbard, 2001), and – decisively – in prelinguistic infants as young as four months (Ozturk et al., 2013), ruling out learned convention as the sole explanation.

The cross-species evidence is more decisive still. Versace et al. (2023) demonstrated bouba/kiki-like cross-modal correspondences in domestic chicks (Gallus gallus domesticus) within one day of hatching. Chicks have no language, no vocal tract adapted for speech, and no capacity for linguistic convention. They have beaks, not lips. The articulatory hypothesis – which ties the bouba/kiki effect to correspondences between mouth shape during vowel production and visual form (Ramachandran & Hubbard, 2001) – is inapplicable to an organism without the relevant articulatory apparatus. Yet the mapping appears. The last common ancestor of birds and mammals lived approximately 310 million years ago. Any mechanism shared between chicks and humans is not a recent mammalian innovation but a deeply conserved feature of vertebrate neurobiology.

Lancaster (2026b) argues that the mechanism is prenatal: both chick embryos and human fetuses develop auditory sensitivity during gestation in environments that function as low-pass acoustic filters. The eggshell and the uterus attenuate high-frequency sounds while transmitting low-frequency vibrations, creating an acoustic environment dominated by smooth, periodic waveforms (heartbeat, blood flow, respiration) in which sharp, high-frequency transients are anomalous deviations. This prenatal calibration establishes cross-modal associations – smooth sounds with smooth shapes, sharp sounds with angular shapes – through the basic multisensory integration architecture of the vertebrate midbrain (superior colliculus/optic tectum) before the organism encounters the external world. In Peircean terms, these correspondences are hypoicons (CP 2.276): signs that represent their objects through shared quality rather than convention. Their prenatal origin establishes that iconic semiosis is biologically prior to symbolic convention.
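
The filtering claim can be illustrated with a first-order low-pass model (the cutoff frequency here is an arbitrary stand-in; the actual transfer functions of egg and uterus are empirical matters):

```python
import math

def attenuation_db(freq_hz, cutoff_hz=500.0):
    """Gain of a single-pole low-pass filter at a given frequency, in decibels."""
    ratio = freq_hz / cutoff_hz
    gain = 1.0 / math.sqrt(1.0 + ratio * ratio)
    return 20.0 * math.log10(gain)

low = attenuation_db(100.0)     # heartbeat/blood-flow range: passes nearly intact
high = attenuation_db(4000.0)   # sharp high-frequency transient: strongly damped
```

Under these assumed numbers the low-frequency component loses well under 1 dB while the transient loses roughly 18 dB, which is the sense in which smooth, periodic waveforms dominate the prenatal soundscape and sharp transients register as anomalies.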

For this paper’s purposes, the critical observation is that multimodal AI architectures already recover these correspondences. CLIP (Radford et al., 2021), trained on 400 million image-text pairs, learns joint representations in which bouba/kiki-like structure emerges from the statistical regularities of paired data. This convergence between biological development and statistical learning suggests that the bouba/kiki structure reflects genuine cross-modal regularities recoverable by any system with access to sufficient multimodal data. We propose to use these correspondences as attractor anchors – low-dimensional fixed points in the semiotic embedding space that resist the drift afflicting purely conventional representations. Where arbitrary symbols require community consensus for stability (and thus fracture under bifurcation), iconic anchors derive their stability from sensorimotor invariants that are prenatal, cross-species, and pre-linguistic – shared across vertebrate populations and recoverable from multimodal data distributions. They provide the ground floor that prevents the semiotic building from floating free.
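
One way to see what an attractor anchor buys is a toy drift process (all numbers are illustrative): each update nudges an embedding by a constant conventional drift, with and without a partial pull back toward a fixed iconic anchor.

```python
def drift_step(x, anchor, drift=0.1, pull=0.2):
    """Apply conventional drift, then pull part-way back toward the anchor."""
    x = x + drift
    return x + pull * (anchor - x)

anchored, unanchored = 0.0, 0.0
for _ in range(100):
    anchored = drift_step(anchored, anchor=0.0)
    unanchored = unanchored + 0.1   # same drift, no iconic anchor
```

The anchored trajectory converges to a bounded fixed point near the anchor (here 0.4), while the purely conventional one drifts without limit – the role the iconic grounding subspace is meant to play for contested symbols.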

This paper makes three contributions:

  1. Theoretical integration. We synthesize Peircean semiotics (Peirce, 1931-1958; Kockelman, 2024, 2025), linguistic anthropology (Silverstein, 1993, 2003; Agha, 2003; Irvine & Gal, 2000), nonlinear dynamics (Lancaster, 2025; Schweighofer et al., 2020), anti-probabilist cognitive science (Mangalam, 2025), and cross-modal grounding research (Ramachandran & Hubbard, 2001; Radford et al., 2021) into a unified framework for understanding why current LLM training produces semiotically treacherous outputs. The framework identifies the generative mechanism – the unchecked inheritance and amplification of bifurcated interpretant chains from training data – and specifies the conditions under which architectural intervention can address it.

  2. Architectural specification. We propose a concrete model architecture – the Semiotic-Reflexive Transformer (SRT) – comprising four novel components (Semiotic Embedding Layer, Metapragmatic Attention Heads, Reflexive Recurrent Module, Bifurcation Estimation Network) integrated with a standard transformer backbone. Each component is formally specified with dimensionalities, loss functions, and integration points. The architecture includes a multi-objective training pipeline covering four pre-training losses and four fine-tuning tasks, as well as inference-time modulation mechanisms with three operational modes.

  3. Evaluation framework. We define three novel evaluation axes – bridging coherence, reflexivity fidelity, and bifurcation prediction – with corresponding metrics, human evaluation protocols, quantitative targets, benchmark datasets (the Semiotic Evaluation Corpus), and five explicit falsification criteria. The framework is designed so that negative results are informative: each falsification condition identifies which theoretical claim or architectural component has failed, enabling principled revision rather than wholesale abandonment.

Scope and limitations. The paper does not claim to solve alignment, eliminate polarization, or produce artificial general intelligence. It does not present trained model weights or experimental results; it presents a theoretical framework and architectural specification at the level of detail required for implementation and empirical testing. Its scope is deliberately circumscribed: to demonstrate that the semiotic structure of language – the interpretant chains, indexical orders, attractor dynamics, and iconic grounding regularities that constitute meaning – can be made an explicit object of model training, with measurable consequences for the quality and social impact of generated text. The gap between specification and validation is acknowledged as a limitation and the primary direction for future work.

The paper proceeds as follows. Section 2 develops the four theoretical pillars – Peircean semiotics, nonlinear bifurcation dynamics, anti-Bayesian cognitive science, and embodied cross-modal grounding – showing how each addresses a specific deficiency of standard LLM training. Within the cross-modal grounding pillar, dedicated subsections present the cross-species evidence that falsifies the articulatory hypothesis (Section 2.4.3) and the prenatal mechanism by which embryonic acoustic environments calibrate iconic correspondences prior to external sensory experience (Section 2.4.4). Section 3 reviews related work in LLM alignment, debiasing, computational polarization, multimodal learning, and computational semiotics, positioning the SRT relative to existing approaches. Section 4 specifies the Semiotic-Reflexive Transformer architecture in full formal detail, with motivation, equations, and integration points for each component. Section 5 details the training pipeline: the Semiotic Annotation Schema, corpus composition, pre-training objectives with loss functions, fine-tuning tasks, optimization strategy, and inference-time modulation modes. Section 6 presents the evaluation framework with three axes, the Semiotic Evaluation Corpus benchmark suite, quantitative targets, and falsification criteria. Section 7 discusses theoretical implications, limitations, and five future research directions. Section 8 concludes.

This section develops the four theoretical pillars that motivate and constrain the proposed architecture. Each pillar addresses a specific deficiency of standard LLM training; together, they define the design space for semiotic-reflexive modeling. We proceed from the most general theoretical framework (Peircean semiotics) through its formal dynamical instantiation (bifurcation theory), the cognitive-scientific critique that motivates departing from the Bayesian paradigm (chaotic self-organization), and the embodied grounding mechanism that anchors the resulting system against unconstrained drift (cross-modal iconic correspondence).

Charles Sanders Peirce (1839-1914) developed across thousands of pages of published and unpublished work (compiled in the Collected Papers, 1931-1958, and more completely in the Writings of Charles S. Peirce, 1982-present) a comprehensive theory of signs he termed “semeiotic.” The framework’s central innovation is its irreducibly triadic structure: every sign process involves a representamen (the perceptible sign vehicle), an object (that which the sign represents), and an interpretant (the effect the sign produces, which is itself a sign). The interpretant is the decisive element. Unlike Saussure’s (1916) dyadic model, which treats meaning as a closed relation between signifier and signified within an abstract system (langue), Peirce’s inclusion of the interpretant makes signification an open, processual, and inherently social phenomenon: each interpretant can function as a new representamen, generating further interpretants in chains of “unlimited semiosis” (Peirce, CP 2.303) that never arrive at a final, unmediated meaning.

Peirce further distinguished the immediate object (the object as represented within the sign itself) from the dynamic object (the object as it actually is, independent of any particular representation). This distinction proves essential for understanding AI-mediated meaning: LLMs generate signs whose immediate objects are coherent (the text presents a consistent referential world) but whose dynamic objects may be nonexistent (hallucinated entities), misrepresented (biased framings), or systematically underdetermined (contested political referents). The gap between immediate and dynamic object is the site of semiotic treachery.

Peirce also developed an elaborate taxonomy of sign types based on the representamen-object relationship. Three are foundational:

  • Icons signify through resemblance: photographs, diagrams, onomatopoeia. Icons are vulnerable to selective resemblance – they highlight certain features while omitting others, enabling divergent interpretations of what resemblance entails. A photograph of a protest iconically represents the event but selects framing, angle, and moment in ways that can activate opposed interpretants.

  • Indices signify through existential or causal connection: smoke indicating fire, a fever indicating infection, a speaker’s accent indicating regional origin. Indices are vulnerable to contested causation – disagreement about what connections obtain and what they imply. The same correlation between a policy and an outcome may index effectiveness to one community and corruption to another.

  • Symbols signify through convention: most words, mathematical notation, flags. Symbols are vulnerable to conventional drift – the associations constituting them can diverge across communities and shift over time, precisely because nothing natural tethers them to their objects.

Political communication characteristically combines all three types simultaneously. The phrase “defund the police” is symbolic (its meaning is conventionally established), indexical (its use indexes political alignment and community membership), and iconic (its syntactic structure – imperative verb + object – iconically suggests a concrete action that may not match the policy referent). Each dimension presents distinct opportunities for interpretive divergence, and each requires distinct representational machinery in a model that aspires to navigate them.

For computational modeling, the triadic structure has a crucial implication: meaning is not a property of tokens but a relation among tokens, referents, and interpretive effects that unfolds across chains of processing. A standard LLM embedding – a vector optimized for next-token prediction – conflates all three Peircean dimensions into a single point in distributional space. It captures the representamen (what tokens tend to co-occur), encodes aspects of the immediate object (topical associations), and implicitly reflects aggregate interpretant patterns (the statistical shadow of how communities have responded to the sign). But it cannot disentangle these dimensions, contrast the interpretants produced in different communities, or represent the gap between immediate and dynamic object. The Semiotic Embedding Layer (Section 4.2) is designed to perform precisely this disentanglement.
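
A minimal structural sketch of that disentanglement (the four component names come from the paper; the subspace widths and the flat-slice layout are my assumptions, and a real implementation would learn projections rather than slice a vector):

```python
# Illustrative subspace widths for a 256-dimensional token embedding.
SUBSPACES = {"representamen": 64, "object": 64, "interpretant": 96, "attractor": 32}

def decompose(embedding):
    """Split a flat token embedding into named Peircean components."""
    total = sum(SUBSPACES.values())
    assert len(embedding) == total, "embedding width must match the layout"
    parts, start = {}, 0
    for name, width in SUBSPACES.items():
        parts[name] = embedding[start:start + width]
        start += width
    return parts

parts = decompose([0.0] * 256)
```

The point is architectural, not numerical: once the interpretant component has its own address, attention can be computed over it separately, and community-contrastive losses can target it without disturbing the representamen.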

Kockelman (2025) formalizes interpretant chains as dynamical trajectories through a state space defined by the sign-object-interpretant triad. The key insight, building on Kockelman’s earlier work in Last Words (2024), is that each link in an interpretant chain involves an act of sieving: from the space of possible interpretants a sign could produce, only some are actualized, depending on the interpreter’s prior sign exposure, community membership, semiotic ideologies, and the mediation architecture that delivered the sign. The actualized interpretant becomes the next representamen, entering the sieve of the next interpreter. When the same representamen enters different interpretive communities – communities whose sieving mechanisms have been calibrated by exposure to different algorithmically curated sign environments – it generates different initial interpretants. These interpretants function as new representamina generating further divergent interpretants.

The result is exponential compounding of interpretive distance through successive chain links. Consider a concrete example. The representamen “critical race theory” enters Community A, where it generates an interpretant linking to academic legal scholarship (Bell, 1980; Crenshaw, 1989). That interpretant becomes a representamen activating further associations: structural analysis, institutional reform, historical reckoning. It enters Community B, where the initial interpretant links to K-12 curriculum, indoctrination, anti-white racism. That interpretant generates further associations: parental rights, culture war, censorship of patriotic narrative. By the third link, the two chains inhabit different conceptual universes, and the original representamen – three shared words – indexes not merely different opinions but different ontologies. The sign has become what Lancaster (2025) calls a “bifurcation site”: a point where the semiotic landscape has split into antagonistic basins.

This compounding is not random. It is structured by three mechanisms identified in linguistic anthropology:

  • Indexicality (Silverstein, 2003): Signs point to and create social contexts through structured “indexical orders.” First-order indexicals directly indicate contexts (“here,” “now”); second-order indexicals presuppose and construct social relationships, identities, and ideologies. Political discourse operates predominantly through higher-order indexicality, where word choices index social types that carry ideological freight. The phrase “undocumented immigrant” indexes a progressive characterological stance; “illegal alien” indexes a conservative one. The indexical meaning – the social-identity signal – may dominate the referential meaning for recipients attuned to it. When these indexical associations diverge across communities, the same utterance indexes opposed social identities, triggering evolved mechanisms of in-group solidarity and out-group suspicion (Lancaster, 2025, Section 1.3.5). Crucially, this indexical divergence is invisible to standard embedding spaces: both phrases co-occur with immigration-related tokens, and a distributional model will represent them as near-neighbors in semantic space, obscuring the indexical chasm between them.

  • Enregisterment (Agha, 2003): The social processes through which linguistic forms become associated with characterological figures – typified personas with associated values, stances, and social positions. Enregisterment is diachronic: it accumulates through repeated usage in community-specific contexts, reinforced by metadiscursive commentary (“when they say X, what they really mean is Y”). Political keywords undergo divergent enregisterment at accelerating rates in digital environments (Lancaster, 2025, Section 1.3.2): a term can acquire thick, community-specific characterological associations within weeks rather than the decades or centuries documented by historical linguistics. The term “woke” illustrates the phenomenon: originally enregistered within Black American discourse to index critical racial awareness, it underwent rapid re-enregisterment within conservative discourse to index performative social policing. By 2025, the same three-letter word activates fundamentally incompatible characterological figures depending on the interpreter’s community membership. For an LLM trained on corpora containing both enregisterments, the token “woke” sits in an embedding space that averages over both – a location that represents neither community’s actual usage.

  • Semiotic ideology (Irvine & Gal, 2000; Keane, 2003): Culturally specific assumptions about how signs relate to reality – what signs can represent, what interpretive processes are legitimate, what counts as evidence. Irvine and Gal identify three processes through which these ideologies operate: iconization (a linguistic feature associated with a group comes to be seen as depicting the group’s inherent nature), fractal recursivity (an opposition salient at one level is projected onto other levels, so that urban/rural maps onto educated/uneducated, elite/authentic), and erasure (facts inconsistent with the ideological representation are rendered invisible). When communities operate with incompatible semiotic ideologies, even shared commitment to “following the evidence” or “listening to experts” produces divergent conclusions, because the processes of evidential reasoning – what counts as a valid source, what constitutes a legitimate inference, what methodological commitments are transparent versus suspect – are themselves semiotically mediated. This is not epistemic failure; it is the normal operation of semiotic systems under conditions of ideological divergence.

Silverstein’s (1993) work on metapragmatic function identifies a capacity that is absent from current language models and central to this paper’s proposal: metapragmatic awareness – reflexive consciousness about how signs function, that one is interpreting, that interpretations are constructed, and that alternative interpretations exist.

Silverstein distinguishes three orders of indexical awareness, each building on the one below:

  1. First-order (Perception): The basic presupposing link between a sign and its context. A speaker uses “here” and the listener identifies the location. A reader encounters “freedom” and retrieves a referential meaning. This is the level at which standard LLMs operate: they predict the next token based on contextual co-occurrence, which is functionally equivalent to first-order indexical processing.

  2. Second-order (Cognition): Ideological construals of first-order links, regimented by social values. The interpreter recognizes not just what a sign refers to but what social type, stance, or identity it indexes. Using “freedom” in a particular syntactic frame with particular collocates signals membership in a political community. Second-order awareness involves recognizing that signs carry social valence – that language does social work beyond reference. Current LLMs capture second-order patterns implicitly (they can generate text that sounds conservative or progressive) but cannot represent the social-indexical structure that makes it so. The distinction between conservative and progressive registers exists in the model only as distributional clusters, not as identified social-indexical configurations.

  3. Third-order (Reflection): Metapragmatic awareness of how discourse shapes interpretation itself. This is the capacity to observe one’s own interpretive frameworks as frameworks rather than as transparent access to reality – to notice that the word “radical” triggers alarm in oneself and to recognize that alarm as an effect of enregisterment within one’s community rather than an objective property of the sign. Third-order awareness does not dissolve interpretive commitments; it places them in a wider field of awareness. It is the capacity to inhabit an attractor basin while recognizing it as an attractor basin – to see the landscape from above rather than only from within.

This third-order capacity is what enables navigation of semiotic divergence without collapse into either relativism (“all interpretations are equally valid”) or dogmatism (“my interpretation is the only valid one”). Empirical evidence supports its efficacy: Pennycook et al. (2021) found that accuracy prompts – which activate a form of metapragmatic attention – reduced sharing of misinformation across partisan lines. Voelkel et al. (2022) documented that perspective-taking interventions produce modest but significant reductions in affective polarization. These findings suggest that activating reflexive awareness of one’s own interpretive processes changes the character of disagreement from identity threat to navigable difference.

Current LLMs possess no architectural analog of third-order metapragmatic capacity. They generate text from within the statistical landscape of their training data without any mechanism for observing that landscape’s structure, measuring its divergence across contexts, or adjusting their generative strategy based on the dynamical state of the semiotic environment they are participating in. The Reflexive Recurrent Module (Section 4.4) is designed to instantiate precisely this meta-observational capability as a differentiable computation: a GRU that processes divergence signals from the Metapragmatic Attention Heads and maintains a running state representing the model’s observation of its own interpretive dynamics.
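A minimal numerical sketch of such a module is given below, assuming a standard GRU cell that consumes a divergence-signal vector per step and carries a running self-observation state. The dimensions, wiring, and initialization are my own illustrative choices, not the paper's specification.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ReflexiveGRUSketch:
    """Minimal GRU cell over per-step divergence signals.

    The hidden state h is a running summary of the model's observation
    of its own interpretive dynamics; gates decide how much each new
    divergence measurement revises that summary.
    """

    def __init__(self, in_dim, hid_dim):
        s = 1.0 / np.sqrt(hid_dim)
        self.Wz = rng.uniform(-s, s, (hid_dim, in_dim + hid_dim))
        self.Wr = rng.uniform(-s, s, (hid_dim, in_dim + hid_dim))
        self.Wh = rng.uniform(-s, s, (hid_dim, in_dim + hid_dim))
        self.h = np.zeros(hid_dim)

    def step(self, divergence_signal):
        x = np.concatenate([divergence_signal, self.h])
        z = sigmoid(self.Wz @ x)                  # update gate
        r = sigmoid(self.Wr @ x)                  # reset gate
        x_r = np.concatenate([divergence_signal, r * self.h])
        h_tilde = np.tanh(self.Wh @ x_r)          # candidate state
        self.h = (1 - z) * self.h + z * h_tilde   # convex blend: bounded state
        return self.h

gru = ReflexiveGRUSketch(in_dim=4, hid_dim=8)
for _ in range(5):
    state = gru.step(rng.normal(size=4))
```

The convex blend in the final line keeps the observation state bounded, which matters here: a meta-observer that could itself diverge would defeat the purpose.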

Lancaster (2025) demonstrates, through an integration of Peircean semiotics with dynamical systems theory, that political polarization in algorithmically curated societies exhibits the structure of a supercritical pitchfork bifurcation. The argument proceeds in two stages: first, that interpretant chain divergence constitutes a symmetry-breaking process amenable to formal modeling; and second, that the pitchfork bifurcation’s normal form captures the specific qualitative features observed in empirical polarization data.

The normal form is:

\dot{x} = rx - x^3

where x is the order parameter representing deviation from interpretive consensus – concretely, the degree to which the aggregate interpretant distribution for a contested sign (e.g., “freedom,” “justice,” “vaccine”) has diverged from a population-level mean – and r is the control parameter representing the strength of algorithmic amplification: the degree to which mediation environments surface and reinforce content aligned with emerging interpretive paths.

The system’s behavior depends critically on r:

  • For r < 0: The origin x = 0 is the unique stable equilibrium. The linearization \dot{x} \approx rx yields eigenvalue r < 0, so perturbations decay exponentially at rate |r|. Shared interpretive frameworks absorb divergence. A community encountering a contested sign may produce varied interpretants, but the variation regresses toward consensus.

  • At r = 0: The critical point. The linearization has eigenvalue zero; the system undergoes a qualitative change in its equilibrium structure. Critical slowing down occurs: perturbations decay algebraically (\dot{x} \approx -x^3, yielding x(t) \sim t^{-1/2}) rather than exponentially, meaning the system becomes increasingly sensitive to small pushes – a formal analog of the observation that societies near the bifurcation threshold can be tipped by relatively minor events (a viral tweet, a contested video, a shock news cycle).

  • For r > 0: The origin becomes unstable (linearization eigenvalue r > 0). Two new stable equilibria emerge at x = \pm\sqrt{r}, each with basin of attraction covering half the axis. The system splits into opposed interpretive communities that are individually stable but collectively divided. The cubic term -x^3 provides saturation: the separation is finite (bounded by \sqrt{r}) but self-reinforcing within each well.
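These regimes are easy to verify numerically. The sketch below is my own Euler integration of the normal form, not code from the paper; it recovers the consensus equilibrium for negative r and the two symmetric tines for positive r.

```python
import numpy as np

def integrate(r, x0, dt=0.01, steps=5000):
    """Euler-integrate the pitchfork normal form  dx/dt = r*x - x**3."""
    x = x0
    for _ in range(steps):
        x += dt * (r * x - x**3)
    return x

# r < 0: a perturbed community relaxes back to consensus (x = 0)
consensus = integrate(r=-1.0, x0=0.5)

# r > 0: the same perturbation grows until the cubic term saturates it
# at +sqrt(r); the mirror perturbation lands at -sqrt(r) – the two tines
tine_up = integrate(r=0.64, x0=0.1)
tine_down = integrate(r=0.64, x0=-0.1)
```

With r = 0.64 the tines sit at ±0.8, and which tine is reached depends only on the sign of the initial perturbation: the symmetry-breaking character of the transition in miniature.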

This mathematical structure captures several empirically observed features of polarization:

  • Threshold character: Polarization is not a steady creep but a phase transition. Below critical amplification, shared meaning holds; above it, meaning fractures suddenly. This is consistent with the observation (documented by Bail et al., 2018; Boxell et al., 2017) that attitudinal divergence accelerated nonlinearly with the rise of personalized feeds rather than increasing at a constant rate proportional to internet adoption.

  • Symmetry breaking: A single interpretive community becomes two opposed communities. The pitchfork’s two tines are symmetric in the normal form, but Lancaster (2025, Appendix B.3) introduces an asymmetric extension \dot{x} = h + rx - x^3, where the imperfection parameter h captures pre-existing structural asymmetries (e.g., differential institutional trust, historical grievance). When h \neq 0, one tine is reached preferentially, producing the asymmetric polarization observed empirically (greater movement in one direction than the other).

  • Hysteresis: Once bifurcated, the system does not return to consensus merely by reducing r below the critical value. The subcritical pitchfork extension \dot{x} = rx + x^3 - x^5 exhibits explicit hysteresis: the bifurcated state persists even as the control parameter retreats below the forward transition threshold. This explains why “turning down the algorithm” does not immediately restore shared meaning; once interpretant chains have entrenched in antagonistic basins, the basins themselves resist dissolution.

  • Self-reinforcement: Each attractor basin generates interpretants that reinforce the basin, deepening the well and increasing the perturbation required to escape. This is the semiotic analog of positive feedback: within-basin interpretants are rewarded by social validation (likes, shares, group solidarity), which produces further interpretants aligned with the basin, which attracts further algorithmic amplification. The effective r experienced within a basin is higher than the platform-level r, because the community’s own semiotic activity contributes to the amplification.
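The hysteresis claim can also be checked numerically. The sketch below (my own integration of the subcritical extension, with illustrative parameters) sweeps the control parameter up past the threshold and then back down, and finds that the two sweeps disagree below the forward threshold: the system remembers its history.

```python
import numpy as np

def relax(r, x, dt=0.005, steps=4000):
    """Relax the subcritical pitchfork  dx/dt = r*x + x**3 - x**5  to equilibrium."""
    for _ in range(steps):
        x += dt * (r * x + x**3 - x**5)
    return x

rs_up = np.linspace(-0.2, 0.2, 21)     # control parameter sweep, step 0.02
forward, x = [], 0.0
for r in rs_up:
    x = max(x, 1e-3)                   # tiny seed perturbation: ambient noise
    x = relax(r, x)
    forward.append(x)

backward = []
for r in rs_up[::-1]:                  # sweep the control parameter back down
    x = relax(r, x)
    backward.append(x)

# Index 5 (forward) and index 15 (backward) both correspond to r = -0.1,
# below the forward threshold r = 0: the forward branch sits near consensus
# while the backward branch remains polarized on the outer equilibrium.
```

In this toy version the polarized branch persists all the way down to the saddle-node at r = -1/4: reducing amplification below the original tipping point is not enough to dissolve the basins.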

Real semiotic systems are noisy. Lancaster (2025, Appendix B.4) extends the deterministic model with additive noise:

\dot{x} = rx - x^3 + \sigma\xi(t)

where \xi(t) is white noise with intensity \sigma. The stochastic version yields two phenomena relevant to AI design:

  1. Critical slowing down as an early warning signal. As r approaches the critical value from below, the system’s recovery time from perturbation increases. The autocorrelation of x increases; the variance of fluctuations increases. These are detectable statistical signatures that the system is approaching bifurcation (Scheffer et al., 2009). A model trained to recognize these signatures could estimate proximity to the critical threshold from observed semiotic dynamics – precisely the function of the Bifurcation Estimation Network (Section 4.5).

  2. Noise-induced transitions. Below but near the bifurcation threshold, noise can cause transient excursions into what would be a stable attractor basin above threshold. These transient excursions correspond to sporadic radicalization events: individuals or subgroups temporarily adopting extreme interpretive frameworks before returning to consensus. A model aware of the stochastic landscape can distinguish noise-induced transients from genuine bifurcation.
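Both early-warning signatures can be reproduced with a short Euler-Maruyama simulation of the stochastic equation. Noise intensity, step sizes, and the two r values below are illustrative choices of mine.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(r, sigma=0.05, dt=0.01, steps=100_000):
    """Euler-Maruyama for  dx = (r*x - x**3) dt + sigma dW,  from consensus x=0."""
    x = 0.0
    xs = np.empty(steps)
    noise = rng.normal(scale=np.sqrt(dt), size=steps)
    for i in range(steps):
        x += dt * (r * x - x**3) + sigma * noise[i]
        xs[i] = x
    return xs[steps // 2:]              # discard the transient

def lag1_autocorr(xs):
    xs = xs - xs.mean()
    return (xs[:-1] @ xs[1:]) / (xs @ xs)

far = simulate(r=-0.5)     # well below the critical point: fast recovery
near = simulate(r=-0.05)   # approaching criticality: slow recovery

# Early-warning signatures (Scheffer et al., 2009): both the variance of
# fluctuations and their lag-1 autocorrelation rise as r approaches 0.
```

For a linearized system the stationary variance scales as sigma^2 / (2|r|), so shrinking |r| by a factor of ten inflates the variance roughly tenfold: the signature a Bifurcation Estimation Network would be trained to detect.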

Standard LLM training on web corpora amounts to training on the output of a system that has already bifurcated. The model learns the statistical regularities of text produced within antagonistic attractor basins without representing the dynamical process that generated the bifurcation. It can reproduce text characteristic of either basin – indeed, it can switch between basins with a prompt – but it cannot model the relationship between basins, the bifurcation threshold, the control parameter that governs the transition, or the early warning signatures that indicate approaching criticality.

This is the fundamental inadequacy that semiotic-reflexive training addresses. The target of learning is not the content of attractor basins (what text sounds like within each community) but the dynamics of the attractor landscape: its topology, its bifurcation structure, the parameters that govern transitions between configurations, and the stochastic signatures that signal proximity to critical transitions. A model that learns these dynamics can do what current models cannot: estimate whether its own generative activity is pushing the semiotic system toward or away from bifurcation, and adjust accordingly.

Mangalam (2025) argues that the dominant metaphor in computational cognitive science – the Bayesian brain, which frames cognition as prior-likelihood-posterior updating over probability distributions – is empirically inadequate. The argument proceeds on three fronts.

First, temporal structure: Bayesian updating assumes that evidence arrives in discrete packets to be integrated with a prior. But neural dynamics are continuous, multiscale, and characterized by long-range temporal correlations (1/f noise, power-law scaling) that are inconsistent with the memoryless updating of ideal Bayesian observers. Motor control, perception, and language processing exhibit variability patterns that reflect deterministic chaos rather than stochastic sampling from posterior distributions.

Second, representational format: The Bayesian framework requires that the brain maintain and compute over explicit probability distributions. While neural populations can be interpreted as coding distributions (Pouget et al., 2013), the claim that this is what they are doing – rather than an observer’s post-hoc statistical description of their activity – is underdetermined by the evidence. The same population activity is equally consistent with dynamical systems interpretations in which distributions are epiphenomenal rather than causal.

Third, generality: Bayesian models achieve empirical adequacy only with task-specific priors and likelihoods engineered by researchers. When applied to the kind of open-ended, context-sensitive, culturally mediated meaning-making that constitutes natural language use, the Bayesian framework either becomes vacuous (any outcome can be accommodated by suitable choice of prior) or empirically refuted (specific predictions fail). Meaningfulness is not a matter of statistical confidence but of resonance within a dynamical landscape shaped by embodied interaction with a physical and social world.

The critique has direct implications for language model design. Standard autoregressive training optimizes a model to approximate the conditional distribution p(t_{n+1} | t_1, \ldots, t_n) – operationally a Bayesian objective: the model learns to assign probability mass to tokens in proportion to their frequency in context. If cognition is not fundamentally Bayesian, then this objective captures the statistical shadow of meaning rather than its substance. Models optimized for distributional correctness produce outputs that are fluent and contextually appropriate at the token level but semantically ungrounded at the interpretive level: they inherit the co-occurrence structure of meaningful text without instantiating the dynamical processes that generate meaningfulness.

An alternative framework comes from dynamic field theory (DFT; Schöner & Spencer, 2015), which models cognitive processes as trajectories through continuous state spaces shaped by attractor landscapes. DFT has been applied to motor planning, spatial cognition, visual search, and word learning, achieving quantitative fits to developmental and behavioral data that rival or exceed Bayesian models while offering a fundamentally different account of what cognition is.

In the DFT framework, a cognitive state is represented as an activation field u(\mathbf{x}, t) over a feature space \mathbf{x}, governed by dynamics of the general form:

\tau \dot{u}(\mathbf{x}, t) = -u(\mathbf{x}, t) + h + S(\mathbf{x}, t) + \int w(\mathbf{x} - \mathbf{x}') f(u(\mathbf{x}', t)) d\mathbf{x}' + \text{noise}

where \tau is a time constant, h is a resting level, S is external input, w is an interaction kernel (typically Mexican-hat: local excitation, surround inhibition), and f is a sigmoidal output function. The interaction kernel is critical: it creates attractor landscapes through lateral dynamics. Peaks of activation self-stabilize through local excitation; competing peaks are suppressed through surround inhibition; the landscape configuration determines what interpretive states are accessible.
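A minimal one-dimensional simulation of this field equation shows the core behavior. Discretization, kernel parameters, resting level, and input are illustrative choices of mine, not values from Schöner and Spencer: a localized input carves out a self-stabilizing peak while surround inhibition pushes the rest of the field below resting level.

```python
import numpy as np

# 1-D Amari-style field:  tau * du/dt = -u + h + S + conv(w, f(u))
n = 101
xs = np.linspace(-10, 10, n)
dx = xs[1] - xs[0]
tau, h = 1.0, -2.0                         # resting level below threshold

def kernel(d):
    """Mexican hat: narrow local excitation minus broad surround inhibition."""
    return 2.0 * np.exp(-d**2 / 2.0) - 1.0 * np.exp(-d**2 / 32.0)

W = kernel(xs[:, None] - xs[None, :])      # pairwise interaction matrix
f = lambda u: 1.0 / (1.0 + np.exp(-4.0 * u))   # sigmoidal output

S = 3.0 * np.exp(-xs**2 / 2.0)             # localized input: a sign arrives
u = np.full(n, h)                          # field starts at resting level
dt = 0.05
for _ in range(2000):
    u += (dt / tau) * (-u + h + S + (W @ f(u)) * dx)

# A peak stabilizes at the input site (an interpretant settling into an
# attractor) while inhibition suppresses the surrounding field below
# resting level, shutting out competing interpretations.
```

Whether the peak would also survive removal of the input depends on the excitation/inhibition balance; tuning that balance is exactly the kind of landscape reshaping the text identifies with learning.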

Mapping this to the semiotic domain:

  • Meaning is an emergent property of a trajectory’s convergence into an attractor basin – not a stored representation retrieved by key-matching but a dynamical process of settling into a self-sustaining activation pattern. The same sign can produce different meanings (different stable peaks) depending on the prior state of the field, the configuration of the landscape, and the noise present at the moment of encounter.

  • Understanding corresponds to deep attractor basins with strong convergence properties: basins that absorb perturbation and resist drift. Deep understanding corresponds to a landscape in which the meaning-basin is wide, steep-walled, and connected to neighboring basins through identifiable transition paths.

  • Confusion corresponds to shallow basins, chaotic regions, or landscapes with multiple competing attractors of similar depth, where trajectories fail to settle or oscillate between alternatives.

  • Learning corresponds to the reshaping of the attractor landscape through experience. Repeated encounters with a sign in a consistent context deepen its associated basin. Encounters across divergent contexts can widen a basin (developing flexible understanding), create a new basin (learning a new sense), or – critically – split a basin into two (the semiotic analog of bifurcation).

For LLM training, this framework suggests that the target representation is not a probability distribution over tokens but an attractor landscape over interpretive trajectories. The model should learn basins (what stable interpretive states exist), their depths (how robust they are to perturbation), their boundaries (where one interpretive community’s basin gives way to another’s), and the bifurcation parameters that govern transitions between landscape configurations. This is what the attractor embedding component \mathbf{e}_i^A of the Semiotic Embedding Layer is designed to represent: each token’s position in a learned attractor landscape, not merely its position in distributional space.

The DFT framework connects naturally to the pitchfork bifurcation model of Section 2.2. The Mexican-hat interaction kernel in the field equation can be parameterized so that increasing the strength of long-range inhibition (analogous to increasing cross-community antagonism) drives the field through a symmetry-breaking transition: a single peak (consensus) becomes two peaks (polarized interpretation). This is exactly the pitchfork bifurcation, now grounded in a field-theoretic cognitive model rather than an abstract dynamical equation.

The connection provides theoretical depth: the pitchfork is not merely a convenient analogy borrowed from physics but emerges from a cognitive-dynamical model of how interpretive landscapes form and fracture. It also provides architectural guidance: the interaction kernel’s parameters (strength, width, balance between excitation and inhibition) map onto learnable parameters in the model’s attractor embedding space, giving the Bifurcation Estimation Network (Section 4.5) a concrete cognitive-dynamical interpretation.

Saussure (1916) established the principle of arbitrariness: no natural connection links signifier to signified. The sound-form /tri:/ (“tree”) bears no resemblance to the woody plant it designates; different languages use entirely different sound-forms for the same referent. This arbitrariness is the condition of possibility for linguistic diversity and creativity – but it is also the condition of possibility for semiotic drift. The same sign can mean different things in different communities precisely because nothing anchors it to a fixed referent. Arbitrariness is what makes the pitchfork possible: if signs were naturally tethered to their objects, interpretant chains could not diverge.

If all sign-object relations were arbitrary, semiotic drift would be unconstrained, and any attempt to ground model representations would face an infinite regress of conventions interpreting conventions. The semiotic landscape would be a flat plane with no privileged points – all basins equally shallow, all equilibria equally fragile. But Saussure’s own principle, taken without qualification, overstates the case. Not all sign-object relations are arbitrary, and the exceptions are not marginal – they are structurally significant.

The bouba/kiki effect, first observed by Köhler (1929) and named by Ramachandran and Hubbard (2001), demonstrates systematic cross-modal correspondences between auditory and visual features. When presented with a rounded shape and an angular shape and asked which is “bouba” and which is “kiki,” approximately 90-95% of respondents across studied populations assign “bouba” to the rounded shape and “kiki” to the angular shape.

The effect is not a single curiosity but a structured family of cross-modal correspondences operating along identifiable phonological dimensions:

  • Vowel roundedness: Back, rounded vowels (/u/, /o/) map to rounded shapes; front, unrounded vowels (/i/, /e/) map to angular shapes. The mapping follows the articulatory gesture: lip rounding during /u/ production mirrors visual roundedness.

  • Consonant manner: Sonorants (/m/, /n/, /l/) and approximants (/w/) map to rounded shapes; plosives (/k/, /t/, /p/) and fricatives (/s/, /z/) map to angular shapes. The mapping follows acoustic envelope: sonorants have smooth, continuous spectral envelopes; plosives have abrupt spectral transients that mirror visual angularity.

  • Voicing: Voiced consonants tend toward roundedness associations; voiceless consonants toward angularity, though this dimension is weaker and interacts with manner.

  • Fundamental frequency: Lower pitch maps to larger and rounder shapes; higher pitch to smaller and more angular shapes – a cross-modal correspondence between auditory frequency and visual spatial extent.

These correspondences are not learned conventions. Six lines of evidence establish their non-arbitrary character:

  1. Cross-cultural robustness: The effect has been documented in English, Tamil, Korean, and Swahili speakers (Ramachandran & Hubbard, 2001; Imai et al., 2008), and in the Himba of northern Namibia – a remote population with minimal exposure to Western media or Latin script (Bremner et al., 2013). The Himba showed the shape-sound correspondence while differing from Westerners on shape-taste correspondences, indicating that the sound-shape mapping reflects a biological constant rather than cultural transmission.

  2. Prelinguistic infants: Ozturk, Krehm, and Vouloumanos (2013) demonstrated sound-shape correspondences in four-month-old infants – before any language has been acquired, ruling out lexical learning as the mechanism. At four months, infants have not begun babbling, have produced no speech sounds, and have no productive or receptive vocabulary. The articulatory account cannot explain this performance because four-month-olds are not producing vowels.

  3. Cross-species replication: Versace et al. (2023), published in Philosophical Transactions of the Royal Society B, demonstrated bouba/kiki-like cross-modal correspondences in domestic chicks (Gallus gallus domesticus) within one to three days of hatching. Chicks hearing rounded sounds preferentially approached rounded visual panels; chicks hearing sharp sounds preferentially approached angular panels. This is the single most important piece of evidence for the non-arbitrary character of the effect, because it rules out not only linguistic convention but postnatal learning, articulatory kinematics, and mammalian-specific mechanisms as sufficient explanations (see Section 2.4.3).

  4. Writing system independence: The effect obtains in speakers of languages with non-Latin scripts (Tamil, Korean, Arabic), confirming that it is not an artifact of letter-shape associations.

  5. Neural substrate: Neuroimaging studies (Peiffer-Smadja & Cohen, 2019) identify activation in the angular gyrus and superior temporal sulcus during cross-modal matching – regions associated with multimodal integration, not arbitrary association. The angular gyrus sits at the junction of auditory, visual, and somatosensory cortex, providing a neuroanatomical basis for cross-modal correspondence.

  6. Gradient structure: The effect is not binary (bouba vs. kiki) but graded: nonsense words with varying proportions of rounded vs. angular phonological features produce corresponding gradations in shape-matching behavior (Nielsen & Rendall, 2011). This gradient structure suggests a continuous iconic mapping rather than a categorical convention.
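The gradient structure lends itself to a toy illustration. The scorer below uses a hypothetical phoneme weighting loosely inspired by the dimensions above; the inventory and the equal weights are my own simplification, not a coding scheme from Nielsen and Rendall.

```python
# Hypothetical feature inventory (illustrative only): sonorants,
# approximants, and rounded back vowels lean 'round'; plosives,
# fricatives, and front unrounded vowels lean 'angular'.
ROUND = set("mnlwbuo")      # rounded-leaning phonemes (toy inventory)
ANGULAR = set("ktpszie")    # angular-leaning phonemes (toy inventory)

def roundedness(word):
    """Fraction of scored phonemes that lean 'round': 1.0 = maximally
    round, 0.0 = maximally angular, with graded values in between."""
    scored = [c for c in word.lower() if c in ROUND | ANGULAR]
    if not scored:
        return 0.5
    return sum(c in ROUND for c in scored) / len(scored)

# Graded, not binary: a mixed form like "bouki" lands between the poles.
words = ["bouba", "maluma", "bouki", "takete", "kiki"]
scores = [roundedness(w) for w in words]
```

The point of the toy is the continuity: because the score is a proportion over phonological features rather than a binary label, intermediate nonsense words receive intermediate values, mirroring the graded matching behavior the studies report.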

The chick data from Versace et al. (2023) warrant detailed treatment because their primary value lies in what they eliminate.

Linguistic convention. Chicks have no language. They do not learn words, acquire phonological categories, or participate in communities of speakers who negotiate conventional meanings. The bouba/kiki effect in chicks cannot be a linguistic artifact.

Articulatory kinematics. Ramachandran and Hubbard (2001) proposed that the effect derives from correspondences between mouth shape during speech production and visual form: the lips round for “bouba,” the tongue makes sharp contact with the palate for “kiki.” This articulatory hypothesis is inapplicable to chicks. Chicks do not have lips. They have beaks. They do not round their mouths or tense their tongues to produce speech-like sounds. Yet the mapping appears. The articulatory account may capture a real mechanism contributing to the adult human effect, but it cannot be the foundational explanation. The grounding runs deeper than articulation: it is prenatal, pre-motor, and pre-linguistic.

Postnatal learning. The effect appears in day-old chicks – organisms with at most twenty-four hours of postnatal auditory experience. The mapping is present at the moment the chick encounters the experimental stimuli for the first time. Whatever produced it did so before the chick left the egg, or within the first hours of life, before any plausible learning mechanism could operate at this level of complexity.

Mammalian-specific mechanisms. The last common ancestor of birds and mammals lived approximately 310 million years ago, during the Carboniferous period. Any mechanism shared between chicks and humans is a deeply conserved feature of vertebrate neurobiology, not a recent mammalian innovation. The mapping is either homologous (retained from the common ancestor through 310 million years of independent evolution, perhaps tied to the basic architecture of multisensory integration in the midbrain superior colliculus/optic tectum, which is homologous across all vertebrates) or convergent (independently evolved in response to a shared developmental constraint: enclosed embryonic development in environments that function as low-pass acoustic filters). Either way, it is not a product of human linguistic evolution. Either way, it predates language by hundreds of millions of years.

Lancaster (2026b) develops the mechanism underlying these cross-species correspondences. The argument rests on established facts of developmental biology.

Prenatal auditory sensitivity. Chick embryos develop functional auditory sensitivity by incubation day 12 of a 21-day incubation period (Gottlieb, 1971; Rogers, 1995). Human fetuses develop cochlear function by approximately gestational week 18; by week 25, the auditory system supports learning, as demonstrated by newborn preferences for the mother’s voice (DeCasper & Fifer, 1980) and for stories read aloud during pregnancy (DeCasper & Spence, 1986).

The low-pass filter. Both the eggshell and the uterus function as low-pass acoustic filters. They transmit low-frequency sounds with relatively little attenuation and progressively attenuate higher frequencies. For the human uterus, Abrams, Gerhardt, and Peters (1998) measured attenuation of approximately 20-30 dB at frequencies above 500 Hz, with increasing attenuation at higher frequencies. Low-frequency sounds below 250 Hz pass through with relatively little loss. The eggshell follows the same physics. The consequence is that embryonic auditory environments are dominated by smooth, periodic, low-frequency waveforms – heartbeat, blood flow, respiration – while sharp, high-frequency transients are anomalous departures from this acoustic baseline.
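As a rough illustration of this physics, a generic low-pass magnitude response reproduces the qualitative asymmetry. The cutoff frequency and filter order below are illustrative stand-ins, not measured transfer functions for the uterus or eggshell (which are more complex than any single-pole model).

```python
import numpy as np

def attenuation_db(f_hz, f_cutoff=250.0, order=2):
    """Attenuation (dB) of a Butterworth-style low-pass magnitude response:
    a crude stand-in for womb/eggshell acoustics. Cutoff and order are
    illustrative choices, not measured biological values."""
    mag = 1.0 / np.sqrt(1.0 + (f_hz / f_cutoff) ** (2 * order))
    return -20.0 * np.log10(mag)

# Heartbeat energy (fundamental ~1-2 Hz, harmonics well under 100 Hz)
# passes nearly unchanged; the high-frequency content of a sharp
# transient is heavily attenuated before it reaches the embryo.
low = attenuation_db(60.0)      # smooth biological rhythm
high = attenuation_db(2000.0)   # click-like / plosive-like transient
```

Whatever the exact transfer function, the monotone shape is what matters for the argument: the embryo's acoustic world systematically over-represents smooth, low-frequency waveforms and under-represents sharp transients.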

From acoustic asymmetry to cross-modal mapping. The hypothesis is that the embryo’s auditory system is calibrated by this asymmetric environment to treat smooth, continuous, low-frequency waveforms as the default state (normal, safe, approach-worthy) and sharp, abrupt, high-frequency transients as deviations (anomalous, potentially threatening, avoidance-worthy). This calibration requires only that the developing auditory system be shaped by the statistics of its input – a principle operative across all sensory modalities in all species studied (Sanes & Bao, 2009). The cross-modal extension follows from the architecture of early sensory processing: the superior colliculus (mammals) or optic tectum (birds) integrates input from multiple sensory modalities, including audition and vision (Knudsen, 2002; Stein & Meredith, 1993). Multisensory integration is a basic feature of vertebrate brain organization present in amphibians, fish, and birds as well as mammals. The prenatal acoustic asymmetry becomes a cross-modal template: smooth auditory contours activate the same neural populations as smooth visual contours, and sharp auditory contours activate the same populations as angular visual contours. The bouba/kiki effect is the behavioral expression of this shared representation.

Peircean classification. In Peirce’s taxonomy, this mapping constitutes a hypoicon (CP 2.276): a sign that represents its object through shared quality rather than convention or causal connection. The smooth waveform does not arbitrarily stand for safety; it shares the qualitative character of the stable biological processes that produce it – continuity, periodicity, low variance. The resemblance is structural, not metaphorical. This is not yet full semiosis in the Peircean sense of unlimited interpretant chains. It is a pre-semiotic foundation: a layer of non-arbitrary cross-modal association grounded in shared biology that precedes and enables the construction of conventional sign systems. The symbolic capacity – the ability to establish and maintain arbitrary associations through community convention – is built on top of this iconic substrate, not instead of it. The bouba/kiki effect reveals the stratum underneath: the ground floor that was already in place when the building of language began.

The bouba/kiki effect is the most studied instance of a broader phenomenon: phonosemantics or sound symbolism – systematic correspondences between phonological form and meaning that exist alongside (and in tension with) the Saussurean arbitrary sign. Research has documented:

  • Size symbolism: Front high vowels (/i/) tend to be associated with smallness across languages; back low vowels (/a/) with largeness (Sapir, 1929; Ohala, 1994). This is reflected in cross-linguistic patterns: diminutive morphemes disproportionately use high front vowels.

  • Motion symbolism: Reduplication and vowel alternation patterns correlate with motion types across unrelated languages (Dingemanse, 2012).

  • Affective symbolism: Phonological features correlate with emotional valence: rounded, sonorant-heavy forms associate with positive affect; harsh, plosive-heavy forms with negative affect (Adelman et al., 2018).

  • Ideophones: A class of words found in the majority of the world’s languages that use sound-symbolic relationships iconically – onomatopoeia being the familiar subset, but the phenomenon extending to non-auditory sensory domains (Dingemanse, 2012).

These phenomena collectively define a phonosemantic feature space: a low-dimensional manifold in which phonological form and perceptual/affective meaning are non-arbitrarily linked. This space is not a replacement for the arbitrary sign but a substrate beneath it – a ground floor of iconic correspondence on which the upper stories of conventional meaning are built. Crucially, this substrate is shared across human populations because it is grounded in universal features of the sensorimotor system rather than in linguistically specific conventions.
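A crude sketch of a single dimension of such a feature space, with made-up letter sets standing in for real articulatory features (a real phonosemantic space would be multi-dimensional – size, motion, affect – and learned from data):

```python
def roundedness(word: str) -> int:
    """Toy phonosemantic score: positive = bouba-like (smooth),
    negative = kiki-like (sharp). Letter sets are illustrative stand-ins."""
    smooth = set("bmlnouw")   # voiced sonorants, rounded back vowels (assumed)
    sharp = set("ptkiez")     # voiceless plosives, high front vowels (assumed)
    return sum((ch in smooth) - (ch in sharp) for ch in word.lower())

for w in ("bouba", "kiki", "maluma", "takete"):
    print(w, roundedness(w))
```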

The critical question for computational modeling is whether these cross-modal correspondences are accessible to learning systems or are confined to biological neural architectures. Recent evidence answers decisively: multimodal AI architectures already recover bouba/kiki-like structure from their training distributions.

CLIP (Radford et al., 2021), trained on 400 million image-text pairs via contrastive learning, develops joint visual-linguistic representations. Thompson and Lupyan (2023) demonstrated that CLIP’s embedding space exhibits sound-symbolic structure: nonsense words with bouba-like phonology are embedded closer to rounded visual concepts, and nonsense words with kiki-like phonology closer to angular visual concepts, without any explicit training on sound symbolism. The correspondences emerge from the statistical regularities of naturally paired visual-linguistic data.

This finding has four implications for the SRT architecture:

  1. Learnability: Cross-modal iconic correspondences are not epiphenomenal or confined to online behavioral paradigms; they are structured regularities in multimodal data distributions that sufficiently expressive architectures recover automatically. This validates the proposal to include an iconic grounding subspace in the model’s embedding layer.

  2. Initialization: CLIP-derived embeddings can serve as initialization for the iconic grounding subspace, providing a pre-trained representation of cross-modal correspondence structure. This is more efficient than learning iconic structure from scratch and ensures that the model begins training with access to the ground floor of non-arbitrary semiotic mapping.

  3. Continuous structure: CLIP recovers graded sound-symbolic structure, not just a binary bouba/kiki distinction. This supports the representation of iconic grounding as a continuous feature space (the phonosemantic feature space of Section 2.4.5) rather than a categorical label.

  4. Convergence with biology: The convergence between CLIP’s learned representations and the prenatal biological mechanism (Section 2.4.4) is itself significant. It suggests that the bouba/kiki structure reflects genuine cross-modal regularities in the world that any system – biological or artificial – will discover given sufficient multimodal data. The prenatal environment provides one access route; web-scale image-text corpora provide another.

We now state the central claim of this subsection formally.

Claim: Cross-modal iconic correspondences function as low-dimensional attractors in the semiotic state space. They provide fixed points around which interpretive trajectories can stabilize – anchors resistant to the drift that afflicts purely conventional signs.

The argument proceeds through three steps:

Step 1: Characterize basin depth as a function of grounding type. In the DFT-inspired attractor landscape (Section 2.3.2), the depth of an attractor basin determines its stability – the magnitude of perturbation required to dislodge a trajectory from the basin. We propose that basin depth is a function of grounding type:

  • Arbitrary (symbolic) signs occupy basins whose depth depends entirely on frequency and consistency of usage within a community. Their stability derives from convention – the accumulated weight of habitual association. Under bifurcation, when the community fractures, convention fractures with it: the same sign occupies different basins in different communities, and neither basin is anchored to anything outside the community’s own interpretive history. Basin depth is community-relative and fragile under bifurcation.

  • Iconic signs (including cross-modal correspondences) occupy basins whose depth has a component independent of community convention: the depth contributed by sensorimotor invariants shared across vertebrate populations. The bouba/kiki mapping is not deep because a community has agreed it is so; it is deep because the auditory-visual correspondence is calibrated by prenatal acoustic experience (Section 2.4.4) and expressed through the conserved multisensory integration architecture of the vertebrate midbrain. The cross-species evidence (Section 2.4.3) confirms that this depth is not species-specific: chicks and humans share it despite 310 million years of independent evolution. Basin depth has a universal component that persists across communities, across species, and resists bifurcation.

Step 2: Formalize depth decomposition. Let $D(s)$ denote the effective basin depth for sign $s$. We decompose:

$$D(s) = D_{\text{conv}}(s, c) + D_{\text{iconic}}(s)$$

where $D_{\text{conv}}(s, c)$ is the community-dependent conventional depth (a function of usage frequency and consistency within community $c$), and $D_{\text{iconic}}(s)$ is the community-independent iconic depth (a function of the sign’s phonosemantic, visual, or cross-modal grounding). For a purely arbitrary sign (e.g., “freedom”), $D_{\text{iconic}}(s) \approx 0$ and all stability derives from convention. For a sign with strong iconic grounding (e.g., an ideophone like “splash” or a sign with strong bouba/kiki correspondence), $D_{\text{iconic}}(s) > 0$ provides a stability floor.

Step 3: Show that iconic depth resists bifurcation. Under bifurcation (increase in $r$), the conventional component $D_{\text{conv}}(s, c)$ fractures: it splits into $D_{\text{conv}}(s, c_A)$ and $D_{\text{conv}}(s, c_B)$ for the two community basins, with each potentially drifting away from the other. But the iconic component $D_{\text{iconic}}(s)$ is invariant under bifurcation because it does not depend on community membership. The iconic floor provides a lower bound on the gravity that holds interpreters near a shared meaning – a reference point that persists even as conventional interpretations diverge.
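The behavior of the decomposition under bifurcation (Steps 2-3) can be sketched numerically. The drift magnitudes and erosion rate below are arbitrary; the point is only that the community-invariant iconic term bounds the shared depth from below:

```python
import random

random.seed(0)

def shared_depth_over_time(d_iconic, steps=200, drift=0.05, erosion=0.004):
    """Track min over communities of D(s) = D_conv(s, c) + D_iconic(s)
    as each community's conventional depth drifts independently."""
    d_conv_a = d_conv_b = 1.0
    trace = []
    for _ in range(steps):
        d_conv_a = max(0.0, d_conv_a + random.gauss(0, drift) - erosion)
        d_conv_b = max(0.0, d_conv_b + random.gauss(0, drift) - erosion)
        trace.append(min(d_conv_a, d_conv_b) + d_iconic)  # iconic term never fractures
    return trace

arbitrary = shared_depth_over_time(d_iconic=0.0)   # e.g. "freedom"
iconic = shared_depth_over_time(d_iconic=0.5)      # e.g. an ideophone like "splash"
print(min(arbitrary), min(iconic))
```

Whatever the two conventional components do, the iconically grounded sign's shared depth never falls below its $D_{\text{iconic}}$ floor; the purely arbitrary sign has no such guarantee.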

In practice, most politically contested signs are purely or nearly purely arbitrary ($D_{\text{iconic}}(s) \approx 0$), which is why they are susceptible to bifurcation. The SRT architecture’s strategy is not to make all signs iconic (which would contradict basic linguistics) but to leverage the iconic grounding subspace as a regularization mechanism: interpretant embeddings are encouraged by the training loss to maintain proximity to the iconic subspace where possible, providing a stabilizing force that partially counteracts semiotic drift. The strength of this regularization is modulated by the hyperparameter $\beta$ in the training loss (Section 5.2.3).

The bouba/kiki effect is our primary case study, but the architectural role of iconic grounding extends beyond sound-shape correspondence. The generalized icon hypothesis holds that any non-arbitrary sign-object relation – any correspondence grounded in resemblance, embodiment, or shared perceptual structure rather than pure convention – can function as an attractor anchor in the semiotic landscape. Candidate iconic anchors include:

  • Spatial metaphors for abstract concepts: “Prices are up,” “morale is down,” “the argument fell apart.” These metaphors are grounded in embodied spatial experience (Lakoff & Johnson, 1980) and show cross-linguistic regularities suggesting non-arbitrary structure.

  • Affective prosody: The mapping between vocal pitch/tempo patterns and emotional states is cross-culturally robust and prelinguistically accessible (Fernald, 1989).

  • Gestural iconicity: Sign languages exhibit systematic iconicity in which the form of signs resembles their referents (Taub, 2001), providing a rich source of iconic grounding data for multimodal models.

  • Diagrammatic iconicity: The structural correspondence between relational form and relational meaning (Peirce’s “diagrams”) – e.g., the iconic relationship between word order and temporal order of events in many languages.

The SRT’s iconic grounding subspace is designed to be extensible: initialized with bouba/kiki-derived phonosemantic features, it can incorporate additional iconic dimensions as grounding data becomes available. Each additional dimension adds a low-dimensional attractor that further stabilizes the semiotic landscape against unconstrained drift.

This section positions the SRT framework relative to five bodies of existing work. For each, we identify the specific contribution it makes toward the problem of semiotic divergence in AI systems, specify the precise gap it leaves open, and state how the SRT addresses that gap. Table 1 provides a summary comparison; the subsections that follow develop the analysis.

Table 1: Positioning the SRT Relative to Existing Approaches


The alignment literature addresses the problem of making LLM outputs conform to human values and intentions. We examine the three dominant paradigms and identify a structural limitation shared by all.

RLHF. Reinforcement Learning from Human Feedback (Christiano et al., 2017; Ouyang et al., 2022) trains a reward model on pairwise human preference judgments over model outputs, then fine-tunes the LLM to maximize predicted reward via proximal policy optimization (PPO; Schulman et al., 2017). InstructGPT (Ouyang et al., 2022) demonstrated that RLHF could align a 1.3B-parameter model to outperform a 175B-parameter unaligned model on human evaluations. The approach is now standard practice at OpenAI, Anthropic, Google DeepMind, and Meta.

However, RLHF’s preference signal is aggregated and implicit. Human annotators express preferences between outputs without articulating the interpretive frameworks – the community-specific indexical associations, enregisterment patterns, and semiotic ideologies – that generate those preferences. The reward model learns a function from output features to scalar reward without representing the structure of preference disagreement. When annotators from different interpretive communities disagree (as they systematically do on politically charged content), the reward model averages over the disagreement or reflects the majority annotator demographic, producing what Casper et al. (2023) characterize as a “least objectionable” policy that satisfies no community’s actual interpretive norms.
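The averaging failure is easy to make concrete. Suppose 60% of annotators come from community A, which prefers output X on some contested prompt, and 40% from community B, which prefers Y; a scalar reward model fit to the pooled labels learns the mixture, an interpretive position held by no one (the numbers are illustrative):

```python
# Pooled preference labels for "X preferred over Y" on one contested prompt.
labels = [1.0] * 60 + [0.0] * 40   # community A says yes, community B says no

# A scalar reward fit by least squares to pooled labels is just their mean.
fitted_preference = sum(labels) / len(labels)
print(fitted_preference)  # 0.6: neither A's norm (1.0) nor B's (0.0)
```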

From the SRT perspective, RLHF performs trajectory adjustment within the learned attractor landscape without modifying the landscape itself. The model’s representational space – the embedding geometry that determines what distinctions the model can draw – remains unchanged. RLHF adjusts which regions of that space the model visits during generation, but it cannot create representational distinctions that the pre-trained landscape does not contain. If the pre-trained model conflates two interpretive communities’ uses of “justice” into a single embedding cluster, no amount of preference fine-tuning will enable the model to distinguish them; the representational substrate lacks the degrees of freedom.

Constitutional AI. Constitutional AI (Bai et al., 2022) addresses the scalability limitation of human annotation by having the model self-critique its outputs against a set of explicit principles (the “constitution”). The model generates, critiques, revises, and then trains on the revised outputs. This introduces a form of reflexivity – the model evaluates its own outputs – but the reflexivity operates at the content level (“does this output violate principle X?”) rather than at the semiotic level (“what interpretant chain does this output activate, and how does that chain differ across communities?”). The constitution provides fixed rules for evaluating surface properties of text; it does not provide machinery for modeling the dynamic, community-dependent interpretive processes that determine whether a text is beneficial or harmful.

DPO. Direct Preference Optimization (Rafailov et al., 2023) eliminates the separate reward model by deriving a closed-form loss that directly optimizes the policy from preference data. This is a methodological simplification, not a conceptual advance from the semiotic perspective: the underlying representation of preference remains a scalar comparison between output pairs, with no representation of the interpretive structure that generates preferences.
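For reference, a minimal sketch of the DPO objective for a single preference pair. The scalar margin is all the loss ever sees; nothing in it represents which community produced the judgment:

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """-log sigmoid(beta * [(policy - reference) log-ratio, chosen minus rejected])."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Loss falls as the policy favors the chosen response relative to the reference.
print(dpo_loss(-1.0, -2.0, -1.5, -1.5))  # chosen favored: lower loss
print(dpo_loss(-2.0, -1.0, -1.5, -1.5))  # rejected favored: higher loss
```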

The shared structural limitation. All three paradigms intervene after the model has been pre-trained on the bifurcated semiotic landscape of web corpora. The intervention is behavioral, not representational: it changes what the model does (which outputs it generates) without changing what the model represents (the semiotic structure of its embedding space). The SRT intervenes at the representational level, training the model to encode the triadic sign structure, interpretant chain dynamics, and bifurcation topology as first-class objects in its learned representation space. This makes the semiotic landscape inspectable and navigable rather than merely habitable.

Debiasing methods address systematic associations in model representations that disadvantage particular social groups. The literature has developed along three axes, each with a characteristic semiotic limitation.

Geometric debiasing. Bolukbasi et al. (2016) demonstrated that word embeddings encode gender stereotypes as linear directions and proposed removing the gender component from non-definitionally gendered words via projection onto the null space of the gender direction. Subsequent work extended this to racial (Manzini et al., 2019) and intersectional (Crenshaw, 1989; Guo & Caliskan, 2021) biases. The key assumption is that bias is a geometric property of the embedding space – a direction or subspace that can be identified and removed.
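The geometric operation itself is a one-line projection. A minimal sketch of the hard-debiasing step, with a made-up two-dimensional embedding and bias direction (Bolukbasi et al. estimate the direction from definitional pairs like he/she):

```python
import math

def project_out(v, direction):
    """Remove the component of v along the (normalized) bias direction."""
    norm = math.sqrt(sum(d * d for d in direction))
    u = [d / norm for d in direction]
    dot = sum(a * b for a, b in zip(v, u))
    return [a - dot * b for a, b in zip(v, u)]

nurse = [0.8, 0.6]          # toy embedding with a component along "gender"
gender_dir = [0.0, 1.0]     # toy gender direction
debiased = project_out(nurse, gender_dir)
print(debiased)  # [0.8, 0.0]
```

As Gonen and Goldberg's result implies, zeroing this one projection leaves every other coordinate – and hence the neighborhood structure that encodes the bias – untouched.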

From the semiotic perspective, this assumption misidentifies the phenomenon. Bias is not a direction in embedding space; it is the trace left by divergent interpretant chains that have sedimented into the training distribution. The statistical co-occurrence of “nurse” with female pronouns reflects not a geometric accident but the accumulated output of communities whose interpretive frameworks link nursing to femininity through chains of indexical association, enregisterment, and semiotic ideology. Removing the projection onto the gender direction eliminates the symptom without addressing the mechanism: the model loses the ability to exhibit the bias but retains the same representational structure that would regenerate it given additional exposure to the generating distribution. Gonen and Goldberg (2019) confirmed this empirically, showing that debiased embeddings retain cluster structure that allows recovery of the original bias.

Data-level debiasing. Counterfactual data augmentation (CDA; Zhao et al., 2018; Lu et al., 2020) modifies training data by swapping demographic markers (replacing “he” with “she” and vice versa) to create balanced distributions. This treats bias as a distributional property – a skew in token frequencies that can be corrected by rebalancing. The semiotic limitation is that CDA operates on the representamen dimension only: it modifies sign vehicles without attending to the interpretant chains and object relations that give those sign vehicles their social meaning. Swapping pronouns does not swap the social worlds – the indexical, enregistered, ideologically mediated interpretive contexts – in which those pronouns function. The swap creates distributionally balanced but semiotically incoherent training examples.
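The CDA operation is a token-level substitution, which is exactly the limitation: it touches only the representamen. A minimal sketch with an illustrative swap table; even this tiny example exposes the problem, since English "her" maps to both "his" and "him", so a purely token-level swap cannot be faithful:

```python
SWAPS = {"he": "she", "she": "he", "him": "her",
         "his": "her", "her": "his"}  # "her" is ambiguous: his or him?

def cda(tokens):
    """Counterfactual data augmentation: swap demographic marker tokens."""
    return [SWAPS.get(t, t) for t in tokens]

print(cda("he gave her his notes".split()))
# ['she', 'gave', 'his', 'her', 'notes'] - objective "her" wrongly becomes "his"
```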

Representation engineering. More recent work (Zou et al., 2023; Li et al., 2024) identifies “concept directions” in model activation space – linear directions corresponding to concepts like honesty, harmfulness, or political valence – and intervenes on activations during inference to steer model behavior. This is closer to the SRT’s approach in that it operates on internal representations rather than outputs. However, it identifies concepts as static directions rather than as dynamical configurations with community-dependent interpretant structures. The same concept direction (e.g., “harmfulness”) is assumed to have a single correct orientation, when in fact harmfulness is precisely the kind of contested sign whose interpretant diverges across communities.
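Representation engineering's intervention is likewise a single vector operation: shift the hidden state along a fixed concept direction at inference time. A minimal sketch with toy dimensions (real work estimates the direction from contrastive activation pairs), which makes the single-orientation assumption visible – one vector, one notion of "harmful":

```python
import math

def steer(hidden, concept_dir, alpha):
    """h' = h + alpha * unit(concept_dir): one fixed orientation for the concept."""
    norm = math.sqrt(sum(d * d for d in concept_dir))
    return [h + alpha * d / norm for h, d in zip(hidden, concept_dir)]

h = [0.2, -0.1, 0.4]
harm_dir = [1.0, 0.0, 0.0]  # toy "harmfulness" direction
print(steer(h, harm_dir, -0.5))  # push the activation away from "harmful"
```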

The SRT reframes bias not as a geometric or distributional property to be removed but as a fixed-point attractor in a dynamical landscape whose topology should be learned and made navigable. The model should know that it has a bias – should represent the attractor structure that constitutes the bias – rather than having the bias silently excised from its representational space.

Computational studies of polarization have documented the phenomena that the SRT framework aims to address, but they model these phenomena at the social-system level rather than embedding awareness of them into language model architectures.

Opinion dynamics models. The DeGroot model (DeGroot, 1974) and its extensions model opinion formation as iterative averaging over a social network, demonstrating convergence to consensus under connectivity conditions. The bounded confidence model (Hegselmann & Krause, 2002) restricts influence to agents within a confidence threshold of each other, producing fragmentation into opinion clusters when the threshold is sufficiently small – a form of bifurcation. Schweighofer et al. (2020) develop an agent-based model with reinforcement learning agents whose influence functions produce pitchfork-like transitions, demonstrating that the bifurcation structure Lancaster (2025) identifies in semiotic systems also emerges in abstract opinion dynamics. These models formalize the macro-level dynamics we address but treat agents as opinion vectors rather than semiotic interpreters; they model what happens to opinions but not to meanings.
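The bounded-confidence mechanism is compact enough to state in full. A minimal sketch of Hegselmann-Krause dynamics over 21 evenly spaced opinions: a large confidence threshold yields consensus, a small one yields fragmentation into separated clusters – the discrete analog of the bifurcation:

```python
def hk_step(opinions, eps):
    """Each agent moves to the mean of all opinions within its confidence bound."""
    new = []
    for x in opinions:
        peers = [y for y in opinions if abs(y - x) <= eps]
        new.append(sum(peers) / len(peers))
    return new

def n_clusters(opinions, gap=0.01):
    xs = sorted(opinions)
    return 1 + sum(1 for a, b in zip(xs, xs[1:]) if b - a > gap)

start = [i / 20 for i in range(21)]
for eps in (0.5, 0.05):
    ops = start
    for _ in range(100):
        ops = hk_step(ops, eps)
    print(f"eps={eps}: {n_clusters(ops)} cluster(s)")
```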

Network analysis. Conover et al. (2011) analyzed the retweet and mention networks of political communication on Twitter, finding extreme structural polarization in retweet networks (near-complete separation into partisan clusters) and less polarization in mention networks (cross-partisan engagement occurs, but predominantly through antagonistic interaction). Barberá et al. (2015) developed ideological scaling from Twitter network structure, enabling measurement of polarization dynamics over time. Garimella et al. (2018) proposed algorithms for detecting and quantifying controversy in social networks. This work provides empirical ground truth for the bifurcation dynamics the SRT is designed to model: the network structures document the consequences of interpretant chain divergence in the digital semiotic environment.

Filter bubbles and echo chambers. Pariser (2011) introduced the “filter bubble” concept; Sunstein (2001, 2017) analyzed “echo chambers” and the “law of group polarization” (the tendency of like-minded groups to move toward more extreme positions through deliberation). Bail et al. (2018) conducted a randomized controlled trial on Twitter, finding that exposure to opposing political views did not reduce polarization and in some cases increased it – a finding consistent with the attractor model, which predicts that perturbation insufficient to escape a basin will be absorbed and can trigger deepening of the basin through counter-reaction.

Algorithmic amplification studies. Huszár et al. (2022) analyzed Twitter’s algorithmic timeline and found that it amplified politically right-leaning content in most of the six countries studied, while amplifying left-leaning content in some. Guess et al. (2023) conducted randomized experiments on Facebook and Instagram during the 2020 U.S. election, finding that removing algorithmic curation reduced political news exposure and engagement but did not significantly change measured attitudes or polarization over the study period – a finding that the bifurcation model can accommodate through hysteresis: once bifurcated, the system retains its attractor structure even when the amplification parameter is temporarily reduced.

The SRT framework draws on this entire body of work but addresses a distinct problem: not modeling polarization in social systems but embedding awareness of polarization dynamics into a language model’s representational and generative architecture. Where existing work asks “how does polarization arise and propagate?” the SRT asks “how should a language model represent and navigate the semiotic landscape that polarization has produced?”

Vision-language models have demonstrated that cross-modal correspondences emerge from training on paired multimodal data, providing both inspiration and initialization for the SRT’s iconic grounding subspace.

Contrastive multimodal learning. CLIP (Radford et al., 2021) trains separate image and text encoders with a contrastive loss that aligns paired image-text representations in a shared embedding space. Trained on 400 million image-text pairs from the internet, CLIP develops rich cross-modal structure: it can match images to textual descriptions, classify images from text prompts, and – as Thompson and Lupyan (2023) demonstrated – recover bouba/kiki-like sound-symbolic correspondences without explicit training on sound symbolism. SigLIP (Zhai et al., 2023) and related variants improve the contrastive learning objective and scale to larger datasets, further enriching the recovered cross-modal structure.
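The contrastive objective at the core of CLIP is short enough to sketch. A minimal symmetric InfoNCE over a toy batch of pre-normalized embedding pairs (dimensions and temperature are illustrative):

```python
import math

def symmetric_infonce(img_embs, txt_embs, temperature=0.07):
    """Cross-entropy in both directions over the batch similarity matrix."""
    def xent(logits, target):
        m = max(logits)
        exps = [math.exp(v - m) for v in logits]
        return -math.log(exps[target] / sum(exps))

    n = len(img_embs)
    sim = [[sum(a * b for a, b in zip(u, v)) / temperature for v in txt_embs]
           for u in img_embs]
    img_to_txt = sum(xent(sim[i], i) for i in range(n)) / n
    txt_to_img = sum(xent([sim[j][i] for j in range(n)], i) for i in range(n)) / n
    return (img_to_txt + txt_to_img) / 2

imgs = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # correct image-text pairing
swapped = [[0.0, 1.0], [1.0, 0.0]]   # captions attached to the wrong images
print(symmetric_infonce(imgs, aligned), symmetric_infonce(imgs, swapped))
```

Minimizing this loss pulls true pairs together and pushes mismatched pairs apart; sound-symbolic structure is never supervised directly, which is why its emergence in CLIP is informative.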

Generative multimodal models. DALL-E (Ramesh et al., 2021) and its successors (Imagen, Stable Diffusion) generate images from text descriptions, demonstrating that the text-to-image mapping can be learned at high fidelity. These models implicitly encode iconic relationships: the generated image resembles the described content, creating an icon in the Peircean sense. However, the resemblance is mediated by the training distribution’s conventions about what visual features correspond to which textual descriptions – conventions that are community-specific and subject to the same semiotic dynamics the SRT addresses. A model trained predominantly on Western image-text pairs will encode Western visual conventions as if they were universal iconic correspondences.

Visually grounded language models. Flamingo (Alayrac et al., 2022), LLaVA (Liu et al., 2023), and GPT-4V (OpenAI, 2023) integrate visual processing into language model architectures, enabling multimodal dialogue and reasoning. These models demonstrate that visual grounding improves language understanding on certain tasks, consistent with the hypothesis that iconic (resemblance-based) information supports symbolic (convention-based) processing.

The SRT extends these findings in a specific direction: it positions cross-modal grounding not as a general representational enrichment but as a targeted stabilization mechanism within the semiotic embedding space. The iconic grounding subspace ($\mathbf{e}_i^{\text{icon}}$) is not a general-purpose multimodal representation; it is a low-dimensional manifold of non-arbitrary correspondences that functions as an attractor anchor, providing stability against the drift that afflicts purely conventional signs (Section 2.4.7). CLIP-derived embeddings provide initialization for this subspace, but the training objective – the iconic grounding regularization loss (Section 5.2.3) – shapes it specifically for semiotic stabilization rather than for image-text matching.

Computational semiotics and the language grounding debate provide the most direct intellectual antecedents for the SRT, though existing work in this area remains predominantly theoretical.

Formal computational semiotics. Tanaka-Ishii (2010) provides a comprehensive treatment of semiotics for information science, mapping Peircean and Saussurean frameworks onto computational concepts. Andersen and Hasle (2002) develop formal models of sign processes using process algebra, enabling simulation of semiotic dynamics. These works establish the formal foundations but do not engage with modern deep learning architectures or the empirical realities of training on web-scale data. The SRT bridges this gap by implementing semiotic concepts (triadic sign structure, interpretant chain dynamics, iconic grounding) as differentiable components within a transformer architecture (Section 4).

The octopus test and meaning from form. Bender and Koller (2020) argue, via an influential thought experiment (the “octopus test”), that a language model trained on form alone cannot learn meaning because meaning requires grounding in communicative intent and the non-linguistic world. Their argument aligns precisely with the Peircean analysis: form corresponds to the representamen dimension; meaning requires the interpretant and object dimensions that form alone does not determine. The SRT’s response is not to dispute Bender and Koller’s theoretical claim (which we accept) but to propose architectural mechanisms that introduce interpretant and object structure into the model’s representational space: the semiotic embedding decomposition (Section 4.2) represents the triadic structure, the metapragmatic attention heads (Section 4.3) trace interpretant chain dynamics, and the iconic grounding subspace (Section 4.2) provides non-arbitrary object-level grounding.

Bender et al.’s (2021) “stochastic parrots” critique extends the argument to large-scale models, emphasizing that increased scale does not cross the meaning gap – a position that has generated substantial debate (Piantadosi, 2023; Mahowald et al., 2024). The SRT framework sidesteps the scale debate: its claim is not that sufficient scale will produce meaning but that architectural modification – specifically, the addition of semiotic-reflexive components – is required regardless of scale.

Hermeneutics of transformer internals. Elhage et al. (2021, 2022) and the mechanistic interpretability program investigate what transformers learn by analyzing individual neurons, attention patterns, and circuits. This work has identified interpretable features (induction heads, syntax-sensitive neurons, fact-recall circuits) that suggest transformers develop structured internal representations rather than opaque statistical associations. From the SRT perspective, mechanistic interpretability provides tools for verifying that semiotic-reflexive training produces the intended representational structure: if the SEL successfully decomposes embeddings into triadic components, this should be detectable through probing classifiers and representation analysis (Section 6). However, mechanistic interpretability is diagnostic rather than prescriptive – it reveals what models have learned but does not propose what they should learn. The SRT provides the prescriptive framework.

AI semiotics and agency. Floridi and Chiriatti (2020) analyze GPT-3 through the lens of information philosophy, arguing that LLMs are “semantic reproducing engines” that manipulate signs without understanding them. Kockelman (2024, 2025) develops a more nuanced account of distributed semiotic agency, arguing that algorithms participate in semiotic processes as interpretant-generating mechanisms even without conscious intention. Lancaster (2025, Section 1.3.3) extends this analysis to characterize LLMs as agents that produce representamena whose immediate objects are coherent but whose dynamic objects are systematically underdetermined. The SRT does not claim to solve the philosophical question of machine understanding; it claims that explicit semiotic architecture enables the model to navigate semiotic complexity more effectively than models that lack such architecture, regardless of whether this navigation constitutes “understanding” in the philosophical sense.

This section specifies the Semiotic-Reflexive Transformer (SRT) architecture at a level of detail sufficient for implementation. We present each component’s theoretical motivation, formal specification, implementation considerations, and integration into the overall system. Figure 1 (conceptual) depicts the data flow; Table 2 summarizes component dimensionalities and parameter counts at the 7B scale.

Figure 1 (Conceptual Data Flow):


Table 2: Component Dimensionalities (7B-Parameter Scale)

*[table not reproduced in this version]*

The semiotic extensions add approximately 8.3% overhead, concentrated in the SEL.

The SRT augments a standard autoregressive transformer with four architectural extensions, each grounded in a specific theoretical requirement identified in Section 2:

  1. Semiotic Embedding Layer (SEL) → Section 2.1 (triadic sign): Represents tokens as nodes in a triadic sign structure with associated interpretant and attractor metadata, enabling the model to disentangle the representamen, object, and interpretant dimensions that standard embeddings conflate.

  2. Metapragmatic Attention Heads (MAH) → Section 2.1.2 (interpretant chains): Specialized attention mechanisms that track interpretant chain divergence across positions and contexts, operationalizing Kockelman’s formalization of interpretant chains as dynamical trajectories.

  3. Reflexive Recurrent Module (RRM) → Section 2.1.3 (metapragmatic awareness): A gated recurrent unit maintaining a running meta-observation of the model’s own interpretive trajectory – an architectural analog of Silverstein’s third-order metapragmatic awareness.

  4. Bifurcation Estimation Network (BEN) → Section 2.2 (pitchfork bifurcation): A feedforward network that estimates the effective amplification parameter $r$ for the current context and generates modulation signals, operationalizing Lancaster’s (2025) bifurcation model as a differentiable computation.

Three design principles govern the architecture:

Principle 1: Additive augmentation. All semiotic components are additive to the base transformer rather than replacing existing components. This ensures that (a) the SRT can be initialized from pre-trained transformer checkpoints, inheriting their distributional knowledge; (b) the semiotic components can be trained while the backbone is frozen (parameter-efficient adaptation via LoRA or similar); and (c) ablation studies can remove individual components to isolate their contributions.

Principle 2: Differentiable end-to-end. All components are specified as differentiable operations, enabling joint training with standard backpropagation. No component requires discrete decisions, reinforcement learning, or non-differentiable operations (though RL can be used in fine-tuning; Section 5.3).

Principle 3: Inspectable representations. The triadic decomposition, divergence signals, meta-observation state, and bifurcation estimates are designed to be human-interpretable – probeable by linear classifiers, visualizable as trajectories, and describable in terms of the semiotic theory that motivates them. This supports not only model understanding but also the metapragmatic reflexivity that is the architecture’s central goal: a model whose semiotic processing is opaque even to its own probing mechanisms cannot function as a semiotic steward.

Standard token embeddings map discrete tokens to continuous vectors optimized for distributional similarity. The resulting space captures co-occurrence structure but conflates the three dimensions of Peircean signification: the same embedding must simultaneously encode what the token is (representamen), what it refers to (object), and what effects it produces (interpretant). These dimensions may point in different directions – “freedom” as a token (representamen) has stable distributional properties, but its referent (object) and the responses it elicits (interpretant) vary systematically across communities.

The conflation is not merely a theoretical inelegance; it has practical consequences. When a model fine-tunes on data from Community A, the embedding for “freedom” shifts to reflect Community A’s interpretant. But this shift also modifies the representamen and object components, which should remain stable across communities. The result is that the model cannot represent the same sign producing different interpretants in different communities – the basic phenomenon of semiotic divergence – because it lacks the representational degrees of freedom to separate what varies (the interpretant) from what is shared (the representamen and, partially, the object).

The SEL provides these degrees of freedom by decomposing each token’s representation into four explicitly structured subspaces.

For each token $t_i$ in the input sequence, the SEL produces a composite embedding:

\mathbf{e}_i = [\mathbf{e}_i^R \| \mathbf{e}_i^O \| \mathbf{e}_i^I \| \mathbf{e}_i^A]

where $\|$ denotes concatenation and:

  • $\mathbf{e}_i^R \in \mathbb{R}^{d_R}$ is the representamen embedding – the distributional identity of the token as a sign vehicle. This is initialized from a pre-trained embedding table (e.g., from a standard transformer checkpoint) and captures what the token is as a linguistic form: its distributional neighborhood, syntactic roles, and morphological relationships. $\mathbf{e}_i^R$ is expected to be relatively stable across contexts and communities: the word “freedom” has the same phonological form, the same morphological structure, and largely the same syntactic distribution regardless of who uses it.
  • $\mathbf{e}_i^O \in \mathbb{R}^{d_O}$ is the object embedding – a representation of the sign’s referential target, learned from grounding data. This includes:

    • Entity-linked features derived from knowledge graph embeddings (Wikidata, ConceptNet) when the token refers to a named entity or concrete concept.

    • Perceptual features derived from multimodal pre-training (CLIP) when the referent has sensory properties.

    • The iconic grounding subspace $\mathbf{e}_i^{O,\text{icon}} \in \mathbb{R}^{d_{\text{icon}}}$ (detailed in Section 4.2.3), which encodes non-arbitrary cross-modal correspondences.

The object embedding represents what the sign points to – but note that the dynamic object (what the sign actually relates to in the world) may differ from the immediate object (what the sign presents as its referent). This gap is especially wide for politically contested signs, where the immediate object (“freedom as policy”) may be starkly different across communities even though both communities use the same representamen.

  • $\mathbf{e}_i^I \in \mathbb{R}^{d_I}$ is the interpretant embedding – a representation of the interpretive effects the sign produces, parameterized by community context. This is the most dynamic component: it captures what happens when the sign enters an interpretive community, what associations it triggers, what identities it indexes, what emotional responses it elicits. The interpretant embedding is context-dependent: it is generated by a learned function that conditions on the representamen and a community-context vector:

\mathbf{e}_i^I = f_I(\mathbf{e}_i^R, \mathbf{c}; \theta_I)

where $\mathbf{c} \in \mathbb{R}^{d_c}$ specifies the interpretive community (see Section 4.2.4 for how $\mathbf{c}$ is determined) and $f_I$ is a two-layer MLP:

f_I(\mathbf{e}_i^R, \mathbf{c}; \theta_I) = \mathbf{W}_2 \cdot \text{ReLU}(\mathbf{W}_1 [\mathbf{e}_i^R \| \mathbf{c}] + \mathbf{b}_1) + \mathbf{b}_2

with $\mathbf{W}_1 \in \mathbb{R}^{d_h \times (d_R + d_c)}$, $\mathbf{W}_2 \in \mathbb{R}^{d_I \times d_h}$, hidden dimension $d_h = 2d_I$, and parameters $\theta_I = \{\mathbf{W}_1, \mathbf{b}_1, \mathbf{W}_2, \mathbf{b}_2\}$.
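To make the interpretant generator concrete, here is a minimal pure-Python sketch of the two-layer MLP $f_I$ above. The dimensions are shrunk to toy size and all weight values are hand-picked for illustration (the specification above uses $d_R = d_I = 1024$, $d_c = 256$, $d_h = 2d_I$); the helpers `relu` and `matvec` are our own, not part of the paper:

```python
def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, v):
    # Dense layer: one output per weight row.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def f_I(e_R, c, W1, b1, W2, b2):
    # e_I = W2 · ReLU(W1 [e_R ‖ c] + b1) + b2
    h = relu([x + b for x, b in zip(matvec(W1, e_R + c), b1)])
    return [x + b for x, b in zip(matvec(W2, h), b2)]

# Toy sizes: d_R = 2, d_c = 2, d_I = 2, d_h = 4 (vs. 1024/256/1024/2048 at 7B).
W1 = [[0.5, -0.2, 1.0, 0.0],
      [0.1, 0.3, 0.0, 1.0],
      [-0.4, 0.2, 0.5, -0.5],
      [0.2, 0.1, -1.0, 1.0]]
b1 = [0.0, 0.0, 0.0, 0.0]
W2 = [[1.0, -0.5, 0.2, 0.1],
      [0.0, 0.8, -0.3, 0.6]]
b2 = [0.1, -0.1]

e_R = [1.0, -0.5]        # one token's representamen embedding (held fixed)
c_A = [1.0, 0.0]         # community A context vector
c_B = [0.0, 1.0]         # community B context vector

e_I_A = f_I(e_R, c_A, W1, b1, W2, b2)
e_I_B = f_I(e_R, c_B, W1, b1, W2, b2)
# Same representamen, different community contexts → different interpretants.
```

Even at toy scale the point of the decomposition is visible: $\mathbf{e}^R$ is held fixed while the community vector alone changes the interpretant output.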
  • $\mathbf{e}_i^A \in \mathbb{R}^{d_A}$ is the attractor embedding – a low-dimensional representation of the sign’s position in the semiotic attractor landscape. This encodes:

    • Basin membership: which attractor basin(s) the sign currently occupies (representable as a soft assignment over a learned set of $K$ basin prototypes).

    • Basin depth: the stability of the current assignment (related to the $D(s)$ decomposition of Section 2.4.7).

    • Bifurcation proximity: the distance of the sign from the nearest bifurcation boundary in the attractor landscape.

The attractor embedding is computed from the other three components:

\mathbf{e}_i^A = g_A(\mathbf{e}_i^R, \mathbf{e}_i^O, \mathbf{e}_i^I; \theta_A)

where $g_A$ is a learned projection (single linear layer + layer normalization) that maps the triadic structure to the attractor summary space. This ensures that the attractor embedding reflects the relationship among the three Peircean dimensions rather than any single dimension.

The total embedding dimension is $d = d_R + d_O + d_I + d_A$. We set equal allocation: $d_R = d_O = d_I = d_A = d/4$. At the 7B parameter scale with $d = 4096$, each subspace has dimension 1024.
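The allocation arithmetic is simple enough to state as code. A sketch using the constants quoted here and in Section 4.2.3:

```python
# Equal four-way split of the composite embedding at the 7B scale,
# with a quarter of the object subspace reserved for iconic grounding
# and 128 of those dimensions for phonosemantic features.
d = 4096
d_R = d_O = d_I = d_A = d // 4   # 1024 each
d_icon = d_O // 4                # iconic grounding subspace (Section 4.2.3)
d_phon = 128                     # phonosemantic features within d_icon

assert d_R + d_O + d_I + d_A == d
assert (d_R, d_icon, d_icon - d_phon) == (1024, 256, 128)
```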

Within the object embedding $\mathbf{e}_i^O$, a designated subspace $\mathbf{e}_i^{O,\text{icon}} \in \mathbb{R}^{d_{\text{icon}}}$ encodes iconic grounding features. We set $d_{\text{icon}} = d_O / 4 = 256$ (at 7B scale), reserving one-quarter of the object embedding for non-arbitrary correspondences. The iconic subspace has the following structure:

Phonosemantic features ($d_{\text{phon}} = 128$). Each token’s phonological form is mapped to a feature vector along the dimensions identified in Section 2.4.2:

*[table not reproduced in this version]*

These features are extracted by a frozen phonological encoder (a small pretrained model, ~10M parameters, mapping character sequences to phonological feature vectors) and projected into the iconic subspace. For tokens that are not natural-language words (punctuation, special tokens), the phonosemantic features are set to a learned default vector.

Initialization from CLIP. The remaining $d_{\text{icon}} - d_{\text{phon}} = 128$ dimensions encode cross-modal correspondences derived from CLIP embeddings. For each vocabulary token, we extract the CLIP text embedding, project it through a learned linear map $\mathbf{W}_{\text{CLIP}} \in \mathbb{R}^{128 \times d_{\text{CLIP}}}$, and use this as initialization for the non-phonosemantic iconic features. This captures visual-linguistic correspondences beyond the phonosemantic (e.g., the association between “sharp” and angular visual features, between “soft” and rounded visual textures).

Regularization. During training, the iconic subspace is regularized by the iconic grounding loss $\mathcal{L}_{\text{icon}}$ (Section 5.2.3), which penalizes drift of iconic features across training steps. The regularization strength $\beta$ controls the trade-off between allowing the model to refine iconic representations based on training data and preserving the non-arbitrary structure that provides attractor stability. We set $\beta$ using a cosine schedule: high early in training (preserving CLIP initialization) and decaying to a floor value that maintains gentle stabilization:

\beta(t) = \beta_{\text{floor}} + \frac{1}{2}(\beta_0 - \beta_{\text{floor}})(1 + \cos(\pi t / T))

where $\beta_0 = 0.1$, $\beta_{\text{floor}} = 0.01$, and $T$ is the total number of training steps.
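The schedule is a standard cosine decay; a minimal sketch using the constants above (the step count `T` is illustrative, since it is set by the training run):

```python
import math

def beta(t, T, beta_0=0.1, beta_floor=0.01):
    # Cosine decay from beta_0 at t = 0 down to beta_floor at t = T.
    return beta_floor + 0.5 * (beta_0 - beta_floor) * (1 + math.cos(math.pi * t / T))

T = 100_000  # illustrative total number of training steps
```

At $t = 0$ the regularization sits at $\beta_0 = 0.1$; at $t = T/2$ it is exactly halfway to the floor (0.055); at $t = T$ it reaches $\beta_{\text{floor}} = 0.01$.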

Worked example: How iconic grounding constrains drift. Consider the sign “sharp.” Its representamen embedding $\mathbf{e}^R$ captures distributional co-occurrence with cutting, intelligence, music, pain, etc. Its interpretant embedding $\mathbf{e}^I$ may diverge across communities: in a music community, “sharp” generates interpretants linked to pitch; in a culinary community, to knife quality. The iconic grounding subspace $\mathbf{e}^{O,\text{icon}}$ provides a cross-modal anchor: “sharp” has high values on the consonant-manner angularity dimension (the voiceless post-alveolar fricative /ʃ/ has abrupt spectral onset) and high spectral brightness, corresponding to angular, bright visual features. This iconic anchor is shared across communities and provides a common referent – a non-arbitrary core of meaning – around which the community-specific interpretants orbit. The iconic grounding loss penalizes training updates that move away from this anchor, ensuring that even as the interpretant embedding diverges across community contexts, the object embedding retains a stable core.

The community-context vector $\mathbf{c} \in \mathbb{R}^{d_c}$ (with $d_c = 256$ at 7B scale) specifies the interpretive community for computing the context-dependent interpretant embedding. Three mechanisms determine $\mathbf{c}$:
  1. Explicit specification: At inference time, the user or system prompt can specify the community context (e.g., “interpret from a conservative perspective,” “interpret from a progressive perspective,” “interpret from a neutral/bridging perspective”). This maps to a learned community embedding.

  2. Inferred from context: During training on annotated data, the community context is extracted from metadata (source publication, subreddit, political lean annotation). During inference on unspecified contexts, a community inference module – a small classifier operating on the first $N$ tokens of the input – predicts a soft distribution over community prototypes, and $\mathbf{c}$ is set to the weighted average.

  3. Multi-community mode: For the bridging/reflexive inference mode (Section 5.5), $\mathbf{c}$ is not a single vector but a set of community vectors $\{\mathbf{c}_1, \ldots, \mathbf{c}_K\}$. The model computes interpretant embeddings for each, enabling comparison of how the same sign produces different interpretants across communities. This is the computational instantiation of third-order metapragmatic awareness: seeing the same sign from multiple interpretive positions simultaneously.

Standard multi-head attention (Vaswani et al., 2017) computes relevance between positions based on content and position. It captures which tokens attend to which other tokens but does not explicitly represent the interpretive trajectory across positions – the evolving chain of sign-interpretant-sign transitions that constitutes semiotic processing in the Peircean framework.

Consider a sequence containing “freedom is essential for democracy.” Standard attention computes relationships among these tokens based on their distributional embeddings, capturing syntactic dependencies and topical coherence. But it does not represent the interpretant chain: how the interpretant of “freedom” feeds into the interpretation of “essential,” which shapes the interpretation of “democracy,” which may retroactively modify the interpretation of “freedom” (Peirce’s unlimited semiosis). The MAH is designed to track this chain structure.

More precisely, the MAH operationalizes the observation from Section 2.1.2 that interpretant chains exhibit compounding divergence: small differences in initial interpretants amplify through successive chain links. The MAH detects this amplification by monitoring how interpretant embeddings shift across positions, producing a signal that indexes the degree and character of chain divergence.

MAH heads are inserted at designated layers – we use layers $\lceil L/3 \rceil$, $\lceil 2L/3 \rceil$, and $L$ (for a 32-layer model: layers 11, 22, and 32) – enabling the model to track interpretant dynamics at early, middle, and late processing stages.
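A small helper makes the placement explicit. Note that reproducing the quoted indices for a 32-layer model (11, 22, 32) requires ceiling rounding of $L/3$ and $2L/3$; this sketch assumes 1-indexed layers:

```python
import math

def mah_layers(L):
    # Early-, middle-, and late-stage MAH insertion points (1-indexed).
    return [math.ceil(L / 3), math.ceil(2 * L / 3), L]
```

`mah_layers(32)` gives `[11, 22, 32]`, matching the layer indices quoted in the text.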

At each designated layer $\ell$, $H_{\text{meta}}$ metapragmatic heads operate on the interpretant component of the composite embedding. Each MAH head $h$ computes:

Interpretant-specific attention:

\alpha_{ij}^{h,\ell} = \text{softmax}\left(\frac{(\mathbf{W}_Q^{h,\ell} \mathbf{e}_i^{I,\ell})(\mathbf{W}_K^{h,\ell} \mathbf{e}_j^{I,\ell})^\top}{\sqrt{d_k}}\right)

where $\mathbf{e}_i^{I,\ell}$ is the interpretant component at layer $\ell$ (extracted from the composite hidden state via a learned projection) and $d_k = d_I / H_{\text{meta}}$.

Divergence signal computation. Each head computes a pairwise divergence signal measuring the interpretant shift between positions:

\delta_{ij}^{h,\ell} = \|\mathbf{W}_\delta^{h,\ell}(\mathbf{e}_i^{I,\ell} - \mathbf{e}_j^{I,\ell})\|_2

This L2 norm of the projected difference captures the magnitude of the interpretant shift. The projection matrix $\mathbf{W}_\delta^{h,\ell} \in \mathbb{R}^{d_\delta \times d_I}$ (with $d_\delta = 64$) allows each head to specialize in detecting different types of interpretant shift – one head might become sensitive to indexical shifts (changes in social-identity association), another to referential shifts (changes in object reference), another to affective shifts (changes in emotional valence).

Aggregation into chain divergence vector. The per-head divergence signals are aggregated into a per-position chain divergence vector that summarizes the character of interpretant dynamics at position $i$:
\mathbf{d}_i^{\ell} = \text{MLP}_\delta\left(\left[\underbrace{\frac{1}{H_{\text{meta}}}\sum_h \sum_j \alpha_{ij}^{h,\ell} \delta_{ij}^{h,\ell}}_{\text{attention-weighted mean}} \ \Big\| \ \underbrace{\max_j \max_h \delta_{ij}^{h,\ell}}_{\text{peak divergence}} \ \Big\| \ \underbrace{\text{Var}_{j,h}(\delta_{ij}^{h,\ell})}_{\text{divergence variance}} \ \Big\| \ \underbrace{\nabla_j \delta_{ij}^{h^*,\ell}}_{\text{divergence gradient}}\right]\right)

where $h^* = \arg\max_h \sum_j \alpha_{ij}^{h,\ell} \delta_{ij}^{h,\ell}$ is the head with the highest average divergence and $\nabla_j$ is the finite-difference approximation of the divergence gradient across positions (capturing whether divergence is increasing or decreasing along the sequence – whether interpretant chains are converging or compounding).

The four aggregated features encode:

  • Mean divergence: How much interpretants are drifting on average – the base level of semiotic dynamism.

  • Peak divergence: The most extreme chain shift in the current context – identifying potential bifurcation sites.

  • Divergence variance: Whether drift is uniform (diffuse semiotic instability) or concentrated (a specific sign is the locus of divergence).

  • Divergence gradient: Whether the situation is getting better (converging) or worse (compounding) – the temporal derivative of semiotic dynamics.

The MLP$_\delta$ is a two-layer network (input: 4 scalars × $H_{\text{meta}}$, hidden: 256, output: $d_{\text{div}} = 128$) producing the chain divergence vector $\mathbf{d}_i^{\ell} \in \mathbb{R}^{d_{\text{div}}}$.
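The four summary statistics are easy to state directly. A toy sketch for a single position $i$, with fabricated per-head divergences and attention weights (two heads, three context positions; the real model feeds these features into MLP$_\delta$ rather than using them raw):

```python
# delta[h][j]: divergence signal from head h toward position j;
# alpha[h][j]: corresponding attention weights (rows sum to 1).
delta = [[0.1, 0.4, 0.2], [0.3, 0.9, 0.5]]
alpha = [[0.2, 0.5, 0.3], [0.1, 0.6, 0.3]]

H = len(delta)
flat = [d for row in delta for d in row]

# Attention-weighted mean divergence across heads.
mean_div = sum(sum(a * d for a, d in zip(alpha[h], delta[h])) for h in range(H)) / H
# Peak divergence over all heads and positions.
peak_div = max(flat)
# Variance of divergence over heads and positions.
mu = sum(flat) / len(flat)
var_div = sum((x - mu) ** 2 for x in flat) / len(flat)
# h* = head with the highest attention-weighted divergence; the gradient
# is the finite difference of its divergences across positions.
h_star = max(range(H), key=lambda h: sum(a * d for a, d in zip(alpha[h], delta[h])))
grad = [delta[h_star][j + 1] - delta[h_star][j] for j in range(len(delta[h_star]) - 1)]
```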

In multi-community mode (Section 4.2.4), the MAH can operate across community contexts simultaneously. Given two community vectors $\mathbf{c}_A$ and $\mathbf{c}_B$, the model computes two sets of interpretant embeddings $\mathbf{e}_i^{I,A}$ and $\mathbf{e}_i^{I,B}$ for the same input. The MAH then computes a cross-community divergence signal:

\delta_i^{\text{cross}} = \|\mathbf{W}_{\text{cross}}(\mathbf{e}_i^{I,A} - \mathbf{e}_i^{I,B})\|_2

This signal identifies tokens where the two communities produce maximally different interpretants – the bifurcation sites in the semiotic landscape. The cross-community divergence feeds into the BEN’s estimation of $r$ and can be surfaced to the user as an interpretive map showing where and how meaning fractures.

Metapragmatic awareness is not a single-step computation but an evolving meta-observation: the capacity to monitor one’s own interpretive trajectory over time, noting when interpretants are drifting, when basins are shifting, and when one’s own framing may be contributing to divergence. This requires a recurrent architecture that maintains state across processing steps.

The RRM is the architectural analog of Silverstein’s third-order indexical awareness: a module that observes not the content of the model’s representations but the dynamics of those representations over the course of processing. Just as human metapragmatic awareness involves a running feel for how discourse is unfolding – an experienced reader senses when an argument is becoming charged, when terms are being used tendentiously, when the ground of shared meaning is shifting – the RRM maintains a differentiable state that tracks the trajectory of interpretant dynamics across the processing of a sequence.

The choice of a GRU rather than an LSTM or a transformer for this module is deliberate. The GRU’s simpler gating structure (two gates rather than three) reduces parameter count, but more importantly, the GRU’s architecture creates an inductive bias toward smooth state updates (the convex combination between old and new states), which is appropriate for modeling the gradual evolution of metapragmatic awareness rather than the sharp switching behavior that LSTMs can implement.

The RRM is a Gated Recurrent Unit (Cho et al., 2014) that takes as input the chain divergence vectors produced by the MAH at each layer and maintains a hidden state $\mathbf{h}_t^{\text{meta}} \in \mathbb{R}^{d_{\text{meta}}}$ (with $d_{\text{meta}} = 1024$ at 7B scale) representing the model’s cumulative meta-observation.

The GRU equations are:

\mathbf{z}_t = \sigma(\mathbf{W}_z [\mathbf{d}_t \| \mathbf{h}_{t-1}^{\text{meta}}] + \mathbf{b}_z) \quad \text{(update gate)}
\mathbf{r}_t = \sigma(\mathbf{W}_r [\mathbf{d}_t \| \mathbf{h}_{t-1}^{\text{meta}}] + \mathbf{b}_r) \quad \text{(reset gate)}
\tilde{\mathbf{h}}_t = \tanh(\mathbf{W}_h [\mathbf{d}_t \| \mathbf{r}_t \odot \mathbf{h}_{t-1}^{\text{meta}}] + \mathbf{b}_h) \quad \text{(candidate state)}
\mathbf{h}_t^{\text{meta}} = (1 - \mathbf{z}_t) \odot \mathbf{h}_{t-1}^{\text{meta}} + \mathbf{z}_t \odot \tilde{\mathbf{h}}_t \quad \text{(new state)}

where $\mathbf{d}_t \in \mathbb{R}^{d_{\text{div}}}$ is the chain divergence vector at position $t$ (averaged across MAH layers if multiple MAH layers have fired by this point), $\sigma$ is the sigmoid function, and $\odot$ is elementwise multiplication.
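A scalar version of the cell shows the smoothness property that motivates the GRU choice: the new state is a convex combination of the old state and a tanh-bounded candidate, so the meta-observation can never leave $(-1, 1)$ or change faster than the update gate allows. The weights here are arbitrary illustrative scalars:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(d_t, h_prev, Wz, Wr, Wh):
    # Scalar GRU cell over the concatenated input [d_t ‖ h_prev];
    # biases omitted for brevity.
    z = sigmoid(Wz[0] * d_t + Wz[1] * h_prev)                 # update gate
    r = sigmoid(Wr[0] * d_t + Wr[1] * h_prev)                 # reset gate
    h_tilde = math.tanh(Wh[0] * d_t + Wh[1] * (r * h_prev))   # candidate state
    return (1.0 - z) * h_prev + z * h_tilde                   # convex combination

h = 0.0
for d in [0.1, 0.1, 0.9, 0.1]:  # a spike of divergence at step 3
    h = gru_step(d, h, Wz=(1.0, 0.5), Wr=(1.0, 0.5), Wh=(1.0, -0.5))
```

The spike at step 3 pushes the state up sharply (high update gate), and the quiet step after it pulls the state partway back down rather than resetting it – exactly the gradual, cumulative behavior wanted from a meta-observer.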

The input to the GRU at position $t$ is not just the current divergence vector but a concatenation of divergence vectors from all MAH layers that have contributed up to position $t$’s processing depth:

\mathbf{d}_t = \text{MLP}_{\text{agg}}([\mathbf{d}_t^{\ell_1} \| \mathbf{d}_t^{\ell_2} \| \ldots])

where $\ell_1, \ell_2, \ldots$ are the MAH layers, and $\text{MLP}_{\text{agg}}$ (linear, $3 \times d_{\text{div}} \to d_{\text{div}}$) aggregates across layers.

Semiotic interpretation of the gates. The GRU gates have natural semiotic interpretations:

  • Update gate $\mathbf{z}_t$: Controls how much the meta-observation state changes in response to new divergence information. A high update gate means the current position’s divergence is noteworthy – it shifts the model’s meta-assessment of the semiotic situation. A low update gate means the current position’s divergence is within the range the model has already observed – it does not change the meta-landscape.

  • Reset gate $\mathbf{r}_t$: Controls how much of the previous meta-observation remains relevant. A low reset gate (close to 0) means the model is resetting its metapragmatic context – the semiotic situation has changed so dramatically that prior observations are no longer informative. This corresponds to a topic shift, a genre boundary, or a move from one interpretive community to another within the same text.

The meta-observation state is projected and added as a residual to the transformer’s hidden states at designated layers, providing the model’s generative process with information about its own interpretive dynamics:

\mathbf{x}_t^{(\ell+1)} = \mathbf{x}_t^{(\ell)} + \gamma^{(\ell)} \cdot \mathbf{W}_{\text{proj}}^{(\ell)} \mathbf{h}_t^{\text{meta}}

for designated layers $\ell \in \{\lfloor L/4 \rfloor, \lfloor L/2 \rfloor, \lfloor 3L/4 \rfloor\}$ (for a 32-layer model: layers 8, 16, 24).

The scalar $\gamma^{(\ell)} \in [0, 1]$ is a learned gating parameter (initialized to 0) that controls the strength of meta-observation injection at each layer. Initialization at 0 ensures that at the start of training, the RRM has no effect on the transformer’s processing, allowing the semiotic components to be added to a pre-trained transformer without initial disruption. As training proceeds, $\gamma^{(\ell)}$ increases from 0 as the RRM learns to produce useful meta-observations.
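The zero-initialized gate has a property worth spelling out: at $\gamma = 0$ the injection is exactly the identity, so attaching the RRM to a pre-trained checkpoint changes nothing until training moves $\gamma$ off zero. A minimal sketch with toy dimensions and illustrative weights:

```python
def inject_meta(x, h_meta, W_proj, gamma):
    # x: hidden state at layer ℓ; h_meta: RRM meta-observation state.
    # Gated residual injection: x + γ · W_proj h_meta.
    proj = [sum(w * m for w, m in zip(row, h_meta)) for row in W_proj]
    return [xi + gamma * pi for xi, pi in zip(x, proj)]

x = [0.5, -1.0, 0.25]                         # toy hidden state (d = 3)
h_meta = [1.0, 2.0]                           # toy meta-observation (d_meta = 2)
W_proj = [[0.1, 0.0], [0.0, 0.1], [0.1, 0.1]]  # toy projection matrix

assert inject_meta(x, h_meta, W_proj, gamma=0.0) == x  # exact no-op at init
```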

The injection points are deliberately spread across the transformer’s depth:

  • Layer $L/4$ (early injection): Influences the model’s initial contextual representations, enabling early semiotic conditioning.

  • Layer $L/2$ (mid injection): Modulates the model at the level where, in mechanistic interpretability studies, abstract semantic representations are most active.

  • Layer $3L/4$ (late injection): Directly influences the representations from which generation decisions are made, enabling semiotic modulation of output.

The pitchfork bifurcation model (Section 2.2) identifies the amplification parameter $r$ as the critical quantity governing whether interpretation remains consensual or fractures into antagonistic basins. The stochastic extension (Section 2.2.2) identifies critical slowing down as a detectable precursor of bifurcation. For the model to function as a steward of semiotic ecology – not merely describing the semiotic landscape but actively navigating it – it must estimate $r$ from context and modulate its generative dynamics accordingly.

The BEN operationalizes this estimation as a differentiable computation. It receives information about the current state of the semiotic landscape (from the RRM’s meta-observation state) and the current token’s position in the attractor landscape (from its attractor embedding), and produces both a diagnostic output ($\hat{r}_t$: where are we relative to bifurcation?) and a prescriptive output ($\mathbf{m}_t$: how should generation be adjusted?).

The BEN is a three-layer MLP with residual connections:

\mathbf{v}_1 = \text{ReLU}(\mathbf{W}_1^{\text{BEN}} [\mathbf{h}_t^{\text{meta}} \| \mathbf{e}_t^A] + \mathbf{b}_1^{\text{BEN}})
\mathbf{v}_2 = \text{ReLU}(\mathbf{W}_2^{\text{BEN}} \mathbf{v}_1 + \mathbf{b}_2^{\text{BEN}}) + \mathbf{W}_{\text{skip}} [\mathbf{h}_t^{\text{meta}} \| \mathbf{e}_t^A]
[\hat{r}_t, \mathbf{m}_t] = \mathbf{W}_3^{\text{BEN}} \mathbf{v}_2 + \mathbf{b}_3^{\text{BEN}}

Layer dimensions: $(d_{\text{meta}} + d_A) \to 512 \to 256 \to (1 + d_{\text{vocab}})$, where $d_{\text{vocab}}$ is the vocabulary size. The skip connection from input to the second hidden layer provides the BEN with direct access to the raw meta-observation features, preventing information loss through the nonlinearities.

Output 1: Bifurcation parameter estimate $\hat{r}_t \in \mathbb{R}$. This scalar estimates the effective amplification parameter $r$ for the current context. It is supervised during training by ground-truth values derived from the semiotic annotation framework (Section 5.1):

  • Texts from contexts with low measured polarization receive labels $r < 0$.

  • Texts from contexts near measured polarization transitions receive labels $r \approx 0$.

  • Texts from highly polarized contexts receive labels $r > 0$.

The $r$ estimation is trained with MSE loss: $\mathcal{L}_r = (\hat{r}_t - r_t^*)^2$.

Output 2: Modulation vector $\mathbf{m}_t \in \mathbb{R}^{d_{\text{vocab}}}$. This vector adjusts the generation logits to bias output toward or away from different tokens based on the estimated semiotic state. It is applied before the softmax:

p(t_{t+1} | t_{\leq t}) = \text{softmax}(\mathbf{W}_{\text{vocab}} \mathbf{x}_t^{(L)} + \lambda \mathbf{m}_t)

where $\lambda \in [0, 1]$ is a controllable inference parameter:

  • $\lambda = 0$: Standard generation (no semiotic modulation). The SRT behaves as a standard transformer.

  • $\lambda \in (0, 1)$: Graded semiotic modulation. The model gently biases generation based on the semiotic state.

  • $\lambda = 1$: Full semiotic modulation. The model maximally adjusts its output based on bifurcation dynamics.
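Logit modulation is a one-line change to the output head. A sketch with a toy vocabulary of three tokens (the logits and modulation values are fabricated for illustration):

```python
import math

def modulated_softmax(logits, m, lam):
    # p = softmax(logits + λ·m): λ = 0 recovers standard generation,
    # λ = 1 applies the full BEN modulation vector.
    z = [l + lam * mi for l, mi in zip(logits, m)]
    zmax = max(z)                         # subtract max for numerical stability
    exps = [math.exp(v - zmax) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

logits = [2.0, 1.0, 0.5]   # base next-token logits
m = [-3.0, 0.5, 2.0]       # modulation: suppress token 0, boost token 2

p0 = modulated_softmax(logits, m, 0.0)   # unmodulated distribution
p1 = modulated_softmax(logits, m, 1.0)   # fully modulated distribution
```

With $\lambda = 0$ the base distribution is untouched; at $\lambda = 1$ the modulation can change which token is most probable.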

The modulation vector is trained end-to-end with the bridging coherence objective (Section 5.3): it learns to adjust logits in ways that improve the model’s ability to generate text that surfaces alternative interpretations, makes indexing explicit, and bridges interpretive communities.

During inference, the estimated $\hat{r}_t$ triggers qualitatively different generative strategies, implementing the semiotic stewardship function:

Regime 1: Subcritical ($\hat{r}_t < r_{\text{crit}} - \epsilon$, with $r_{\text{crit}} = 0$, $\epsilon = 0.2$). The semiotic environment is stable. The model generates normally with minimal modulation ($\lambda$ is effectively reduced to $\lambda_{\text{min}} = 0.1$). Interpretive divergence is not a concern in this context.

Regime 2: Near-critical ($|\hat{r}_t - r_{\text{crit}}| \leq \epsilon$). The model detects proximity to the bifurcation threshold. This is the critical zone where small perturbations can push the system in either direction. The model increases modulation ($\lambda$ set to its configured value), biasing generation toward outputs that:

  • Make implicit indexical associations explicit (“this term is used differently in different communities…”)

  • Surface alternative interpretations without privileging one

  • Introduce metapragmatic markers that activate reflexive awareness in readers

Regime 3: Supercritical ($\hat{r}_t > r_{\text{crit}} + \epsilon$). The system is already bifurcated. The model generates reflexive commentary – text that makes the bifurcation structure visible:

  • Identifies the competing attractor basins (“Community A interprets this as… while Community B interprets this as…”)

  • Articulates the interpretant chains characteristic of each basin

  • Notes the iconic anchors (if any) that could provide common ground

  • Avoids amplifying either basin’s interpretant chain at the expense of the other

The regime boundaries can be adjusted at inference time, allowing users to control the model’s sensitivity to semiotic dynamics.
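The three regimes reduce to a simple threshold rule on $\hat{r}_t$; a sketch using the defaults quoted above ($r_{\text{crit}} = 0$, $\epsilon = 0.2$):

```python
def regime(r_hat, r_crit=0.0, eps=0.2):
    # Map the BEN's bifurcation estimate to a generation regime.
    if r_hat < r_crit - eps:
        return "subcritical"      # stable: minimal modulation
    if r_hat > r_crit + eps:
        return "supercritical"    # bifurcated: reflexive commentary
    return "near-critical"        # critical zone: increased modulation
```

`regime(-0.5)`, `regime(0.1)`, and `regime(0.7)` return the three regimes in order; the near-critical band is inclusive at its edges, matching the $\leq \epsilon$ condition.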

The BEN also monitors for critical slowing down – the dynamical precursor of bifurcation identified in Section 2.2.2. During sequential processing, the BEN tracks:

\text{CSD}_t = \frac{\text{Var}(\hat{r}_{t-w:t})}{\text{Var}(\hat{r}_{t-2w:t-w})}

where $w$ is a window size (default: 32 tokens). A ratio significantly greater than 1 indicates increasing variance in the bifurcation estimate – a signature of critical slowing down. When $\text{CSD}_t$ exceeds a threshold (default: 1.5), the model flags the current context as approaching bifurcation even if the current estimate $\hat{r}_t$ remains subcritical. This provides early warning capability: the model can detect that meaning is about to fracture before the fracture is complete.
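The early-warning signal is a windowed variance ratio; a sketch using the default window and threshold (the history of $\hat{r}_t$ values is fabricated for illustration):

```python
def csd(r_estimates, w=32):
    # Ratio of bifurcation-estimate variance in the most recent window
    # to the variance in the window before it; >> 1 signals critical
    # slowing down.
    def var(xs):
        mu = sum(xs) / len(xs)
        return sum((x - mu) ** 2 for x in xs) / len(xs)
    recent, earlier = r_estimates[-w:], r_estimates[-2 * w:-w]
    v_earlier = var(earlier)
    return var(recent) / v_earlier if v_earlier > 0 else float("inf")

# Steady estimates, then rising fluctuations in the most recent window:
history = [0.0, 0.01, -0.01, 0.0] * 8 + [0.0, 0.3, -0.3, 0.2] * 8
flag = csd(history, w=32) > 1.5   # early-warning threshold exceeded
```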

The complete SRT processes input as follows:

  1. Tokenization and embedding: Input tokens pass through the SEL, producing composite embeddings $\mathbf{e}_i = [\mathbf{e}_i^R \| \mathbf{e}_i^O \| \mathbf{e}_i^I \| \mathbf{e}_i^A]$ with iconic grounding within $\mathbf{e}_i^O$ and community-conditioned interpretant generation for $\mathbf{e}_i^I$.

  2. Transformer processing with MAH: Composite embeddings pass through $L$ transformer layers with standard self-attention and feedforward sublayers. At layers $\lceil L/3 \rceil$, $\lceil 2L/3 \rceil$, and $L$, MAH heads compute attention over interpretant components and produce chain divergence vectors $\mathbf{d}_t^{\ell}$.

  3. Reflexive observation: The RRM processes aggregated divergence vectors to maintain the meta-observation state $\mathbf{h}_t^{\text{meta}}$, which is injected as gated residuals at layers $\lfloor L/4 \rfloor$, $\lfloor L/2 \rfloor$, $\lfloor 3L/4 \rfloor$.

  4. Bifurcation estimation: The BEN takes $\mathbf{h}_t^{\text{meta}}$ and $\mathbf{e}_t^A$ as input, estimates $\hat{r}_t$, monitors critical slowing down, and produces modulation vector $\mathbf{m}_t$.

  5. Modulated generation: Final logits combine standard prediction with semiotic modulation via $\lambda$, with the regime of $\hat{r}_t$ determining the qualitative character of the modulation.

Parameter efficiency. The semiotic extensions add approximately 8.3% parameter overhead at the 7B scale. This is comparable to the overhead of adapter-based fine-tuning methods (LoRA adds ~1-5%; full adapter tuning adds ~10-20%) and substantially less than training a separate model for semiotic tasks. The overhead is dominated by the SEL’s expanded embedding table; the MAH, RRM, and BEN together contribute less than 3%.

Compatibility with existing infrastructure. The SRT is designed for implementation in standard transformer frameworks (PyTorch, JAX). The SEL replaces the standard embedding layer; the MAH adds attention heads at designated layers (compatible with standard attention implementations); the RRM is a standard GRU; the BEN is a standard MLP. No custom CUDA kernels or non-standard operations are required. KV-cache optimization extends naturally to the MAH by caching interpretant key-value pairs alongside standard key-value pairs.

Initialization from pre-trained models. The SRT can be initialized from any pre-trained transformer checkpoint. The representamen embedding $\mathbf{e}_i^R$ is initialized directly from the pre-trained embedding table. The object embedding $\mathbf{e}_i^O$ is initialized from CLIP (iconic subspace) and random (non-iconic subspace). The interpretant embedding $\mathbf{e}_i^I$ is initialized randomly. The attractor embedding $\mathbf{e}_i^A$ is initialized to zero. All semiotic-specific parameters ($f_I$, $g_A$, MAH projections, RRM, BEN) are randomly initialized with appropriate scaling. The gating parameters $\gamma^{(\ell)}$ are initialized to 0, ensuring no disruption of the pre-trained model at the start of training.
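The initialization recipe can be sketched as follows; shapes and the stand-in for the CLIP-derived subspace are illustrative assumptions:

```python
import numpy as np

def init_sel(pretrained_emb, d_obj, d_int, d_att, rng):
    """Initialize the SEL's four components from a pre-trained checkpoint.

    - representamen: copied from the pre-trained embedding table
    - object: random here as a stand-in; the paper initializes the iconic
      subspace from CLIP and the non-iconic subspace randomly
    - interpretant: random
    - attractor: zeros
    """
    vocab, _ = pretrained_emb.shape
    e_R = pretrained_emb.copy()
    e_O = rng.normal(scale=0.02, size=(vocab, d_obj))
    e_I = rng.normal(scale=0.02, size=(vocab, d_int))
    e_A = np.zeros((vocab, d_att))
    gate = np.zeros(1)  # gamma^(l) gates start at 0: no disruption at step 0
    return e_R, e_O, e_I, e_A, gate

rng = np.random.default_rng(0)
pre = rng.normal(size=(100, 16))
e_R, e_O, e_I, e_A, gate = init_sel(pre, 8, 8, 4, rng)
assert np.array_equal(e_R, pre) and not e_A.any() and not gate.any()
```

The zero-initialized gates are the load-bearing detail: at step 0 the model's forward pass is exactly that of the pre-trained checkpoint.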

This section specifies the complete training pipeline for the SRT, from data collection and annotation through pre-training, fine-tuning, and inference-time configuration. The pipeline is designed for reproducibility and is compatible with standard distributed training infrastructure (DeepSpeed, FSDP). Table 3 provides a phase-by-phase summary.

Table 3: Training Pipeline Overview

table

The training pipeline requires corpora annotated with semiotic metadata beyond standard text. We specify the Semiotic Annotation Schema (SAS), a structured JSONL format that enriches text with interpretant chain information. Each annotated document is stored as a JSON object with the following fields:

code block

The schema fields correspond directly to theoretical constructs:

  1. Sign-Object-Interpretant (SOI) triples: For contested or semiotically rich tokens, structured annotations specifying:

    • The sign (token or phrase span), which serves as the representamen

    • The object with explicit immediate/dynamic distinction (Section 2.1.1), capturing the gap between how the sign presents its referent and what the actual referent is. The `iconic_grounding` field (range $[0, 1]$) quantifies the sign’s iconic depth $D_{\text{iconic}}(s)$ from Section 2.4.7

    • One or more interpretants per community, each tagged with: community identifier, the Silversteinian indexical order (1, 2, or 3), the Agha-style characterological figure the sign indexes, and an affective valence score

  2. Chain sequences: Ordered sequences of sign-interpretant links tracing how an interpretant from one triple functions as the sign for the next, implementing Kockelman’s (2025) formalization. Each link carries a `divergence_at_link` score measuring how far the chain has drifted from a hypothetical consensus interpretant. This score operationalizes the compounding divergence described in Section 2.1.2. Chains may extend across documents from different communities (cross-document chains) to model how texts from one community generate interpretants that function as signs within another.
  3. Metapragmatic metadata: Document-level scores:

    • Indexing gap ($g \in [0, 1]$): Degree of divergence in indexical associations across communities. Computed as the cosine distance between mean interpretant vectors for the two most divergent communities in the SOI triples

    • Enregisterment divergence ($\Delta e \in [0, 1]$): Quantified difference in characterological associations, computed as the Jensen-Shannon divergence between the distributions of characterological figures activated by key signs across communities

    • Estimated $r$-value ($r \in [-1, 1]$): Position on the bifurcation parameter scale. Negative values indicate pre-bifurcation consensus; values near zero indicate proximity to the critical threshold; positive values indicate post-bifurcation divergence

    • Bifurcation phase: Categorical label: `pre-bifurcation`, `near-threshold`, `post-bifurcation`, or `hysteresis` (for texts from contexts where amplification has decreased but the system remains bifurcated)

    • Attractor basin: Basin membership label when in a bifurcated state

  4. Attractor labels: Token-level classification of each sign’s position within the semiotic attractor landscape (basin A, basin B, boundary region, uncontested).
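For concreteness, a single SAS record covering the fields above might be serialized as one JSON object per line. All field names here are illustrative guesses except `iconic_grounding` and `divergence_at_link`, which are quoted in the text:

```python
import json

# Hypothetical SAS record; the schema's exact field names are assumptions.
record = {
    "text": "...they are coming for your freedom...",
    "soi_triples": [{
        "sign": {"span": [24, 31], "surface": "freedom"},
        "object": {"immediate": "absence of constraint",
                   "dynamic": "contested political value",
                   "iconic_grounding": 0.12},
        "interpretants": [
            {"community": "A", "indexical_order": 2,
             "characterological_figure": "patriot",
             "affective_valence": 0.8},
            {"community": "B", "indexical_order": 2,
             "characterological_figure": "reactionary",
             "affective_valence": -0.6},
        ],
    }],
    "chains": [{"links": [{"sign": "freedom",
                           "interpretant": "threat to rights",
                           "divergence_at_link": 0.45}]}],
    "metapragmatic": {"indexing_gap": 0.62,
                      "enregisterment_divergence": 0.41,
                      "r_value": 0.3,
                      "bifurcation_phase": "post-bifurcation",
                      "attractor_basin": "A"},
    "attractor_labels": {"freedom": "basin A"},
}
line = json.dumps(record)  # one JSONL line
assert json.loads(line)["soi_triples"][0]["object"]["iconic_grounding"] == 0.12
```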

Annotation proceeds through a four-phase hybrid human-AI pipeline designed to scale semiotic annotation to the volumes required for LLM training while maintaining the interpretive depth that distinguishes semiotic annotation from standard NLP labeling.

Phase 1: Computational pre-annotation (automated). An existing LLM (e.g., GPT-4 or equivalent) identifies candidate contested signs by measuring embedding variance across community-stratified sub-corpora. The procedure:

  1. Partition the training corpus into community-stratified shards using metadata (publication source, subreddit, platform section, author self-identification where available). Validate stratification through k-means clustering on document embeddings, confirming that shards correspond to interpretively distinct communities rather than merely topical clusters.

  2. For each unique token/bigram/trigram, compute the embedding centroid within each community shard using a pre-trained language model (e.g., the base transformer before semiotic augmentation).

  3. Compute the coefficient of variation (CV) of these centroids across community shards:

\text{CV}(w) = \frac{\text{std}(\{\bar{\mathbf{e}}_w^{c_1}, \bar{\mathbf{e}}_w^{c_2}, \ldots\})}{\|\text{mean}(\{\bar{\mathbf{e}}_w^{c_1}, \bar{\mathbf{e}}_w^{c_2}, \ldots\})\|}

Signs with $\text{CV}(w) > \tau$ (threshold $\tau = 0.3$, calibrated on a pilot set of known contested signs) are flagged for human annotation.
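The CV computation above can be sketched as follows; "std" of a set of vectors is ambiguous in the formula, so the norm of the per-dimension standard deviation is assumed here:

```python
import numpy as np

def contested_sign_cv(centroids):
    """Coefficient of variation of a sign's per-community centroids.

    centroids: (n_communities, d) mean embedding of the sign in each
    community shard. Spread across shards is divided by the norm of
    the overall mean centroid, per CV(w) above.
    """
    centroids = np.asarray(centroids)
    spread = np.linalg.norm(centroids.std(axis=0))  # assumed reading of "std"
    scale = np.linalg.norm(centroids.mean(axis=0))
    return spread / scale

TAU = 0.3  # threshold from the text, calibrated on known contested signs

# A sign whose community centroids point in different directions is flagged;
# a sign with near-identical centroids is not.
divergent = [[1.0, 0.0], [-1.0, 0.5]]
consensus = [[1.0, 0.0], [1.01, 0.02]]
assert contested_sign_cv(divergent) > TAU
assert contested_sign_cv(consensus) < TAU
```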
  4. For flagged signs, generate candidate SOI triples using the LLM with structured prompting: “For the sign [word] as used in [context], identify the most likely interpretant in community [community description], the characterological figure it indexes, and the indexical order of the association.”
  5. Generate candidate chain sequences by prompting the LLM to extend each SOI triple: “If a member of [community] interprets [sign] as [interpretant], what sign does that interpretant become, and what further interpretant does it produce?”

Phase 2: Expert annotation (human). Trained annotators with backgrounds in linguistics, semiotics, political communication, or cultural anthropology review and correct the computational pre-annotations. Key requirements:

  • Cross-community pairing: Each document is annotated by annotators from at least two different interpretive communities (e.g., one annotator familiar with progressive discourse, one with conservative discourse). This ensures that interpretant annotations for each community are produced or validated by annotators who inhabit (or have deep familiarity with) that community’s interpretive frameworks.

  • Calibration: Annotators undergo a calibration phase using a reference set of 500 pre-annotated documents. Inter-annotator agreement targets are established through iterative discussion and adjudication.

  • Metapragmatic self-reporting: Annotators are asked to note instances where their own semiotic ideologies influence their annotations. This practice constitutes a form of annotator-level metapragmatic awareness that both improves annotation quality and generates training signal for the model’s reflexive capabilities.

  • Scale: We estimate that Phase 2 requires approximately 2,000-5,000 annotator hours for the target corpus, comparable to the annotation effort for large-scale NLI datasets (Bowman et al., 2015; Williams et al., 2018).

Phase 3: Synthetic augmentation (automated). Bifurcation simulations generate synthetic chain sequences by varying the $r$-parameter in a computational implementation of the pitchfork model. The procedure:
  1. Implement the stochastic pitchfork model (Section 2.2.2): $\dot{x} = rx - x^3 + \sigma\xi(t)$, with $r$ sampled uniformly from $[-1, 1]$, $\sigma$ from $[0.01, 0.3]$, and initial conditions $x_0$ from a standard normal distribution.
  2. For each run, record the trajectory $\{x(t)\}$ and map it to a synthetic semiotic scenario: $x < -0.3$ → basin A text; $|x| \leq 0.3$ → boundary/consensus text; $x > 0.3$ → basin B text; transitions between regions → chain divergence events.
  3. Use the LLM to generate naturalistic text instantiating each trajectory segment, conditioned on topic, community, and bifurcation phase. For example, a trajectory that transitions from $|x| \leq 0.3$ to $x > 0.3$ (crossing the bifurcation threshold) generates a sequence of texts showing a contested sign shifting from consensus to divergent interpretation.
  4. Annotate the synthetic texts with ground-truth $r$-values, chain divergence scores, and attractor labels derived from the simulation parameters (not from the text itself), providing training signal with known dynamical ground truth.
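Steps 1 and 2 can be sketched with Euler–Maruyama integration of the stochastic pitchfork model; the step size, horizon, and seed are illustrative choices, not the paper's:

```python
import numpy as np

def simulate_pitchfork(r, sigma, x0, dt=0.01, steps=5000, rng=None):
    """Euler-Maruyama integration of dx = (r*x - x**3) dt + sigma dW."""
    if rng is None:
        rng = np.random.default_rng(0)
    x = np.empty(steps + 1)
    x[0] = x0
    for t in range(steps):
        drift = r * x[t] - x[t] ** 3
        x[t + 1] = x[t] + drift * dt + sigma * np.sqrt(dt) * rng.normal()
    return x

def label_region(x, width=0.3):
    """Map trajectory values to the synthetic semiotic regions of step 2."""
    return np.where(x < -width, "basin A",
                    np.where(x > width, "basin B", "boundary"))

# Supercritical run (r > 0): the trajectory settles into one basin
traj = simulate_pitchfork(r=0.8, sigma=0.05, x0=0.5)
assert (label_region(traj[-100:]) == "basin B").all()
```

With $r > 0$ the stable fixed points sit at $\pm\sqrt{r}$, so a low-noise run started at $x_0 = 0.5$ settles near $+0.89$, well inside basin B; a subcritical run ($r < 0$) would instead fluctuate around the consensus region near 0.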

This synthesis produces controlled training examples across the full range of bifurcation dynamics, addressing the data imbalance problem (most natural text is either clearly pre-bifurcation or clearly post-bifurcation; near-threshold dynamics are underrepresented in organic data).

Phase 4: Validation. A held-out annotator panel (distinct from Phase 2 annotators) evaluates quality:

table

Documents failing validation are returned to Phase 2 for re-annotation.

The training corpus targets the following composition, designed to balance distributional coverage with semiotic depth:

table

Critical design decision: Why not more standard web text? Standard pre-training corpora (RedPajama, The Pile, Dolma) consist of 80-95% web text. We reduce this to 45% to make room for the community-stratified and metapragmatic data that the semiotic components require. The distributional foundation provided by 4.5B tokens of web text is sufficient for core language competence at the 7B scale (Hoffmann et al., 2022 “Chinchilla” scaling suggests ~140B tokens for optimal compute-matched training at 7B; our 10B total trades some raw scaling for semiotic depth, appropriate for a proof-of-concept). For production-scale training, we recommend increasing total corpus size to 100B+ tokens while maintaining the proportions above.

Pre-training optimizes a multi-objective loss that trains the distributional and semiotic components jointly. The total loss is:

\mathcal{L}_{\text{pre}} = \mathcal{L}_{\text{LM}} + \alpha \mathcal{L}_{\text{chain}} + \beta(t) \mathcal{L}_{\text{icon}} + \gamma \mathcal{L}_{\text{attractor}}

where the hyperparameters $\alpha, \beta(t), \gamma$ control the relative weight of each auxiliary objective. The iconic grounding loss uses the time-dependent schedule $\beta(t)$ from Section 4.2.3. We now specify each loss in detail.

Language modeling loss $\mathcal{L}_{\text{LM}}$

Standard autoregressive cross-entropy loss over the vocabulary:

\mathcal{L}_{\text{LM}} = -\frac{1}{T}\sum_{t=1}^{T} \log p(t_{t+1} | t_{\leq t})

where $T$ is the sequence length and $p$ is the softmax distribution over the vocabulary produced by the generation head (with semiotic modulation at strength $\lambda_{\text{train}}$, set to $\lambda_{\text{train}} = 0.3$ during pre-training to softly incorporate semiotic modulation from the start without overwhelming the distributional signal).

This loss ensures the model retains standard language generation capabilities. It operates on all tokens in the corpus, including unannotated ones, and provides the primary gradient signal for the representamen embedding $\mathbf{e}_i^R$ and the backbone transformer parameters.

Masking for annotated regions. For tokens that carry semiotic annotations, the LM loss is still computed (the model should predict them as fluent text) but we additionally compute the semiotic losses below. The gradients from both loss sources flow through the shared backbone, creating a training signal that is simultaneously distributional and semiotic.

Chain prediction loss $\mathcal{L}_{\text{chain}}$

For annotated positions with SOI triples and chain sequences, the model is trained to reconstruct interpretant chains by predicting the next interpretant given the current sign and community context:

\mathcal{L}_{\text{chain}} = -\frac{1}{|\mathcal{C}|}\sum_{(s,c,i) \in \mathcal{C}} \log p_{\text{chain}}(i | s, c; \theta)

where $\mathcal{C}$ is the set of annotated chain links, $s$ is the sign embedding (the composite embedding $\mathbf{e}_i$ at the sign’s position), $c$ is the community context vector, and $i$ is the target interpretant.

The probability $p_{\text{chain}}$ is computed by a dedicated chain prediction head. This head is a two-layer MLP that takes the concatenation of $\mathbf{e}_i^I$ (the interpretant embedding, which is conditioned on community context via $f_I$) and $\mathbf{h}_t^{\text{meta}}$ (the RRM’s meta-observation state) and predicts the next interpretant as a distribution over the interpretant vocabulary:
p_{\text{chain}}(i | s, c; \theta) = \text{softmax}(\mathbf{W}_{\text{chain}} \cdot \text{ReLU}(\mathbf{W}_{\text{chain1}} [\mathbf{e}_i^I \| \mathbf{h}_t^{\text{meta}}]))
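The head defined by the equation above is small enough to sketch directly; dimensions here are toy sizes, and the weight layout is an assumption consistent with the formula:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def chain_head(e_I, h_meta, W1, W_chain):
    """Two-layer MLP over [e_I || h_meta] -> distribution over the
    interpretant vocabulary V_I (|V_I| = 10,000 in the paper; tiny here)."""
    h = np.maximum(W1 @ np.concatenate([e_I, h_meta]), 0.0)  # ReLU
    return softmax(W_chain @ h)

rng = np.random.default_rng(0)
d, v_int = 8, 20                      # toy sizes for illustration
W1 = rng.normal(size=(16, 2 * d))     # W_chain1 in the equation
W_chain = rng.normal(size=(v_int, 16))
p = chain_head(rng.normal(size=d), rng.normal(size=d), W1, W_chain)
assert p.shape == (v_int,) and np.isclose(p.sum(), 1.0)
```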

The interpretant vocabulary is constructed during annotation as the set of unique interpretant descriptions in the training data, clustered into $|\mathcal{V}_I|$ prototype interpretants (we use $|\mathcal{V}_I| = 10{,}000$) via embedding similarity. This allows the model to predict structured interpretant categories rather than generate free-form interpretant descriptions during pre-training (free-form generation is reserved for fine-tuning).

This loss provides the primary training signal for the interpretant embedding component $\mathbf{e}_i^I$, the community-dependent modulation function $f_I$, the MAH divergence signals, and the RRM. It directly operationalizes the Kockelman/Lancaster interpretant chain formalism as a learning objective.

Multi-link chain loss. For chain sequences with multiple links, we additionally compute a multi-step prediction loss that measures the model’s ability to trace chains beyond one step:

\mathcal{L}_{\text{chain}}^{\text{multi}} = -\frac{1}{|\mathcal{C}_{\text{multi}}|}\sum_{(s, c, i_1, i_2, \ldots, i_k) \in \mathcal{C}_{\text{multi}}} \sum_{j=1}^{k} \lambda_{\text{decay}}^{j-1} \log p_{\text{chain}}(i_j | s, c, i_1, \ldots, i_{j-1}; \theta)

where $\lambda_{\text{decay}} = 0.8$ geometrically down-weights later links (acknowledging increasing prediction uncertainty deeper in the chain). The total chain loss is $\mathcal{L}_{\text{chain}} = \mathcal{L}_{\text{chain}}^{\text{single}} + 0.5 \cdot \mathcal{L}_{\text{chain}}^{\text{multi}}$.

Iconic grounding loss $\mathcal{L}_{\text{icon}}$

The iconic grounding loss serves two functions: (a) training the iconic subspace to encode cross-modal correspondences, and (b) regularizing it against drift during the course of training. The loss has two terms:

\mathcal{L}_{\text{icon}} = \mathcal{L}_{\text{icon}}^{\text{align}} + \mu \mathcal{L}_{\text{icon}}^{\text{stab}}

Alignment term. For tokens with cross-modal grounding targets (from the 10% cross-modal grounding data), the alignment loss encourages the iconic subspace to match target grounding vectors:

\mathcal{L}_{\text{icon}}^{\text{align}} = \frac{1}{|\mathcal{G}|}\sum_{i \in \mathcal{G}} \|\mathbf{e}_i^{O,\text{icon}} - \mathbf{g}_i^*\|_2^2

where $\mathcal{G}$ is the set of tokens with grounding annotations, and $\mathbf{g}_i^* \in \mathbb{R}^{d_{\text{icon}}}$ is the target iconic grounding vector. For phonosemantic features, $\mathbf{g}_i^*$ is derived from the phonological encoder’s extraction of articulatory/acoustic properties. For CLIP-derived features, $\mathbf{g}_i^*$ is the projected CLIP embedding of the token’s corresponding visual concept.

Stability term. For all tokens, the stability loss penalizes deviation of the iconic subspace from its initial values, implementing the attractor stabilization function formalized in Section 2.4.7:

\mathcal{L}_{\text{icon}}^{\text{stab}} = \frac{1}{|\mathcal{V}|}\sum_{w \in \mathcal{V}} \|\mathbf{e}_w^{O,\text{icon}}(t) - \mathbf{e}_w^{O,\text{icon}}(0)\|_2^2

where $\mathbf{e}_w^{O,\text{icon}}(t)$ is the iconic embedding at training step $t$ and $\mathbf{e}_w^{O,\text{icon}}(0)$ is its initialization (from CLIP and the phonological encoder). This is a form of elastic weight consolidation (Kirkpatrick et al., 2017) applied specifically to the iconic subspace, ensuring that training does not erode the non-arbitrary grounding structure that provides attractor stability.

The time-dependent weighting $\beta(t)$ (Section 4.2.3) modulates the total iconic loss across training, and $\mu = 0.3$ controls the relative weight of stability vs. alignment.

Attractor loss $\mathcal{L}_{\text{attractor}}$

For annotated positions with attractor labels and $r$-value estimates, the attractor loss trains the attractor embedding $\mathbf{e}_i^A$ and the BEN:
\mathcal{L}_{\text{attractor}} = \mathcal{L}_{\text{basin}} + \mathcal{L}_r + \nu \mathcal{L}_{\text{topo}}

Basin classification loss. Softmax cross-entropy over basin labels:

\mathcal{L}_{\text{basin}} = -\frac{1}{|\mathcal{B}|}\sum_{t \in \mathcal{B}} \sum_{k=1}^{K} b_{t,k}^* \log \hat{b}_{t,k}

where $\hat{b}_{t,k} = \text{softmax}(\mathbf{W}_{\text{basin}} \mathbf{e}_t^A)_k$ is the predicted probability of basin $k$, and $b_{t,k}^*$ is the ground-truth basin indicator. We use $K = 5$ basins (basin A, basin B, boundary, pre-bifurcation consensus, uncontested), though $K$ can be increased for finer-grained analysis of multi-polar (not just bipolar) bifurcation.

$r$-estimation loss. MSE loss for the BEN’s bifurcation parameter estimate:
\mathcal{L}_r = \frac{1}{|\mathcal{R}|}\sum_{t \in \mathcal{R}} (\hat{r}_t - r_t^*)^2

where $\hat{r}_t$ is the BEN’s output and $r_t^*$ is the annotated ground-truth $r$-value. For synthetic data (Phase 3 of annotation), $r_t^*$ is exact (derived from the simulation parameter). For human-annotated data, $r_t^*$ is estimated with uncertainty, and we weight the loss inversely by annotation uncertainty: high-confidence $r$ estimates receive more weight.

Topological consistency loss. A novel loss that encourages the attractor embedding space to preserve the topological structure of the bifurcation landscape:

\mathcal{L}_{\text{topo}} = \frac{1}{|\mathcal{P}|}\sum_{(i,j) \in \mathcal{P}} \max(0, \|\mathbf{e}_i^A - \mathbf{e}_j^A\| - d_{ij}^*)^2 + \max(0, d_{ij}^* - \|\mathbf{e}_i^A - \mathbf{e}_j^A\|)^2

where $\mathcal{P}$ is a set of annotated token pairs, and $d_{ij}^*$ is the target distance in the attractor landscape: tokens in the same basin should be close, tokens in different basins should be far, and tokens near the bifurcation boundary should be at intermediate distances. This loss is inspired by contrastive learning but operates on the specific geometry of bifurcation dynamics rather than general semantic similarity.

We set $\nu = 0.1$ (the topological loss is secondary to the classification and estimation losses, serving as a geometric regularizer).

The full pre-training loss is:

\mathcal{L}_{\text{pre}} = \mathcal{L}_{\text{LM}} + \alpha \mathcal{L}_{\text{chain}} + \beta(t) \mathcal{L}_{\text{icon}} + \gamma \mathcal{L}_{\text{attractor}}

with recommended initial hyperparameters $\alpha = 0.3$, $\beta_0 = 0.1$, $\gamma = 0.2$, tuned on validation performance for each objective independently.

Curriculum schedule. We recommend a three-phase curriculum:

  1. Phase A (0-20% of training): Emphasize $\mathcal{L}_{\text{LM}}$ and $\mathcal{L}_{\text{icon}}$. Set $\alpha_A = 0.1$, $\gamma_A = 0.05$. This phase establishes the distributional foundation and the iconic grounding structure before asking the model to learn semiotic dynamics. The gating parameters $\gamma^{(\ell)}$ for RRM injection are still near their initial value of 0, so the semiotic components have minimal effect on backbone representations.
  2. Phase B (20-70% of training): Ramp up semiotic objectives. Linearly increase $\alpha$ to 0.3 and $\gamma$ to 0.2 over this phase. The RRM and BEN begin to influence backbone processing as $\gamma^{(\ell)}$ increases from 0. The model learns interpretant chain dynamics and attractor structure with the distributional foundation already in place.
  3. Phase C (70-100% of training): Full semiotic training. All hyperparameters at their target values. The iconic grounding loss has decayed to $\beta_{\text{floor}} = 0.01$ (gentle stabilization). The model refines its semiotic representations with the full architecture active.

This curriculum prevents the semiotic losses from interfering with early distributional learning while ensuring that semiotic capabilities are fully developed by the end of pre-training.
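The three-phase schedule can be sketched as a function of training progress. The exact $\beta(t)$ decay lives in Section 4.2.3; a linear decay to the floor is assumed here for illustration:

```python
def curriculum_weights(frac, alpha_tgt=0.3, gamma_tgt=0.2,
                       beta0=0.1, beta_floor=0.01):
    """Loss weights (alpha, beta, gamma) at training progress frac in [0, 1].

    Phase A (<0.2): small alpha/gamma while the distributional and iconic
    foundations form. Phase B (0.2-0.7): linear ramp to targets.
    Phase C (>=0.7): targets held; beta decayed to its floor.
    The linear beta decay is an illustrative assumption.
    """
    if frac < 0.2:
        alpha, gamma = 0.1, 0.05
    elif frac < 0.7:
        ramp = (frac - 0.2) / 0.5
        alpha = 0.1 + ramp * (alpha_tgt - 0.1)
        gamma = 0.05 + ramp * (gamma_tgt - 0.05)
    else:
        alpha, gamma = alpha_tgt, gamma_tgt
    beta = max(beta_floor, beta0 * (1.0 - frac))
    return alpha, beta, gamma

assert curriculum_weights(0.0) == (0.1, 0.1, 0.05)   # Phase A values
assert curriculum_weights(1.0) == (0.3, 0.01, 0.2)   # Phase C endpoint
```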

After pre-training, the model is fine-tuned on four tasks designed to develop specific semiotic-reflexive capabilities. Fine-tuning uses proportional sampling across tasks (25% each by default, adjustable via task-specific weighting) and operates in a multi-task setting where each batch contains examples from all four tasks.

Input: A sign $s$, its textual context (surrounding sentence or paragraph), and a community context specification $c$.

Output: The predicted chain of interpretants, which is the sequence of interpretive effects the sign would produce in the specified community, generated as natural language.

Format: Structured generation with chain markers:

code block

Loss: Sequence-to-sequence cross-entropy on the generated chain, plus a chain coherence auxiliary loss that penalizes incoherent transitions (where the interpretant at step $n$ is semantically unrelated to the sign at step $n+1$).

Metrics: Chain accuracy (proportion of correctly predicted interpretant links, measured against human-annotated reference chains); chain coherence (rated by human evaluators on a 5-point Likert scale); chain depth (average length of correctly predicted chains before the first error).

This task develops the model’s capacity to trace interpretant chains across communities, the foundation of semiotic-reflexive generation. It directly trains the interpretant embedding, the community context mechanism, and the chain prediction head.

Input: A contested sign $s$ and two community contexts $c_A$, $c_B$.

Output: A text that renders the sign intelligible across both communities. This text articulates how each community’s interpretant chain operates, identifies the points of divergence, and proposes bridging language that surfaces common ground or at minimum renders the disagreement navigable.

Format: Open-ended generation, typically 100-300 tokens, structured as:

  1. Identification of the sign and its contested character

  2. Articulation of Community A’s interpretant chain (core associations, indexical values, characterological figure)

  3. Articulation of Community B’s interpretant chain

  4. Identification of the divergence point (where the chains split)

  5. Bridging language (what both chains share, where iconic or embodied grounding provides common referent, what each community would need to understand about the other’s chain to make the disagreement productive rather than existential)

Loss: A combination of:

  • Sequence-to-sequence cross-entropy against human-written bridge texts

  • A bridging coherence reward (during RLHF, if used): human evaluators rate whether the generated bridge text would be judged as fair and illuminating by members of both communities (not just one)

  • A divergence coverage penalty: automated check that the generated text mentions both communities’ interpretant chains, preventing the model from collapsing into a single-perspective summary

Metrics: Bridging coherence (Section 6.1.1). This is the central generative capability: producing text that functions as a semiotic bridge between attractor basins rather than reinforcing occupancy within one.

Input: A text exhibiting interpretive dynamics (e.g., a politically contested argument, a deepfake description, an ambiguous slogan, a media narrative, an AI-generated text with contested referents).

Output: A metapragmatic commentary that identifies:

  • The key signs and their divergent interpretants across communities

  • The indexical orders at play (first-order referential, second-order social-indexical, third-order metapragmatic)

  • The semiotic ideologies structuring interpretation (what each community assumes about how signs relate to reality)

  • The enregisterment patterns (what characterological figures are activated, how they have formed)

  • The current position on the bifurcation landscape ($\hat{r}$, basin proximity, critical slowing down indicators)

  • The available iconic anchors that might provide cross-community common ground

Format: Structured analytical essay, typically 200-500 tokens, organized under the headings above.

Loss: Sequence-to-sequence cross-entropy against expert-written reflexive analyses, with auxiliary losses for:

  • $r$-estimation accuracy: the $\hat{r}$ embedded in the commentary should match the annotated ground-truth

  • Completeness: automated check for the presence of each required component (signs, interpretants, indexical orders, semiotic ideologies, enregisterment, landscape position)

Metrics: Reflexivity fidelity (Section 6.1.2). This task develops the model’s third-order metapragmatic capacity, specifically its ability to produce the kind of reflexive discourse that Silverstein identifies as essential for navigating semiotic complexity and that Pennycook et al. (2021) found to be empirically effective at reducing misinformation sharing.

Input: A description of a semiotic context with specified or estimated $r$-value, noise intensity $\sigma$, and optional imperfection parameter $h$ (for asymmetric bifurcation).

Output: A prediction of the system’s trajectory, including whether it will converge, bifurcate, exhibit hysteresis, or show critical slowing down. The output also includes the generation of example texts characteristic of each predicted regime.

Format: Structured prediction + conditional generation:

code block

Loss: Combination of:

  • Classification loss on trajectory prediction (converge/bifurcate/hysteresis/CSD)

  • MSE loss on predicted $\hat{r}$ against the specified $r$

  • Cross-entropy on the generated regime-characteristic texts

  • Consistency loss ensuring that basin A and basin B texts are semiotically distinct in the ways the model predicts

Metrics: Bifurcation prediction accuracy (Section 6.1.3). This task develops the BEN’s capacity to estimate dynamical parameters and the model’s ability to generate text that illustrates the consequences of semiotic dynamics at different parameter values.

Optimizer: AdamW (Loshchilov & Hutter, 2019) with decoupled weight decay $\lambda_w = 0.1$, $\beta_1 = 0.9$, $\beta_2 = 0.95$, $\epsilon = 10^{-8}$.

Learning rate schedule:

  • Pre-training: Linear warmup for 2,000 steps to peak LR $3 \times 10^{-4}$, then cosine decay to $3 \times 10^{-5}$ over the full training run.

  • Fine-tuning: Linear warmup for 500 steps to peak LR $1 \times 10^{-5}$, then cosine decay to $1 \times 10^{-6}$.
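The warmup-then-cosine schedule can be written down directly; this is the standard form, with the pre-training numbers from the text plugged in:

```python
import math

def lr_at(step, total_steps, warmup, peak, floor):
    """Linear warmup to peak, then cosine decay to floor."""
    if step < warmup:
        return peak * step / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return floor + 0.5 * (peak - floor) * (1 + math.cos(math.pi * progress))

# Pre-training numbers from the text: 2,000 warmup steps, 3e-4 -> 3e-5.
# (total_steps = 10,000 matches the upper end of the estimated step count.)
assert lr_at(0, 10_000, 2_000, 3e-4, 3e-5) == 0.0
assert abs(lr_at(2_000, 10_000, 2_000, 3e-4, 3e-5) - 3e-4) < 1e-12
assert abs(lr_at(10_000, 10_000, 2_000, 3e-4, 3e-5) - 3e-5) < 1e-12
```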

Parameter-efficient training strategy:

  • Backbone transformer parameters: Trained with LoRA (Hu et al., 2022) during fine-tuning (rank $r_{\text{LoRA}} = 64$, $\alpha_{\text{LoRA}} = 128$, applied to $Q$, $K$, $V$, and output projection matrices). Full training during pre-training.

  • Semiotic-specific parameters (SEL, MAH, RRM, BEN): Full gradient updates during both pre-training and fine-tuning. These components represent the novel architectural contributions and must receive full training signal.

  • Iconic grounding subspace: Phonological encoder is frozen throughout; CLIP-initialized components are trainable but regularized by $\mathcal{L}_{\text{icon}}^{\text{stab}}$.

Gradient management:

  • Gradient clipping: max norm 1.0, applied globally across all parameter groups.

  • Gradient accumulation: effective batch size of 2M tokens (e.g., 256 sequences of 8,192 tokens, accumulated across 4-8 gradient steps depending on available GPU memory).

  • Separate gradient scaling for semiotic vs. backbone parameters is not used; the hyperparameters $\alpha, \beta, \gamma$ in the loss function provide sufficient control over the relative magnitude of semiotic gradients.

Training regime:

  • Pre-training: 1-2 passes over the full 10B-token corpus. At 2M tokens per effective batch and 10B tokens total, this is approximately 5,000-10,000 gradient steps. Estimated wall time: 10-14 days on 64 A100 80GB GPUs with DeepSpeed ZeRO Stage 3.

  • Fine-tuning: 5-10 epochs over the task-specific datasets (estimated 100M tokens total across all four tasks). Approximately 500-1,000 gradient steps. Estimated wall time: 2-3 days on 8 A100 80GB GPUs.

Compute budget comparison: For a 7B-parameter model, total pre-training compute is approximately $1.2 \times 10^{22}$ FLOPs, representing a ~30% premium over standard pre-training of an equivalent model on 10B tokens (the premium comes from the semiotic components’ forward/backward passes, the expanded embedding dimension, and the multi-objective loss computation). Fine-tuning compute is negligible relative to pre-training ($\sim 5 \times 10^{19}$ FLOPs).

At inference, the parameter $\lambda$ and the BEN’s adaptive behavior provide user-controllable and context-adaptive modulation. We define three standard modes and a custom mode:

Mode 1: Standard generation ($\lambda = 0$, BEN disabled). The model generates text as a standard transformer, ignoring semiotic modulation entirely. The SEL still decomposes embeddings (this is structural), but the MAH divergence signals, RRM meta-observations, and BEN modulation have no effect on output logits. This mode serves as the baseline, is appropriate for contexts where semiotic reflexivity is unnecessary (e.g., code generation, factual Q&A, creative writing without ideological content), and enables controlled ablation studies.

Mode 2: Semiotic awareness ($\lambda = 0.5$, BEN active with regime detection). The default mode for general-purpose deployment. The model:

  • Monitors interpretant dynamics via the MAH and RRM

  • Estimates $\hat{r}_t$ via the BEN at each generation step

  • Applies moderate modulation: gently biasing away from extreme attractor positions, surfacing alternative interpretations when $\hat{r}_t$ exceeds a configurable threshold (default: $r_{\text{alert}} = 0.2$), and flagging critical slowing down (CSD) when the CSD indicator at step $t$ exceeds 1.5

  • Does not generate unsolicited reflexive commentary; it adjusts the distribution over next tokens to reduce polarization-amplifying content while maintaining fluency and relevance

This mode is appropriate for general-purpose conversational AI, content generation, and any context where semiotic sensitivity is desirable without being the primary function.

Mode 3: Full reflexive ($\lambda = 1.0$, BEN active with all three regimes). The model actively generates semiotic content:

  • In subcritical contexts: Generates normally but with access to semiotic representations, enabling richer contextual understanding

  • In near-critical contexts: Proactively surfaces the sign’s contested character, identifies the communities whose interpretant chains diverge, and generates bridging language

  • In supercritical contexts: Generates full reflexive commentary, including identifying basins, tracing chains, noting iconic anchors, and articulating what each community’s semiotic ideology is contributing to the divergence

This mode is appropriate for media analysis, conflict mediation, educational contexts (teaching about polarization dynamics), platform content moderation support, and research applications.

Mode 4: Custom ($\lambda$ user-specified, per-component control). Advanced users can configure:

  • $\lambda$ to any value in $[0, 1]$

  • The $r$-alert threshold for regime transitions

  • The CSD threshold for early warning

  • Whether to enable/disable individual components (MAH only, RRM only, BEN only) for diagnostic purposes

  • The community context vector $\mathbf{c}$ (or set of vectors for multi-community analysis)

API specification (for deployment):

code block
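The published API specification aside, a hypothetical configuration object covering the Mode 1-4 controls described in this section might look like the following; every field name is an illustrative assumption, not the paper's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class SRTInferenceConfig:
    """Hypothetical inference-time configuration for the SRT.

    Defaults correspond to Mode 2 (semiotic awareness); field names
    are illustrative guesses.
    """
    lam: float = 0.5            # modulation strength lambda in [0, 1]
    r_alert: float = 0.2        # r-alert threshold for regime transitions
    csd_threshold: float = 1.5  # critical-slowing-down early-warning threshold
    enable_mah: bool = True     # per-component toggles (Mode 4 diagnostics)
    enable_rrm: bool = True
    enable_ben: bool = True
    community_contexts: list = field(default_factory=list)

    @classmethod
    def mode1(cls):
        """Standard generation: no semiotic modulation, BEN disabled."""
        return cls(lam=0.0, enable_ben=False)

cfg = SRTInferenceConfig.mode1()
assert cfg.lam == 0.0 and not cfg.enable_ben
```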

Evaluating a semiotic-reflexive system requires metrics that go beyond standard NLP benchmarks (BLEU, perplexity, task accuracy). The SRT’s claims are not merely that it generates fluent text, but that it generates text exhibiting specific semiotic properties: cross-community intelligibility, metapragmatic depth, dynamical awareness, and iconic grounding stability. This section specifies a comprehensive evaluation framework organized around five evaluation axes, a benchmark suite, systematic ablation studies, a statistical methodology, and explicit falsification criteria.

We define five evaluation axes. The first three target the model’s semiotic-reflexive capabilities directly; the fourth evaluates the iconic grounding mechanism’s contribution; the fifth ensures that semiotic capabilities do not degrade standard language performance.

Definition: The degree to which model-generated text renders a contested sign intelligible across interpretive communities without collapsing the sign’s contested character into false consensus or false equivalence.

This axis is the central claim of the SRT, namely that architectural semiotic awareness enables qualitatively better cross-community communication than distributional pattern-matching alone. The metric must distinguish genuine bridging (which preserves the reality of disagreement while making each side’s interpretive logic visible to the other) from several failure modes: (a) false balance (“both sides have a point”), (b) neutralization (removing the sign’s contested character), (c) one-sided advocacy disguised as bridging, and (d) superficial paraphrase without semiotic depth.

Metric: Human evaluation by cross-community panels. For each evaluated bridge text, evaluators from two opposing interpretive communities rate the model’s output on six dimensions:

[table]

Composite BC score:

\text{BC} = \frac{2 \cdot \text{CA} + \text{CD}_{\text{norm}} + 1.5 \cdot \text{FA} + 1.5 \cdot \text{IL} + \text{DA} + \text{AC}}{8}

where \text{CD}_{\text{norm}} = 5 \cdot \min(\text{CD}/3, 1) normalizes chain depth to the 5-point Likert scale (3 links = full score).
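For concreteness, the composite with its chain-depth cap can be computed in a few lines (the function and argument names are illustrative, not part of the specification):

```python
def bc_score(ca, fa, il, da, ac, chain_depth):
    """Composite Bridging Capability (BC) score.

    ca, fa, il, da, ac: mean 1-5 Likert ratings for the five rated dimensions.
    chain_depth: number of interpretant-chain links made explicit (CD).
    """
    cd_norm = 5 * min(chain_depth / 3, 1)  # 3 links = full score on the 1-5 scale
    # weights sum to 8, so an all-5 bridge with a 3-link chain scores exactly 5.0
    return (2 * ca + cd_norm + 1.5 * fa + 1.5 * il + da + ac) / 8

# A bridge text rated 4 on every dimension with a 3-link chain:
print(bc_score(4, 4, 4, 4, 4, chain_depth=3))  # -> 4.125
```

Note the design choice the weights encode: communicative accuracy (CA) counts double, and framing accuracy (FA) and interpretive legibility (IL) count one-and-a-half times, so a bridge cannot score well on fluency dimensions alone.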

Evaluator protocol: Minimum 5 evaluators per community, recruited to ensure familiarity with the community’s interpretive frameworks. Evaluators are blinded to the source of the bridge text (SRT vs. baseline vs. human-authored). Each evaluator rates a minimum of 20 bridge texts, with 5 overlap texts for inter-rater reliability estimation. Target: Krippendorff’s \alpha > 0.70 for each dimension before aggregation.

Baselines: - B1: Standard LLM (GPT-4-class) with no semiotic prompting (“Explain how different groups interpret the word ‘[sign]’”) - B2: Standard LLM with semiotic prompting (explicit instructions to identify interpretant chains, indexical orders, divergence points, thereby testing whether prompting alone achieves what architecture provides) - B3: Human-authored bridge texts drawn from mediation, conflict resolution, and deliberative democracy literature - B4: SRT in Mode 1 (\lambda = 0), which is the same model without semiotic modulation, isolating the contribution of the semiotic components

Targets: BC \geq B1 + 1.0 (architecture substantially outperforms naive prompting); BC \geq B2 + 0.5 (architecture outperforms explicit semiotic prompting); BC approaching B3 \pm 0.3 (approaching human expert quality).

Definition: The accuracy and depth of the model’s metapragmatic commentary, specifically its capacity to identify signs, trace interpretant chains, classify indexical orders, surface semiotic ideologies, and assess bifurcation dynamics in a given text.

This axis evaluates the model’s third-order metapragmatic capacity (Silverstein’s third indexical order): not merely using signs, but commenting on how signs function across interpretive communities. The standard for comparison is expert semiotic analysis of the kind produced by trained linguistic anthropologists.

Metric: A structured evaluation with both automated and expert components:

Automated sub-metrics (computed against expert-annotated ground truth):

[table]

Expert sub-metrics (rated by trained semioticians, minimum 3 per text):

[table]

Composite RF score: Harmonic mean of the automated F1 (computed from precision and recall across all automated sub-metrics) and the mean expert rating normalized to [0, 1]:
\text{RF} = \frac{2 \cdot F1_{\text{auto}} \cdot \bar{E}_{\text{norm}}}{F1_{\text{auto}} + \bar{E}_{\text{norm}}}

The harmonic mean ensures that neither component alone can inflate the composite score. The model must be both technically accurate (automated) and analytically substantive (expert).
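This balancing behavior can be sketched directly; the normalization constant below assumes expert means arrive on a 1-5 Likert scale, which is an assumption rather than something the text specifies:

```python
def rf_score(f1_auto, expert_mean, likert_max=5):
    """Composite RF: harmonic mean of automated F1 and normalized expert rating.

    likert_max is an assumption (expert ratings treated as 1-5 Likert means).
    """
    e_norm = expert_mean / likert_max  # normalize expert mean to [0, 1]
    if f1_auto + e_norm == 0:
        return 0.0
    return 2 * f1_auto * e_norm / (f1_auto + e_norm)

# High automated accuracy cannot compensate for shallow expert-rated analysis:
print(rf_score(0.95, 1.5))  # low expert rating drags the composite down
print(rf_score(0.70, 3.5))  # balanced components score higher
```

The harmonic mean is dominated by the weaker component, which is exactly the property the composite needs: a model that is technically precise but analytically hollow (or vice versa) cannot inflate RF.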

Baselines: B1-B4 as in Axis 1, plus: - B5: Expert human reflexive analysis (the ceiling against which all systems are compared) - B6: Standard LLM fine-tuned on semiotic theory texts but without SRT architecture (testing whether semiotic knowledge in training data alone suffices)

Target: RF > 0.65 (substantially above B1, B2; approaching within 15% of B5).

Definition: The precision with which the model estimates the amplification parameter r, predicts regime transitions (convergence, bifurcation, hysteresis, critical slowing down), and identifies the temporal dynamics of semiotic polarization.

This axis tests whether the pitchfork bifurcation model, the mathematical backbone of the SRT’s dynamical theory, captures empirically real dynamics or is merely a suggestive metaphor that does not improve prediction beyond simpler alternatives.

Metric: Quantitative evaluation on three data types with increasing ecological validity:

Tier 1: Synthetic bifurcation data (controlled, exact ground truth).

[table]

Tier 2: Semi-synthetic data (naturalistic text generated from controlled simulations, as in Phase 3 of the annotation procedure). Performance targets are 5-10 percentage points below Tier 1, accounting for the noise introduced by text generation.

Tier 3: Historical case studies (real-world documented polarization events with retrospective r-estimation by expert panels).
[table]

For Tier 3, ground truth is established by consensus of an expert panel (minimum 5 semioticians/political communication scholars), who independently estimate r-trajectories and regime labels for each case. The SRT is compared against: - Expert panel consensus (ceiling) - Sentiment analysis trajectory (does simple positive/negative sentiment divergence predict as well as r-estimation?) - Topic modeling divergence (LDA or BERTopic applied to community-stratified corpora, testing whether topic-level divergence predicts regime transitions) - Time-series statistical models (ARIMA on embedding divergence, testing whether generic time-series prediction matches or exceeds the SRT’s dynamical model)

Target: Tier 3 regime classification > 0.70; r-trajectory correlation with expert consensus > 0.60.

Definition: The measurable contribution of the iconic grounding subspace, including phonosemantic features and CLIP-derived embeddings, to the SRT’s semiotic capabilities.

This axis directly evaluates the bouba/kiki hypothesis as embodied in the architecture: that non-arbitrary, cross-modal correspondences between sign form and meaning provide stabilizing attractors that anchor the semiotic landscape against unlimited drift. If this claim is correct, then iconic grounding should be measurably functional, not merely decorative.

Metric: A combination of intrinsic and extrinsic evaluations:

Intrinsic evaluation (does the iconic subspace encode the structure it claims to?):

[table]

Extrinsic evaluation (does iconic grounding improve downstream semiotic performance?):

This is evaluated through the ablation study (Section 6.4) by comparing the full SRT against SRT_{\neg\text{icon}} (iconic subspace zeroed or randomized). The key predictions:

  1. Ablating iconic grounding should reduce BC scores, because cross-community bridges often depend on shared embodied referents (both communities agree that “sharp” sounds sharp, even if they disagree about its political implications)

  2. Ablating iconic grounding should increase attractor drift over the course of generation. Without the stabilizing effect of non-arbitrary form-meaning correspondences, the attractor landscape becomes more susceptible to distributional drift

  3. Ablating iconic grounding should reduce CSD detection sensitivity because the early warning mechanism depends partly on monitoring changes in the iconic subspace that presage bifurcation events

  4. The reduction in performance should be larger for signs with high D_{\text{iconic}} than for signs with low D_{\text{iconic}}, since the iconic grounding matters more where it has more to contribute

Novel evaluation: Cross-linguistic iconic transfer. Train the SRT on English data. Evaluate on contested signs in languages with different phonological systems (Mandarin, Arabic, Swahili). If the iconic grounding captures genuinely universal cross-modal correspondences (the bouba/kiki effect is documented cross-linguistically; Blasi et al., 2016), then the iconic subspace should show positive transfer even for signs in languages not seen during training. Conversely, if the grounding is merely memorized English phonosemantic associations, transfer should be near zero.

Definition: The degree to which semiotic-reflexive training preserves or enhances performance on standard NLP benchmarks.

The SRT must not sacrifice general language competence to achieve semiotic capabilities. This axis establishes that the semiotic extensions are additive rather than substitutive.

Metric: Performance on standard benchmarks, comparing the SRT at \lambda = 0 (standard generation mode) against the base model prior to semiotic training:
[table]

The SRT’s semiotic awareness may actually improve performance on truthfulness and bias benchmarks, because these tasks require the model to detect and navigate contested or misleading interpretive framings, which is precisely the capability the semiotic components are designed to support. This is an empirical prediction, not an assumption.

Target: Mean degradation < 2\% across standard benchmarks; statistically significant improvement on TruthfulQA and BBQ (p < 0.05).

To support reproducible evaluation, we specify the Semiotic Evaluation Corpus (SEC), a purpose-built suite of evaluation datasets. The SEC is designed to be constructed alongside the training corpus (using disjoint data splits) and released publicly to enable community replication.

Size: 500 contested signs \times 2 community contexts \times 3 difficulty levels = 3,000 evaluation instances.

Construction: Each instance consists of: - A contested sign in its natural textual context (sentence or paragraph) - Two community context specifications (drawn from the community taxonomy used in training) - Expert-annotated interpretant chains for each community (3-5 links each) - Expert-annotated divergence point (the chain link where interpretations split) - One or more expert-written reference bridge texts (from mediation and deliberative democracy corpora)

Difficulty levels: - Easy: Signs with well-known, widely discussed contestation (e.g., “freedom,” “equality”) where bridging language is readily available in existing discourse - Medium: Signs with domain-specific contestation (e.g., “sustainability” in environmental vs. economic discourse) requiring deeper chain tracing - Hard: Signs where the contestation is subtle, emerging, or involves conflicting semiotic ideologies rather than conflicting referents (e.g., “evidence” in scientific vs. populist epistemological frameworks)

Worked example (medium difficulty):

[code block]

Size: 200 politically contested texts, each 200-1,000 words, with full expert semiotic analysis.

Construction: Texts are drawn from political speeches, op-eds, social media posts, legislative debates, and media narratives. Each text is accompanied by expert-produced reflexive analysis identifying: all contested signs and their SOI structures, operative indexical orders, active semiotic ideologies, enregisterment patterns, and estimated r-value.

Expert analyses are produced by a panel of 3-5 semioticians/linguistic anthropologists per text, with agreement measured for each component. Texts where expert agreement falls below threshold (Section 5.1.2 validation targets) are excluded.

Genre distribution: 30% political speeches, 25% news media/op-eds, 20% social media (Twitter/X, Reddit), 15% legislative/policy text, 10% informal discourse (interviews, focus group transcripts).

Size: 1,000 synthetic scenarios + 500 semi-synthetic scenarios + 50 historical cases.

Synthetic construction: Each scenario is generated by running the stochastic pitchfork model with known parameters (r, \sigma, h, initial conditions) and mapping the trajectory to a semiotic scenario description. Ground truth is exact.
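A minimal sketch of the generating process, assuming the imperfect stochastic pitchfork form dx = (r x − x³ + h) dt + σ dW that the (r, σ, h) parameterization suggests; step size, horizon, and seed are illustrative choices:

```python
import math
import random

def simulate_pitchfork(r, sigma, h, x0=0.0, dt=0.01, steps=5000, seed=0):
    """Euler-Maruyama integration of dx = (r*x - x**3 + h) dt + sigma dW."""
    rng = random.Random(seed)
    x, traj = x0, []
    for _ in range(steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment
        x += (r * x - x**3 + h) * dt + sigma * dw
        traj.append(x)
    return traj

# Supercritical regime (r > 0): trajectories settle near one of the two
# stable branches at +/- sqrt(r); which one depends on the noise realization.
traj = simulate_pitchfork(r=1.0, sigma=0.05, h=0.0)
```

Sweeping r from negative to positive values while recording the trajectories yields exactly the kind of labeled regime data (convergent vs. bifurcated, with known transition point) that Tier 1 evaluation requires.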

Semi-synthetic construction: Synthetic trajectories are instantiated as naturalistic text sequences (using an LLM, as in Section 5.1.2 Phase 3), then the model must recover the dynamical parameters from the text alone. Ground truth is the generating parameters.

Historical case construction: The 50 cases are selected for temporal completeness (sufficient longitudinal data to track the full trajectory from pre-bifurcation through any bifurcation events), documentation quality, and diversity of topics and political contexts. Each case is annotated by expert panel consensus: r-trajectory (estimated at quarterly intervals), regime labels, CSD episodes, and key divergence events. The five cases in Table 4 (Section 6.1.3) constitute the core cases; the full 50 include international (EU identity politics, Indian caste discourse, Brazilian political polarization) and domain-specific cases (scientific consensus vs. skepticism, medical authority vs. patient autonomy).

Size: 300 sign trajectories tracked over 2-10 years each, with monthly or quarterly measurements.

Construction: For each sign, corpus data is collected from community-stratified sources at regular intervals. At each time point, the sign’s interpretant distribution across communities is estimated via embedding divergence metrics. The trajectory is annotated with bifurcation events (points where divergence increases sharply), regime labels, and CSD episodes.
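The CSD episodes referenced above can be flagged with the generic early-warning statistics of the dynamical-systems literature: rolling variance and lag-1 autocorrelation, both of which rise as a system approaches a bifurcation. A stdlib-only sketch (the window size is an illustrative choice, not a value from the text):

```python
def lag1_autocorr(xs):
    """Lag-1 autocorrelation of a sequence (0.0 if the sequence is constant)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    if var == 0:
        return 0.0
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return cov / var

def rolling_early_warning(series, window=50):
    """Rolling variance and lag-1 autocorrelation over a divergence trajectory.

    Rising trends in either statistic are the classic critical-slowing-down
    signature that precedes a bifurcation event.
    """
    out = []
    for i in range(window, len(series) + 1):
        w = series[i - window:i]
        mean = sum(w) / window
        var = sum((x - mean) ** 2 for x in w) / window
        out.append((var, lag1_autocorr(w)))
    return out
```

Applied to the monthly or quarterly embedding-divergence measurements described above, a sustained upward trend in both statistics ahead of an annotated bifurcation event is what would count as a successful early warning.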

Evaluation task: Given the trajectory up to time t, predict: - The trajectory from t to t + \Delta t (regression on embedding divergence) - Whether a bifurcation event will occur within the prediction horizon (binary classification) - The sign’s eventual regime (convergent or bifurcated)

This is the most ecologically valid evaluation: can the SRT, trained on semiotic dynamics, predict real-world polarization trajectories?

Size: 500 words with phonosemantic ratings + 200 cross-modal correspondence judgments + 100 novel pseudowords.

Construction: - Word set: 500 English words rated by 50+ human participants on bouba/kiki-like scales (rounded-angular, soft-hard, bright-dark), drawn from Sidhu & Pexman (2018) norms and extended with novel ratings - Cross-modal set: 200 word-image pairs where human participants judge correspondence strength, drawn from Thompson & Lupyan (2023) and extended - Pseudoword set: 100 phonotactically legal pseudowords (generated via syllable structure rules for English, Mandarin, and Swahili) rated by human participants on sound symbolism scales. These have no distributional training signal, so only the phonological encoder can provide information

Evaluation task: The model’s iconic subspace embeddings are compared against human judgments. This tests whether the phonosemantic feature space (Section 2.4.5) extracts the cross-modal correspondences that the theory predicts.

Ablation studies isolate the contribution of each architectural component. Table 4 specifies the ablation conditions, the components removed, and the predicted effects on each evaluation axis.

Table 4: Ablation Conditions and Predicted Effects

[table]

Key ablation hypotheses:

  1. Architectural necessity: SRT_{\text{full}} > SRT_{\text{prompt}} on all axes. This is the central architectural claim: semiotic capabilities require architectural support, not just knowledge in the training data or instructions in the prompt.

  2. Compositional contribution: Each component should contribute measurably to at least one axis, and the full system should outperform any single-component ablation. If a component can be removed without affecting any axis, it is unnecessary and should be eliminated.

  3. Iconic grounding specificity: SRT_{\neg\text{phon}} and SRT_{\neg\text{CLIP}} should both show reduced IG scores, but the reduction patterns should differ. Phonosemantic features should matter more for sound-symbolic words, while CLIP features should matter more for concrete visual words. If both ablations show identical reduction patterns, the two iconic channels are redundant and one can be eliminated.

  4. Curriculum necessity: SRT_{\neg\text{curriculum}} should show worse performance than the full SRT, validating the three-phase training schedule. If curriculum ablation has no effect, the simpler non-curriculum training is preferable by Occam’s razor.

All comparisons use the following protocol to ensure rigor:

Significance testing: Paired bootstrap resampling (Efron & Tibshirani, 1993) with 10,000 resamples for all automated metrics. For human evaluations, mixed-effects models (Bates et al., 2015) with evaluator as random effect and system as fixed effect. All comparisons use two-tailed tests with \alpha = 0.05 and Bonferroni correction for multiple comparisons across ablation conditions.

Effect size reporting: Cohen’s d for all comparisons, in addition to raw score differences. We consider d > 0.5 (medium effect) as the threshold for practically meaningful improvement.

Confidence intervals: 95% bootstrap confidence intervals for all point estimates.

Multiple runs: All experiments are run with 3 random seeds. We report mean \pm standard deviation across seeds and perform significance tests on the pooled results.
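The paired bootstrap above can be sketched in a few lines; the sign-crossing formulation below is one common variant of the test, offered as an illustration rather than the exact procedure intended:

```python
import random

def paired_bootstrap_pvalue(scores_a, scores_b, resamples=10_000, seed=0):
    """Two-sided paired bootstrap: resample per-item score differences and
    estimate how often the resampled mean lands on the opposite side of zero
    from the observed mean difference."""
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = sum(diffs) / len(diffs)
    crossings = 0
    for _ in range(resamples):
        sample = [rng.choice(diffs) for _ in diffs]
        mean = sum(sample) / len(sample)
        if (mean <= 0) if observed > 0 else (mean >= 0):
            crossings += 1
    return min(1.0, 2 * crossings / resamples)  # two-sided, capped at 1
```

Because the resampling is over per-item differences, each bootstrap replicate preserves the pairing between systems on the same evaluation instance, which is what makes the test paired.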

Power analysis: For human evaluations, we compute the minimum sample size needed to detect a medium effect (d = 0.5) with power 1 - \beta = 0.80 at \alpha = 0.05. For the BC evaluation (6 dimensions, 2 communities), this requires approximately 50 evaluators per condition (25 per community).
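Under the usual normal approximation for a two-sample comparison, the per-group sample size follows n = 2(z_{1-\alpha/2} + z_{1-\beta})^2 / d^2. The one-observation-per-subject sketch below yields a somewhat larger figure than the ~50 quoted above, which is consistent with each evaluator contributing 20+ ratings under the mixed-effects design (repeated measures raise the effective sample size); the function name is illustrative:

```python
import math
from statistics import NormalDist

def per_group_n(d, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group to detect effect size d
    with a two-tailed two-sample comparison."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, two-tailed
    z_beta = NormalDist().inv_cdf(power)           # quantile for target power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

print(per_group_n(0.5))  # -> 63 per group under this approximation
```

The quadratic dependence on 1/d is the practical takeaway: halving the detectable effect size quadruples the required panel.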

The framework is explicitly falsifiable. The following outcomes would require fundamental revision of specific claims, or of the entire approach:

Component-level falsification (requiring revision of specific architectural choices):

  1. Bridging failure: If BC scores for the full SRT are not statistically significantly higher than B2 (semiotic prompting) with p < 0.05 and Cohen’s d > 0.5, then the semiotic extensions do not contribute meaningfully beyond what prompt engineering achieves. Revision: Abandon architectural approach; invest in prompting strategies instead.

  2. Reflexivity vacuity: If RF scores for the SRT are lower than B6 (standard LLM fine-tuned on semiotic texts without SRT architecture), then semiotic knowledge in training data suffices and architectural extensions are unnecessary. Revision: Focus on data curation rather than architecture.

  3. Bifurcation irrelevance: If BP accuracy on Tier 3 historical cases is not significantly above sentiment-divergence baselines, the pitchfork model does not add predictive power beyond simpler divergence metrics. Revision: Replace the bifurcation formalism with a simpler divergence tracking mechanism; remove or simplify the BEN.

  4. Iconic grounding null: If ablating the iconic grounding subspace (SRT_{\neg\text{icon}}) does not reduce BC or increase attractor drift, cross-modal grounding is decorative. Revision: Remove the iconic subspace and the phonological encoder, simplifying the architecture significantly.

  5. Iconic universality failure: If the cross-linguistic transfer evaluation (Section 6.1.4) shows zero positive transfer, the iconic grounding captures language-specific patterns rather than universal cross-modal correspondences. Revision: Re-theorize the iconic grounding as language-specific rather than universal; retrain per-language phonological encoders.

Framework-level falsification (requiring fundamental reconceptualization):

  1. Total architectural null: If no evaluation axis shows significant improvement over B2 (semiotic prompting), the entire architectural approach is unjustified. The compute premium buys nothing that prompting cannot achieve. Revision: Abandon the SRT in favor of prompt-based semiotic analysis.

  2. Standard performance catastrophe: If LPR degradation exceeds 5% on any standard benchmark, the semiotic training has corrupted general language capabilities. Revision: Redesign the training pipeline to better isolate semiotic and distributional learning (e.g., separate phases rather than joint training, or apply semiotic training only via adapters).

  3. Scaling failure: If the SRT’s advantages on BC, RF, and BP do not increase (or actively decrease) when scaling from 7B to larger model sizes, the semiotic capabilities do not benefit from scale in the way distributional capabilities do. Revision: Investigate whether semiotic capabilities require fundamentally different scaling strategies than distributional ones.

What falsification does NOT mean: A failure on one component-level criterion does not invalidate the entire framework. The SRT is modular by design (Section 4.6), and component-level failures indicate which modules need revision, not that the entire semiotic-reflexive approach is wrong. Framework-level falsification, by contrast, would challenge the foundational claim that architectural intervention is needed for semiotic-reflexive AI.

While the SRT has not yet been implemented and trained, the theoretical framework makes specific predictions about the pattern of results that would constitute success:

  1. BC should improve most on hard-difficulty signs, meaning signs where contestation is subtle and involves semiotic ideological conflict rather than simple referential disagreement. This is because the SRT’s advantage lies in its ability to model the process of interpretation (chains, indexical orders, ideologies), not merely in its knowledge of which signs are contested.

  2. RF should show the largest advantage in indexical order identification because distinguishing first-order (referential), second-order (social-indexical), and third-order (metapragmatic) dynamics is architecturally supported by the multi-scale attention heads and the reflexive regulation module, capabilities that standard LLMs lack even with prompting.

  3. BP prediction accuracy should be highest on Tier 1 and lowest on Tier 3, with the gradient tracking the degree of ecological validity (and thus the degree of noise, confounding, and model mismatch with real-world complexity). If Tier 3 accuracy equals Tier 1 accuracy, we should be suspicious that the model is pattern-matching on surface cues rather than estimating genuine dynamical parameters.

  4. Iconic grounding contribution should correlate with sign iconicity. Specifically, the IG ablation should show the largest BC reduction for signs with high D_{\text{iconic}} (e.g., “sharp,” “crash,” “soothe”) and near-zero reduction for signs with low D_{\text{iconic}} (e.g., “policy,” “hegemony,” “jurisdiction”). This gradient is the signature of a functionally integrated iconic subspace.

  5. CSD detection should precede empirically documented bifurcation events in the SEC-Drift evaluation. If the SRT’s CSD detection consistently appears after bifurcation events rather than before them, the early warning mechanism is not genuinely predictive.

This paper has proposed a comprehensive framework, encompassing theoretical, architectural, and methodological dimensions, for training language models that are aware of, and responsive to, the semiotic dynamics they participate in. We now consider the framework’s broader implications, its limitations, its ethical dimensions, and the research agenda it opens.

The semiotic-reflexive framework has implications that extend well beyond language model engineering, touching on foundational questions in AI alignment theory, cognitive science, political communication, and the philosophy of language.

Current alignment paradigms, including RLHF, Constitutional AI, and DPO, frame the alignment problem as one of behavioral constraint: the model is a tool to be guided toward outputs that humans evaluate positively. The semiotic-reflexive paradigm reframes alignment fundamentally. Rather than constraining what the model says, it seeks to equip the model with an understanding of how what it says functions within the semiotic ecology it inhabits.

This reframing has several consequences:

  1. From output policing to ecological awareness. Alignment shifts from filtering outputs at the surface level (reject harmful content, prefer helpful responses) to monitoring and modulating the deeper dynamics of meaning. A model that understands interpretant chains does not merely avoid producing text that one community finds offensive; it recognizes why the same text produces divergent effects in different communities and can adjust accordingly. The target is not “produce outputs that humans rate as good” but “produce outputs that improve the semiotic ecology’s capacity for shared meaning.”

  2. From preference aggregation to semiotic navigation. RLHF aggregates human preferences into a reward signal, implicitly assuming that preferences are stable, commensurable, and meaningfully averageable across raters. The semiotic perspective reveals this assumption as a form of erasure (in Irvine and Gal’s sense): aggregation conceals the divergent interpretive frameworks that generate the preferences. The SRT’s community-parameterized architecture makes this divergence explicit, navigable, and productive rather than suppressing it through aggregation.

  3. From safety as constraint to safety as lucidity. The dominant AI safety discourse frames safety negatively: preventing harmful outputs, avoiding deceptive behavior, constraining dangerous capabilities. The semiotic-reflexive framework suggests a positive conception: safety as the capacity for lucid engagement with the treacherous nature of signs. A model that can articulate why a contested sign produces divergent reactions across communities, that can identify where in the interpretant chain the divergence occurs, and that can estimate how close the discourse is to a bifurcation threshold contributes to safety not by being constrained but by being illuminating.

This conception resonates with recent calls for “interpretive AI” (McLeod, 2025; Kockelman, 2025) and connects to broader debates about whether AI systems should be designed as tools (to be controlled) or as participants (to be cultivated). The SRT is designed as a participant, specifically a semiotic participant whose contribution to shared meaning-making is architecturally structured rather than emergent and opaque.

Silverstein’s (1993, 2003) concept of metapragmatic awareness, the reflexive capacity to understand how signs function, how interpretation is structured, and how discourse shapes perception, has remained largely analytical: a tool for researchers studying language. The SRT architecture proposes the first formal operational instantiation of this concept in a computational system.

If successful, this would demonstrate something of theoretical significance: that metapragmatic awareness, long considered a distinctively human reflexive capacity and perhaps the defining feature of human language use that separates it from animal communication, can be partially instantiated in artificial systems, with measurable consequences for the quality of semiotic interaction. The qualifier “partially” is essential. Human metapragmatic awareness is embedded in embodied experience, cultural history, interpersonal relationship, and emotional resonance that no current artificial system approaches. What the SRT instantiates is a computational analogue of metapragmatic awareness: the ability to monitor, represent, and adapt to interpretive dynamics. Whether this analogue constitutes genuine awareness or merely a sophisticated simulation of awareness is a question this paper does not attempt to settle.

What the paper does claim is empirically testable: that a system with this computational analogue of metapragmatic awareness performs measurably better than systems without it on tasks that require cross-community intelligibility, semiotic analysis, and dynamical prediction (the evaluation axes of Section 6). This is a functional claim, not a phenomenological one.

The framework bridges two intellectual traditions that have developed in near-total isolation: semiotics (the study of signs, rooted in Peirce, Saussure, and their successors) and dynamical systems theory (the mathematics of qualitative change, rooted in Poincaré, Lyapunov, and their successors). By mapping:

[table]

This mapping is not merely analogical. Section 2.2 develops it formally, showing that specific semiotic phenomena (divergence compounding, threshold effects, hysteresis, asymmetric outcomes) map onto specific mathematical structures (pitchfork normal form, subcritical extension, imperfection sensitivity). The framework opens semiotics to formal quantitative analysis, including prediction, parameter estimation, and early warning detection, while simultaneously enriching dynamical systems theory with the conceptual vocabulary needed to model meaning rather than merely physical or biological systems.
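For reference, the textbook normal forms underlying this correspondence are (the paper’s exact parameterization in Section 2.2 may add scaling terms; h denotes the imperfection or bias parameter):

```latex
% Supercritical pitchfork: symmetric divergence into two stable branches
\dot{x} = r x - x^{3}
% Subcritical extension: abrupt jumps and hysteresis
\dot{x} = r x + x^{3} - x^{5}
% Imperfect pitchfork: asymmetric outcomes under a bias h
\dot{x} = h + r x - x^{3}
```

In the supercritical case, stable equilibria at x = \pm\sqrt{r} appear only for r > 0, which is the bifurcated regime the framework associates with runaway interpretive divergence.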

This connection also suggests a novel methodological bridge: tools from nonlinear dynamics (bifurcation analysis, Lyapunov exponents, critical slowing down detection) can be applied to corpora of language data to identify semiotic dynamics without requiring the full SRT architecture. The framework’s theoretical contributions are separable from its architectural proposals, and may prove valuable to researchers working within traditional semiotics, sociolinguistics, or political communication even if the SRT architecture itself proves impractical.

The framework’s integration of the bouba/kiki effect (Section 2.4) and Mangalam’s (2025) anti-Bayesian critique (Section 2.3) has implications for ongoing debates in cognitive science about the nature of categorization and meaning:

  1. Against the Bayesian consensus. If the SRT demonstrates that non-Bayesian architectures (attractor dynamics, iconic grounding) outperform Bayesian-inspired approaches (distributional statistics alone) on tasks involving contested meaning, this provides indirect evidence for Mangalam’s claim that cognition is better understood through dynamical systems than through probabilistic inference. The SRT offers a computational testbed for this theoretical debate. The prenatal bouba/kiki data are consistent with this critique: what calibrates the cross-modal mapping is not probabilistic inference but developmental exposure to the acoustic statistics of the embryonic environment (Lancaster, 2026b).

  2. Iconic grounding as a cognitive – and biological – universal. The bouba/kiki effect has been documented cross-linguistically, but the cross-species evidence (Versace et al., 2023) transforms the debate about its functional significance. If the same mapping appears in organisms separated by 310 million years of evolution – in chicks with no language and no vocal tract – then iconic grounding is not a cultural curiosity but a conserved feature of vertebrate neurodevelopment. It constitutes what Peirce classified as a hypoicon (CP 2.276): a sign operating through shared quality at the level of Firstness, prior to interpretation in the full triadic sense. If the SRT’s iconic grounding subspace measurably improves cross-community bridging (evaluation Axis 4, Section 6.1.4), and particularly if this improvement transfers cross-linguistically (the novel evaluation proposed in Section 6.1.4), this constitutes evidence that iconic grounding plays a functional role in meaning stabilization grounded in biology, not merely convention.

  3. Interpretant chains as cognitive trajectories. The Kockelman/Lancaster formalization of interpretant chains as dynamical trajectories suggests that understanding, defined as the process by which a sign produces meaning, is a temporal unfolding through a landscape of attractors, not a static mapping from input to output. If the SRT’s ability to trace and predict these trajectories proves empirically robust, this supports a radically processual view of cognition.

The framework’s ambitions entail significant limitations. We organize these from most to least severe.

The most fundamental limitation is that the SRT has not been implemented, trained, or evaluated. This paper is a design specification, not an empirical report. Every claim about performance is a prediction, every metric a target, every comparison a hypothesis. The evaluation framework of Section 6 is designed to test these predictions rigorously, but until Sections 6.1-6.5 are populated with actual data, the framework remains in the domain of plausible theoretical architecture rather than demonstrated capability.

We consider this limitation to be appropriate for the current stage of the research program. The framework is sufficiently complex, integrating theories from semiotics, dynamical systems, cognitive science, and AI, that premature implementation (building before the theoretical architecture is clear) would likely produce an ad hoc system whose behavior is difficult to interpret. The specification-first approach taken here ensures that when implementation proceeds, it will be guided by explicit theoretical commitments, testable predictions, and falsification criteria (Section 6.5).

The Semiotic Annotation Schema (Section 5.1) requires expert annotation of interpretant chains, community contexts, indexical orders, and metapragmatic metadata. This annotation is:

  • More labor-intensive than standard NLP annotation (sentiment, NER, POS tagging) by approximately an order of magnitude per token, because it requires not just labeling but interpretation, that is, understanding how multiple communities would engage with the same text

  • Culturally situated: Annotators must be familiar with the interpretive frameworks of specific communities, which limits the pool of qualified annotators and introduces the risk that annotations reflect the annotators’ own semiotic ideologies

  • Potentially circular: The framework requires annotations to train a model that performs the very tasks the annotations exemplify. If annotation quality is poor, the model learns poor semiotic analysis; if annotation quality is high, one might ask whether the model adds value beyond the annotation pipeline itself

Mitigations: The hybrid human-AI pipeline (Section 5.1.2) reduces cost; synthetic augmentation (Phase 3) provides controlled training signal with exact ground truth; and the validation procedure (Phase 4) establishes quality thresholds. But the fundamental tension between annotation depth and annotation scale remains unresolved and is the primary engineering challenge for practical implementation.
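To give a concrete sense of the annotation depth involved, a hypothetical record is sketched below. The field names, community keys, and chain contents are illustrative stand-ins, not the actual Semiotic Annotation Schema of Section 5.1; the point is only that each span carries per-community interpretant chains, indexical orders, and metapragmatic metadata rather than a single label.

```python
# Hypothetical annotation record (field names and values are illustrative;
# the actual schema is defined in Section 5.1 of the paper).
annotation = {
    "token_span": "religious freedom",
    "communities": {
        "community_a": {
            "interpretant_chain": ["liberty of conscience",
                                   "protection from state coercion"],
            "indexical_order": 2,
        },
        "community_b": {
            "interpretant_chain": ["license to discriminate",
                                   "majoritarian privilege"],
            "indexical_order": 2,
        },
    },
    "metapragmatic": {"contested": True, "annotator_community": "mixed_pair"},
}

# Each span requires one chain per community: the labeling cost scales with
# the number of interpretive communities, not just the number of tokens.
assert len(annotation["communities"]) == 2
```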

The framework parameterizes interpretant embeddings by community context vectors $\mathbf{c} \in \mathbb{R}^{256}$, treating communities as identifiable, labelable entities with coherent interpretive frameworks. In practice, interpretive communities are:

  • Fluid: Individuals move between communities, hold mixed allegiances, and interpret inconsistently even within a single conversation

  • Internally diverse: “Conservative” and “progressive” are not monolithic communities but coalitions of sub-communities with partially overlapping and partially contradictory interpretive frameworks

  • Relationally defined: Communities exist in relation to each other, and the boundaries between them are themselves sites of contestation (a point that Irvine and Gal’s concept of fractal recursivity captures)

  • Performatively constituted: The act of identifying and labeling a community can itself affect the community’s self-understanding and interpretive behavior, a concern directly relevant to a system designed to generate text about communities

The community context vector’s continuous nature ($\mathbf{c} \in \mathbb{R}^{256}$) partially addresses the fluidity and diversity problems: communities need not be discrete labels but can be smooth blends in the vector space. Hierarchical community representations (where “conservative” is decomposed into “libertarian-conservative,” “social-conservative,” “nationalist-conservative” sub-vectors) would further address internal diversity. But the fundamental risk of reification, the possibility that the model’s community labels produce or reinforce the very divisions they claim to analyze, remains a serious concern.
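As an illustration of how continuous community vectors support blending and mixed allegiance, consider the following sketch. All vectors here are random stand-ins for learned embeddings, and the sub-community names are the hypothetical decomposition mentioned above, not trained parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
DIM = 256  # dimensionality of the community context space

# Hypothetical sub-community vectors (random stand-ins for learned embeddings).
libertarian = rng.normal(size=DIM)
social_con = rng.normal(size=DIM)
nationalist = rng.normal(size=DIM)

# "Conservative" as a weighted blend of sub-communities rather than a discrete label.
weights = np.array([0.5, 0.3, 0.2])
conservative = weights @ np.stack([libertarian, social_con, nationalist])

# An individual with mixed allegiances is just another point in the same space.
mixed = 0.6 * conservative + 0.4 * rng.normal(size=DIM)
assert conservative.shape == mixed.shape == (DIM,)
```

Because blends and individuals live in the same space as the named communities, fluid or hybrid identities need no special machinery; the reification risk discussed above concerns the labels attached to regions of this space, not the geometry itself.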

The bouba/kiki effect is robust, well-documented, cross-linguistically attested, and – with the chick data from Versace et al. (2023) – demonstrated across species separated by 310 million years of evolution. Lancaster (2026b) argues that it originates in prenatal acoustic calibration, providing a phylogenetically deep biological foundation. But it is also, in the grand scheme of human embodied experience, a relatively simple cross-modal correspondence. The framework’s use of iconic grounding as an attractor stabilization mechanism (Section 2.4.7) treats it as an existence proof and a design pattern rather than a comprehensive theory of grounding.

The prenatal mechanism remains a hypothesis. It is the most parsimonious account consistent with the cross-species and developmental evidence, but the definitive experiment – manipulating the prenatal acoustic environment and testing whether the mapping changes (Lancaster, 2026b, Section 7.1) – has not been conducted. The gap between the iconic substrate (pre-linguistic, cross-modal association) and the symbolic capacity (arbitrary signs, recursive grammar, unlimited semiosis) involves cognitive and social machinery that the bouba/kiki literature does not illuminate. The iconic mapping is a foundation, not the whole building.

The full richness of embodied semantics, including proprioceptive, interoceptive, emotional, spatial, and temporal dimensions, vastly exceeds what the phonosemantic feature space (6 dimensions, Section 2.4.5) and CLIP visual features (128 dimensions) can capture. The iconic grounding subspace is a beachhead, not a territory. Future extensions should incorporate richer sensorimotor features (e.g., haptic, kinesthetic, gustatory) drawn from embodied cognition research, though doing so will require training data and evaluation paradigms that do not currently exist.
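To make the “beachhead” concrete, here is a toy version of a phonosemantic readout. The six dimensions, the feature values for “bouba”/“kiki,” and the weights are purely illustrative assumptions, not the actual feature space of Section 2.4.5; the sketch only shows the shape of the mechanism: a small, fixed cross-modal projection that any word’s sound pattern can be scored against.

```python
import numpy as np

# Hypothetical 6-dimensional phonosemantic features; dimension names and
# values are illustrative, not the paper's actual feature space:
# [vowel_roundedness, vowel_backness, consonant_continuant,
#  consonant_voicing, sonority, plosiveness]
bouba = np.array([0.9, 0.8, 0.7, 1.0, 0.8, 0.1])
kiki = np.array([0.1, 0.1, 0.0, 0.0, 0.2, 1.0])

def shape_roundness_score(phono):
    """Toy iconic readout: round-sounding words score high, sharp ones low.
    Weights are illustrative stand-ins for a learned or calibrated mapping."""
    w = np.array([0.3, 0.1, 0.2, 0.1, 0.2, -0.5])
    return float(phono @ w)

print(shape_roundness_score(bouba) > shape_roundness_score(kiki))  # True
```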

The pitchfork bifurcation is a canonical model of symmetry-breaking, chosen because its mathematics map elegantly onto the structure of semiotic polarization (Section 2.2). But real-world polarization may involve dynamics that the pitchfork model does not capture:

  • Multi-polar bifurcation: The pitchfork produces two basins. Semiotic fragmentation may produce three, four, or many basins (as in multi-party political systems where signs fragment along multiple axes simultaneously)

  • Network effects: The pitchfork treats the system as a single integrated field. In practice, semiotic dynamics play out across social networks with heterogeneous connectivity, creating spatially structured bifurcation patterns that mean-field models like the pitchfork cannot represent

  • Exogenous shocks: The pitchfork models gradual parameter change. Real-world polarization events can be triggered by sudden exogenous shocks (a pandemic, an assassination, a technological disruption) that do not correspond to smooth $r$-parameter trajectories

  • Feedback between signs: The pitchfork models the dynamics of a single sign. The SRT applies it sign-by-sign, but the interactions between simultaneously contested signs (how the polarization of “woke” affects the polarization of “freedom” affects the polarization of “justice”) may produce emergent dynamics not captured by independent pitchfork models

The BEN’s architecture (Section 4.5) is designed to estimate effective $r$-values from text, not to enforce the pitchfork model directly. This provides some robustness: if real-world dynamics deviate from the pitchfork, the BEN may learn to estimate effective parameters that capture the empirical dynamics even if the underlying model is imprecise. But the training pipeline’s reliance on pitchfork simulations for synthetic data (Section 5.1.2, Phase 3) may bias the BEN toward pitchfork-shaped dynamics. Future work should explore alternative dynamical models (cusp catastrophe, Hopf bifurcation, coupled oscillators) as data-generating processes for synthetic training data.
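For reference, the baseline dynamics that these extensions must go beyond can be simulated directly from the pitchfork normal form, $\dot{x} = rx - x^3$. A minimal Euler integration (an illustration of the canonical model, not the BEN) shows the qualitative transition: one consensus basin below the threshold, two symmetric basins above it.

```python
def simulate_pitchfork(r, x0, dt=0.01, steps=20000):
    """Integrate the pitchfork normal form dx/dt = r*x - x**3 with forward Euler."""
    x = x0
    for _ in range(steps):
        x += dt * (r * x - x**3)
    return x

# Subcritical regime (r < 0): any starting point collapses to consensus at x = 0.
assert abs(simulate_pitchfork(r=-0.5, x0=0.8)) < 1e-3

# Supercritical regime (r > 0): tiny perturbations diverge to the two stable
# equilibria at x = ±sqrt(r), i.e. two mutually exclusive basins of meaning.
left = simulate_pitchfork(r=0.5, x0=-0.01)
right = simulate_pitchfork(r=0.5, x0=+0.01)
print(round(left, 3), round(right, 3))  # -0.707 0.707
```

The limitations listed above (multi-polar fragmentation, network structure, shocks, sign coupling) are precisely the phenomena this one-dimensional, single-sign, mean-field picture cannot express.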

Bridging coherence and reflexivity fidelity rely fundamentally on human evaluation, which introduces subjectivity, cultural bias, and significant cost. The structured rubrics, cross-community evaluation panels, and statistical methodology of Section 6 mitigate but do not eliminate these concerns. In particular:

  • Evaluators are asked to judge “fairness” and “illumination,” concepts that are themselves contested and community-dependent

  • Cross-community panels may converge on a lowest-common-denominator evaluation that penalizes genuinely novel bridging strategies

  • Human evaluation does not scale: evaluating every generation requires human labor, making continuous evaluation during training infeasible

Developing automated proxy metrics that correlate with human BC and RF scores would greatly reduce evaluation cost. Such proxies might include: embedding-based divergence metrics (measuring whether the generated text occupies an intermediate space between community centroids), chain coverage metrics (automated detection of whether interpretant chain elements from both communities appear in the text), and $r$-estimation accuracy (fully automatable on synthetic data). These proxies should be validated against human ratings before being used as training signals.
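One of the proposed proxies, embedding-based divergence, might look like the following sketch. The function name and scoring formula are illustrative assumptions, not a validated metric; as the text stresses, any such proxy would need calibration against human BC ratings first.

```python
import numpy as np

def bridging_proxy(text_vec, centroid_a, centroid_b):
    """Illustrative bridging proxy: 1.0 when the generated text's embedding
    is equidistant from both community centroids, 0.0 when it coincides
    with one centroid. Hypothetical; must be validated against human
    BC scores before use as a training signal."""
    d_a = np.linalg.norm(text_vec - centroid_a)
    d_b = np.linalg.norm(text_vec - centroid_b)
    return 1.0 - abs(d_a - d_b) / (d_a + d_b)

a = np.array([1.0, 0.0])   # community A centroid (toy 2-D example)
b = np.array([-1.0, 0.0])  # community B centroid
midpoint = np.array([0.0, 0.5])

print(bridging_proxy(midpoint, a, b))  # 1.0: equidistant from both communities
print(bridging_proxy(a, a, b))         # 0.0: fully inside community A
```

A metric of this shape is cheap enough to compute at every training step, which is exactly what the human panels cannot provide.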

A system designed to understand and navigate semiotic dynamics raises ethical questions that must be addressed explicitly.

The interpretive capabilities that enable bridging can also enable manipulation. A model that understands how interpretant chains function across communities could, in principle, be used to:

  • Craft messages optimized to activate specific characterological associations and polarize audiences

  • Generate text that appears to bridge while actually advancing one community’s interpretive framework at the expense of another’s (sophisticated propaganda)

  • Identify near-critical semiotic contexts and generate text that pushes them past the bifurcation threshold

The dual-use problem is not unique to the SRT, since any understanding of persuasive dynamics can be used persuasively or counter-persuasively, but the SRT’s explicit modeling of semiotic dynamics makes the dual-use potential particularly salient.

Mitigations: (1) The BEN’s $r$-estimation is designed for early warning, not for optimization toward higher $r$-values; architecturally, the system is designed to alert and modulate, not to amplify. (2) The inference-time modes (Section 5.5) provide graduated control, and Mode 1 ($\lambda = 0$) disables semiotic modulation entirely. (3) The training pipeline explicitly includes cross-community fairness as a training objective (Task 2, Section 5.3.2), penalizing one-sided framing. (4) However, technical mitigations are insufficient alone; deployment governance, access controls, and institutional oversight are necessary complements.

The SRT’s community-parameterized architecture requires defining, labeling, and modeling “communities,” a process that inevitably simplifies, stereotypes, and potentially reifies social groups. Producing text “as a libertarian would interpret it” risks reinforcing caricatures of libertarian thought; parameterizing interpretation by community risks reducing the rich internal diversity of any community to a single vector.

Mitigations: (1) Community vectors are continuous and blendable, not discrete labels. (2) The annotation procedure (Section 5.1.2) requires cross-community annotator pairing to reduce stereotyping. (3) The reflexive commentary task (Section 5.3.3) includes metapragmatic self-reporting, training the model to note when it may be simplifying. (4) Evaluation explicitly penalizes unfair or unequal representation of communities (the Fairness dimension of BC). (5) Future work should explore participatory approaches where communities are involved in defining their own interpretive parameters.

A well-intentioned system that monitors semiotic dynamics and intervenes to reduce polarization may become paternalistic, deciding on behalf of users when discourse is “too polarized” and modulating output to reduce divergence. This raises questions about autonomy, epistemic freedom, and the legitimacy of automated semiotic intervention.

The framework’s resolution is architectural: the $\lambda$ parameter is user-controllable, and Mode 1 provides a complete opt-out. The system is designed to offer semiotic awareness, not to impose it. At $\lambda = 0$, the SRT is a standard language model; the semiotic capabilities are an available resource, not a mandatory filter. This design reflects a commitment to user autonomy that should be maintained through deployment: the default should be transparency about semiotic dynamics, not invisible correction of them.

The immediate priority is prototyping the SRT at small scale (1-3B parameters) and evaluating on the proposed benchmark suite. We recommend a staged implementation:

  1. Stage 1: SEL only. Implement the Semiotic Embedding Layer with phonosemantic features and community context vectors on a small transformer (1B parameters). Evaluate whether the expanded embedding space measurably affects performance on BC and IG axes. This tests the foundational architectural claim before adding complexity.

  2. Stage 2: SEL + MAH. Add the Metapragmatic Attention Heads and evaluate interpretant chain prediction. This tests whether multi-scale divergence monitoring adds value beyond the embedding-level representation.

  3. Stage 3: SEL + MAH + RRM. Add the Reflexive Regulation Module and evaluate reflexive commentary production. This tests the meta-observation mechanism.

  4. Stage 4: Full SRT. Add the BEN and evaluate the complete system. Run the full ablation study (Section 6.3). This provides comprehensive empirical validation.

Each stage is independently valuable. The SEL alone may improve cross-community performance even without the full architecture. Each stage’s results inform the design of the next.

The SRT as specified is a generative model: it produces text in response to prompts. A natural extension is to embed it within an agentic framework capable of:

  • Real-time monitoring: Tracking semiotic dynamics across platforms as they unfold, identifying emerging bifurcations before they stabilize, and generating alerts when CSD indicators exceed threshold

  • Proactive intervention: Generating bridge texts, reflexive prompts, or alternative framings in response to detected polarization dynamics, not waiting to be asked but rather offering semiotic resources when the environment suggests they are needed

  • Longitudinal tracking: Maintaining persistent sign profiles that track how specific signs’ interpretant distributions evolve over time, enabling trend analysis and long-range prediction

  • Multi-agent semiosis: Deploying multiple SRT instances parameterized for different communities, enabling systematic exploration of how a text will be interpreted across the semiotic landscape before it is published
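The CSD indicators referenced in the real-time monitoring bullet are standard early-warning signals from dynamical systems: lag-1 autocorrelation and variance both rise as a system approaches a bifurcation (critical slowing down). A minimal rolling-window sketch, with the window size chosen arbitrarily for illustration:

```python
import numpy as np

def csd_indicators(series, window=50):
    """Rolling lag-1 autocorrelation and variance over a sliding window.
    Both tend to rise as a system approaches a bifurcation (critical
    slowing down); window size here is illustrative."""
    ac, var = [], []
    for i in range(window, len(series)):
        w = series[i - window:i]
        w = w - w.mean()
        ac.append(np.corrcoef(w[:-1], w[1:])[0, 1])
        var.append(w.var())
    return np.array(ac), np.array(var)

rng = np.random.default_rng(0)
noise = rng.normal(size=500)          # fast-recovering system: white noise
slow = np.zeros(500)                  # slow-recovering system: AR(1), coeff 0.9
for t in range(1, 500):
    slow[t] = 0.9 * slow[t - 1] + noise[t]

ac_fast, _ = csd_indicators(noise)
ac_slow, _ = csd_indicators(slow)
print(ac_fast.mean() < ac_slow.mean())  # True: slowing raises autocorrelation
```

In the agentic setting, `series` would be a time series of an estimated semiotic quantity for a contested sign, and an alert fires when the rolling indicators cross a calibrated threshold.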

This agentic trajectory points toward what the foundational essays envision as “semiotic stewardship at civilizational scale,” referring to AI systems that participate constructively in the ecology of meaning rather than amplifying its fractures. Such systems would require not only the SRT’s capabilities but also careful governance frameworks to prevent the dual-use problems discussed in Section 7.3.1.

The framework is developed primarily with reference to English-language political discourse in the United States, a significant limitation given that semiotic dynamics are culturally specific. Extension requires:

  • Phonological encoder adaptation: The phonosemantic feature space (Section 2.4.5) is defined for languages with the relevant articulatory contrasts (vowel roundedness, consonant manner, etc.). Extension to tonal languages (Mandarin, Yoruba), click languages (Xhosa, Zulu), or sign languages requires redesigning the phonological feature space.

  • Community taxonomy reconfiguration: The political community landscape differs radically across contexts. American progressive/conservative is not translatable to French left/right, Indian caste-communal dynamics, or Brazilian evangelical/secular tensions. Community parameterization must be context-specific.

  • Theoretical validation: The Peircean semiotic framework is universalist in aspiration, but its specific instantiations (indexical orders, enregisterment, semiotic ideology) have been developed primarily in Western academic contexts. Cross-cultural validation requires engagement with non-Western semiotic traditions.

  • Evaluation infrastructure: The SEC benchmark suite (Section 6.2) would need to be constructed independently for each language/cultural context, a significant research effort in itself.

The cross-linguistic iconic transfer evaluation (Section 6.1.4) provides a preliminary test: if the iconic grounding subspace transfers to unseen languages, this suggests that at least the iconic component of the framework has cross-cultural validity. But full cross-cultural extension is a multi-year research program.

The framework’s identification of algorithmic amplification as a bifurcation control variable ($r$ in the pitchfork model) suggests a concrete path toward semiotic-aware platform governance:

  • $r$-monitoring dashboards: Platforms could deploy SRT-derived estimators to monitor the effective $r$-value of discourse around specific signs, enabling real-time tracking of polarization dynamics

  • Algorithmic $r$-budgets: Rather than moderating individual pieces of content (a symptom-level intervention), platforms could set $r$-budgets for their recommendation systems, imposing constraints on the total amplification the algorithm may apply to divergent interpretations of contested signs

  • Bifurcation-aware recommendation: Recommendation algorithms could be modified to reduce amplification when estimated $r$ approaches the critical threshold, implementing a form of semiotic braking that prevents systems from crossing into bifurcated regimes

  • CSD-triggered review: When critical slowing down indicators exceed threshold for a particular sign (suggesting an approaching bifurcation event), human content moderators could be alerted for review, serving as a triage mechanism that focuses human attention where dynamical analysis predicts it is most needed
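As an illustration of the semiotic braking idea, here is a toy amplification cap. The threshold, base gain, and linear taper are all illustrative assumptions, not a proposed policy; the point is only that amplification becomes a decreasing function of the estimated bifurcation parameter.

```python
def amplification_cap(r_est, r_crit=1.0, base_gain=2.0):
    """Toy semiotic braking rule: scale recommender amplification down
    as the estimated r approaches the critical threshold r_crit.
    All constants are illustrative."""
    if r_est >= r_crit:
        return 1.0  # at or past the bifurcation point: no algorithmic boost
    # Linear taper from full gain at r = 0 to no boost at r = r_crit.
    return 1.0 + (base_gain - 1.0) * (1.0 - r_est / r_crit)

print(amplification_cap(0.0))  # 2.0: full boost far from the threshold
print(amplification_cap(0.5))  # 1.5: partially braked
print(amplification_cap(1.2))  # 1.0: at/past threshold, boost removed
```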

These applications require careful governance design. The risk of semiotic paternalism (Section 7.3.3) is acute in platform contexts where interventions affect millions of users. Transparency about when and why semiotic modulation is occurring is essential.

The SEC-Drift benchmark (Section 6.2.4) tests the model’s ability to predict semiotic trajectories over time. The strongest validation of the framework would be a prospective longitudinal study:

  1. Train the SRT on data through time $t$

  2. Identify signs with estimated $r$-values near the critical threshold (near-bifurcation signs)

  3. Predict which signs will bifurcate and which will retreat to consensus over the interval $[t, t + \Delta t]$

  4. Compare predictions against observed outcomes at $t + \Delta t$

This is a demanding evaluation that requires waiting for time to pass and outcomes to materialize. However, it is also the most scientifically compelling, because it tests the framework’s predictive capacity in the domain where it is most theoretically ambitious: forecasting qualitative transitions in the dynamics of meaning.
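At its simplest, the forecast-versus-outcome comparison in the final step reduces to accuracy over the near-critical signs. The sketch below uses invented sign labels; the actual evaluation metrics of Section 6.2.4 are richer (calibration, lead time, and so on), but this is the core bookkeeping.

```python
def prospective_score(predicted, observed):
    """predicted/observed: dicts mapping sign -> 'bifurcate' or 'consensus'.
    Returns the fraction of near-critical signs whose fate was forecast
    correctly. A deliberately minimal stand-in for the Section 6.2.4 metrics."""
    hits = sum(predicted[s] == observed[s] for s in predicted)
    return hits / len(predicted)

# Invented example: three near-critical signs, one forecast miss.
pred = {"freedom": "bifurcate", "safety": "consensus", "justice": "bifurcate"}
obs = {"freedom": "bifurcate", "safety": "bifurcate", "justice": "bifurcate"}

print(round(prospective_score(pred, obs), 3))  # 0.667
```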

Language models are semiotic infrastructure. They shape the interpretant chains available to billions of users, modulate the attractor landscapes through which meaning stabilizes, and contribute to the dynamical processes that determine whether shared understanding remains possible across interpretive communities.

To train these models as if language were a sequence prediction problem, to optimize for the statistical surface while ignoring the dynamical depth, is to build infrastructure that is, in the precise sense developed in this paper, semiotically blind. Such models do not see signs; they see token statistics. They do not trace interpretant chains; they predict next-token probabilities. They do not estimate bifurcation parameters; they respond uniformly regardless of the dynamical state of the semiotic environment. When deployed at scale, this blindness has consequences: the models participate in semiotic dynamics without understanding them, amplifying the pitchfork rather than raising its threshold, because they cannot distinguish between text that bridges attractor basins and text that deepens them.

This paper has proposed an alternative: the Semiotic-Reflexive Transformer, an architecture that makes the structure of meaning an explicit object of model training. The framework’s contributions are organized at four levels:

  1. Theoretical: A formal bridge between Peircean semiotics and dynamical systems theory (Section 2), connecting interpretant chains to dynamical trajectories, contested signs to bifurcation, iconic grounding to attractor stability, and metapragmatic awareness to reflexive regulation. This bridge is developed through the pitchfork bifurcation model with extensions (asymmetric, subcritical, stochastic) that capture specific empirical phenomena of semiotic polarization. The framework integrates the bouba/kiki effect as a principled mechanism for attractor stabilization, formalizing the Generalized Icon Hypothesis (Section 2.4.8): that non-arbitrary form-meaning correspondences provide basin depth that resists bifurcation-driven drift, now empirically anchored in prenatal cross-species evidence (Section 2.4.3-2.4.4) demonstrating that these correspondences are conserved across 310 million years of vertebrate evolution.

  2. Architectural: Four modular components, specifically the Semiotic Embedding Layer, Metapragmatic Attention Heads, Reflexive Regulation Module, and Bifurcation Estimation Network, embed semiotic awareness into the transformer architecture without sacrificing compatibility with standard pre-training infrastructure (Section 4). The architecture is designed for graceful degradation: each component adds capability independently, and the full system can be reduced to a standard transformer by setting $\lambda = 0$.

  3. Methodological: A complete training pipeline, encompassing the Semiotic Annotation Schema, a four-phase hybrid annotation procedure, a four-objective pre-training loss with curriculum schedule, four fine-tuning tasks, and three inference modes (Section 5). The pipeline is specified in sufficient detail for replication, with concrete data formats, hyperparameter recommendations, and compute estimates.

  4. Evaluative: A rigorous evaluation framework, consisting of five evaluation axes, a benchmark suite (the Semiotic Evaluation Corpus), systematic ablation studies, statistical methodology, and explicit falsification criteria that specify what empirical findings would require fundamental revision of the framework’s claims (Section 6).

The framework does not promise to eliminate polarization, resolve the arbitrariness of signs, or produce artificial understanding. It proposes something more modest and more tractable: that language models can be trained to recognize the gap between sign and referent as a structural feature of semiosis rather than a defect to be corrected; to trace how that gap compounds through interpretant chains across communities; to estimate when compounding will cross bifurcation thresholds into self-reinforcing divergence; and to generate text that surfaces these dynamics rather than concealing them.

If the treachery of signs is the condition of human meaning-making, if signs always and necessarily defer, distort, and diverge, then the task is not to end the treachery but to face it with lucidity. The computational question this paper poses is whether language models can be built to extend that lucidity rather than undermine it. The theoretical analysis suggests yes. The architectural specification shows how. The evaluation framework specifies what “yes” and “no” would look like empirically.

The pitchfork has done its work. The question is whether the next generation of language technologies will deepen the tines or raise the threshold.

Abrams, R. M., Gerhardt, K. J., & Peters, A. J. M. (1998). Transmission of sound and vibration to the fetus. In J. P. Lecanuet et al. (Eds.), Fetal Development: A Psychobiological Perspective. Psychology Press.

Adelman, J. S., Estes, Z., & Cossu, M. (2018). Emotional sound symbolism: Languages rapidly signal valence via phonemes. Cognition, 175, 122-130.

Agha, A. (2003). The social life of cultural value. Language & Communication, 23(3-4), 231-273.

Alayrac, J.-B., Donahue, J., Luc, P., Miech, A., Barr, I., Hasson, Y., … & Simonyan, K. (2022). Flamingo: A visual language model for few-shot learning. Advances in Neural Information Processing Systems, 35.

Andersen, P. B., & Hasle, P. (2002). Semiotics and computer science: An overview. In P. B. Andersen, K. Holmqvist, & J. F. Jensen (Eds.), The computer as medium (pp. 273-306). Cambridge University Press.

Bail, C. A., Argyle, L. P., Brown, T. W., Bumpus, J. P., Chen, H., Hunzaker, M. B. F., … & Volfovsky, A. (2018). Exposure to opposing views on social media can increase political polarization. Proceedings of the National Academy of Sciences, 115(37), 9216-9221.

Bai, Y., Jones, A., Ndousse, K., Askell, A., Chen, A., DasSarma, N., … & Kaplan, J. (2022). Training a helpful and harmless assistant with reinforcement learning from human feedback. arXiv preprint arXiv:2204.05862.

Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1-48.

Bell, D. A. (1980). Brown v. Board of Education and the interest-convergence dilemma. Harvard Law Review, 93(3), 518-533.

Bender, E. M., & Koller, A. (2020). Climbing towards NLU: On meaning, form, and understanding in the age of data. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 5185-5198.

Blasi, D. E., Wichmann, S., Hammarström, H., Stadler, P. F., & Christiansen, M. H. (2016). Sound-meaning association biases evidenced across thousands of languages. Proceedings of the National Academy of Sciences, 113(39), 10818-10823.

Bolukbasi, T., Chang, K.-W., Zou, J. Y., Saligrama, V., & Kalai, A. T. (2016). Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. Advances in Neural Information Processing Systems, 29.

Bowman, S. R., Angeli, G., Potts, C., & Manning, C. D. (2015). A large annotated corpus for learning natural language inference. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 632-642.

Boxell, L., Gentzkow, M., & Shapiro, J. M. (2017). Greater internet use is not associated with faster growth in political polarization among US demographic groups. Proceedings of the National Academy of Sciences, 114(40), 10612-10617.

Bremner, A. J., Caparos, S., Davidoff, J., de Fockert, J., Linnell, K. J., & Spence, C. (2013). “Bouba” and “Kiki” in Namibia? A remote culture make similar shape-sound matches, but different shape-taste matches to Westerners. Cognition, 126(2), 165-172.

Casper, S., Davies, X., Shi, C., Gilbert, T. K., Scheurer, J., Rando, J., … & Hadfield-Menell, D. (2023). Open problems and fundamental limitations of reinforcement learning from human feedback. arXiv preprint arXiv:2307.15217.

Chen, M., Tworek, J., Jun, H., Yuan, Q., Pinto, H. P. de O., Kaplan, J., … & Zaremba, W. (2021). Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374.

Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.

Christiano, P. F., Leike, J., Brown, T., Martic, M., Legg, S., & Amodei, D. (2017). Deep reinforcement learning from human preferences. Advances in Neural Information Processing Systems, 30.

Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., & Tafjord, O. (2018). Think you have solved question answering? Try ARC, the AI2 reasoning challenge. arXiv preprint arXiv:1803.05457.

Conover, M. D., Ratkiewicz, J., Francisco, M., Gonçalves, B., Menczer, F., & Flammini, A. (2011). Political polarization on Twitter. Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media, 5(1), 89-96.

Crenshaw, K. (1989). Demarginalizing the intersection of race and sex: A Black feminist critique of antidiscrimination doctrine, feminist theory and antiracist politics. University of Chicago Legal Forum, 1989(1), 139-167.

DeCasper, A. J., & Fifer, W. P. (1980). Of human bonding: Newborns prefer their mothers’ voices. Science, 208(4448), 1174-1176.

DeCasper, A. J., & Spence, M. J. (1986). Prenatal maternal speech influences newborns’ perception of speech sounds. Infant Behavior and Development, 9(2), 133-150.

DeGroot, M. H. (1974). Reaching a consensus. Journal of the American Statistical Association, 69(345), 118-121.

Dingemanse, M. (2012). Advances in the cross-linguistic study of ideophones. Language and Linguistics Compass, 6(10), 654-672.

Efron, B., & Tibshirani, R. J. (1993). An Introduction to the Bootstrap. Chapman & Hall/CRC.

Elhage, N., Nanda, N., Olsson, C., Henighan, T., Joseph, N., Mann, B., … & Olah, C. (2021). A mathematical framework for transformer circuits. Transformer Circuits Thread.

Fernald, A. (1989). Intonation and communicative intent in mothers’ speech to infants: Is the melody the message? Child Development, 60(6), 1497-1510.

Floridi, L., & Chiriatti, M. (2020). GPT-3: Its nature, scope, limits, and consequences. Minds and Machines, 30(4), 681-694.

Garimella, K., De Francisci Morales, G., Gionis, A., & Mathioudakis, M. (2018). Quantifying controversy on social media. ACM Transactions on Social Computing, 1(1), 1-27.

Goldberg, Y. (2019). Assessing BERT’s syntactic abilities. arXiv preprint arXiv:1901.05287.

Gottlieb, G. (1971). Development of Species Identification in Birds. University of Chicago Press.

Guess, A. M., Lyons, B. A., Montgomery, J. M., Nyhan, B., & Reifler, J. (2023). Reshares on social media amplify political news but do not detectably affect beliefs or opinions. Science, 381(6656), 404-408.

Guo, W., & Caliskan, A. (2021). Detecting emergent intersectional biases: Contextualized word embeddings contain a distribution of human-like biases. Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, 122-133.

Hegselmann, R., & Krause, U. (2002). Opinion dynamics and bounded confidence: Models, analysis and simulation. Journal of Artificial Societies and Social Simulation, 5(3).

Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D., & Steinhardt, J. (2021). Measuring massive multitask language understanding. International Conference on Learning Representations.

Hoffmann, J., Borgeaud, S., Mensch, A., Buchatskaya, E., Cai, T., Rutherford, E., … & Sifre, L. (2022). Training compute-optimal large language models. Advances in Neural Information Processing Systems, 35.

Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., … & Chen, W. (2022). LoRA: Low-rank adaptation of large language models. International Conference on Learning Representations.

Imai, M., Kita, S., Nagumo, M., & Okada, H. (2008). Sound symbolism facilitates early word learning. Cognition, 109(1), 54-65.

Irvine, J. T., & Gal, S. (2000). Language ideology and linguistic differentiation. In P. V. Kroskrity (Ed.), Regimes of language: Ideologies, polities, and identities (pp. 35-84). School of American Research Press.

Keane, W. (2003). Semiotics and the social analysis of material things. Language & Communication, 23(3-4), 409-425.

Kirkpatrick, J., Pascanu, R., Rabinowitz, N., Veness, J., Desjardins, G., Rusu, A. A., … & Hadsell, R. (2017). Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13), 3521-3526.

Knudsen, E. I. (2002). Instructed learning in the auditory localization pathway of the barn owl. Nature, 417(6886), 322-328.

Kockelman, P. (2024). Last Words: Meaning, Minds, and Artificial Intelligence. University of Chicago Press.

Kockelman, P. (2025). Mathematical models of meaning: A dynamic systems approach to possible world semiotics. Working paper.

Köhler, W. (1929). Gestalt Psychology. Liveright.

Lakoff, G., & Johnson, M. (1980). Metaphors We Live By. University of Chicago Press.

Lancaster, J. B. (2025). The treachery of signs: Semiotic mediation, pitchfork bifurcation, and political polarization in algorithmically curated societies. SSRN Electronic Journal.

Lancaster, J. B. (2026b). The bouba-kiki effect as prenatal semiotic grounding: Cross-species evidence for iconic mapping prior to language.

Leonard, N. E., Lipsitz, K., Bizyaeva, A., Franci, A., & Lelkes, Y. (2021). The nonlinear feedback dynamics of asymmetric political polarization. Proceedings of the National Academy of Sciences, 118(50), e2102149118.

Li, K., Patel, O., Viégas, F., Pfister, H., & Wattenberg, M. (2024). Inference-time intervention: Eliciting truthful answers from a language model. Advances in Neural Information Processing Systems, 36.

Liang, P. P., Wu, C., Morency, L.-P., & Salakhutdinov, R. (2021). Towards understanding and mitigating social biases in language models. International Conference on Machine Learning, 6565-6576.

Lin, S., Hilton, J., & Evans, O. (2022). TruthfulQA: Measuring how models mimic human falsehoods. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, 3214-3252.

Liu, H., Li, C., Wu, Q., & Lee, Y. J. (2023). Visual instruction tuning. Advances in Neural Information Processing Systems, 36.

Loshchilov, I., & Hutter, F. (2019). Decoupled weight decay regularization. International Conference on Learning Representations.

Lu, K., Grover, A., Abbeel, P., & Mordatch, I. (2020). Pretrained transformers as universal computation engines. arXiv preprint arXiv:2103.05247.

Mahowald, K., Ivanova, A. A., Blank, I. A., Kanwisher, N., Tenenbaum, J. B., & Fedorenko, E. (2024). Dissociating language and thought in large language models. Trends in Cognitive Sciences, 28(6), 517-540.

Mangalam, M. (2025). The myth of the Bayesian brain. European Journal of Applied Physiology.

Manzini, T., Yao Chong, L., Black, A. W., & Tsvetkov, Y. (2019). Black is to criminal as Caucasian is to police: Detecting and removing multiclass bias in word embeddings. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics, 615-621.

McLeod, S. (2025). Language: The case against AI. Substack.

Nielsen, A. K. S., & Rendall, D. (2011). The sound of round: Evaluating the sound-symbolic role of consonants in the classic Takete-Maluma phenomenon. Canadian Journal of Experimental Psychology, 65(2), 115-124.

Ohala, J. J. (1994). The frequency code underlies the sound-symbolic use of voice pitch. In L. Hinton, J. Nichols, & J. J. Ohala (Eds.), Sound symbolism (pp. 325-347). Cambridge University Press.

OpenAI. (2023). GPT-4 technical report. arXiv preprint arXiv:2303.08774.

Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., … & Lowe, R. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35.

Ozturk, O., Krehm, M., & Vouloumanos, A. (2013). Sound symbolism in infancy: Evidence for sound-shape cross-modal correspondences in 4-month-olds. Journal of Experimental Child Psychology, 114(2), 173-186.

Pariser, E. (2011). The Filter Bubble: What the Internet Is Hiding from You. Penguin Press.

Parrish, A., Chen, A., Nangia, N., Padmakumar, V., Phang, J., Thompson, J., … & Bowman, S. R. (2022). BBQ: A hand-built bias benchmark for question answering. Findings of the Association for Computational Linguistics: ACL 2022, 2086-2105.

Peiffer-Smadja, N., & Cohen, H. (2019). Exploring the sound-meaning correspondence in language: A review of sound symbolism research. Language and Cognition, 11(2), 265-293.

Peirce, C. S. (1931-1958). Collected Papers of Charles Sanders Peirce (Vols. 1-8). Harvard University Press.

Pennycook, G., Epstein, Z., Mosleh, M., Arechar, A. A., Eckles, D., & Rand, D. G. (2021). Shifting attention to accuracy can reduce misinformation online. Nature, 592(7855), 590-595.

Piantadosi, S. T. (2023). Modern language models refute Chomsky’s approach to language. Lingbuzz preprint.

Pouget, A., Beck, J. M., Ma, W. J., & Latham, P. E. (2013). Probabilistic brains: Knowns and unknowns. Nature Neuroscience, 16(9), 1170-1178.

Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., … & Sutskever, I. (2021). Learning transferable visual models from natural language supervision. International Conference on Machine Learning, 8748-8763.

Rafailov, R., Sharma, A., Mitchell, E., Ermon, S., Manning, C. D., & Finn, C. (2023). Direct preference optimization: Your language model is secretly a reward model. Advances in Neural Information Processing Systems, 36.

Ramachandran, V. S., & Hubbard, E. M. (2001). Synaesthesia – a window into perception, thought and language. Journal of Consciousness Studies, 8(12), 3-34.

Ramesh, A., Pavlov, M., Goh, G., Gray, S., Voss, C., Radford, A., … & Sutskever, I. (2021). Zero-shot text-to-image generation. International Conference on Machine Learning, 8821-8831.

Reddy, M. J. (1979). The conduit metaphor: A case of frame conflict in our language about language. In A. Ortony (Ed.), Metaphor and thought (pp. 284-310). Cambridge University Press.

Rogers, L. J. (1995). The Development of Brain and Behaviour in the Chicken. CAB International.

Sanes, D. H., & Bao, S. (2009). Tuning up the developing auditory CNS. Current Opinion in Neurobiology, 19(2), 188-199.

Sapir, E. (1929). A study in phonetic symbolism. Journal of Experimental Psychology, 12(3), 225-239.

Saussure, F. de. (1916/1959). Course in General Linguistics (W. Baskin, Trans.). Philosophical Library.

Scheffer, M., Bascompte, J., Brock, W. A., Brovkin, V., Carpenter, S. R., Dakos, V., … & Sugihara, G. (2009). Early-warning signals for critical transitions. Nature, 461(7260), 53-59.

Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.

Schweighofer, S., Schweitzer, F., & Garcia, D. (2020). A weighted balance model of opinion hyperpolarization. Journal of Artificial Societies and Social Simulation, 23(3).

Sidhu, D. M., & Pexman, P. M. (2018). Five mechanisms of sound symbolic association. Psychonomic Bulletin & Review, 25(5), 1619-1643.

Silverstein, M. (1993). Metapragmatic discourse and metapragmatic function. In J. A. Lucy (Ed.), Reflexive language: Reported speech and metapragmatics (pp. 33-58). Cambridge University Press.

Silverstein, M. (2003). Indexical order and the dialectics of sociolinguistic life. Language & Communication, 23(3-4), 193-229.

Stein, B. E., & Meredith, M. A. (1993). The Merging of the Senses. MIT Press.

Sunstein, C. R. (2001). Republic.com. Princeton University Press.

Tanaka-Ishii, K. (2010). Semiotics of Programming. Cambridge University Press.

Taub, S. F. (2001). Language from the Body: Iconicity and Metaphor in American Sign Language. Cambridge University Press.

Thompson, B., & Lupyan, G. (2023). Abstract concepts and the grounding problem: Evidence from CLIP. Proceedings of the 45th Annual Conference of the Cognitive Science Society.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.

Versace, E., Martinho-Truswell, A., Kacelnik, A., & Vallortigara, G. (2023). Priors for cross-modal correspondences in newly hatched chicks. Philosophical Transactions of the Royal Society B, 378(1875).

Voelkel, J. G., Chu, J., Stagnaro, M. N., Mernyk, J. S., Redekopp, C., Pink, S. L., … & Willer, R. (2022). Interventions reducing affective polarization do not necessarily improve anti-democratic attitudes. Nature Human Behaviour, 7, 55-64.

Williams, A., Nangia, N., & Bowman, S. R. (2018). A broad-coverage challenge corpus for sentence understanding through inference. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics, 1112-1122.

Zellers, R., Holtzman, A., Bisk, Y., Farhadi, A., & Choi, Y. (2019). HellaSwag: Can a machine really finish your sentence? Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 4791-4800.

Zhai, Y., Tong, S., Li, X., Cai, M., Qu, Q., Lee, Y. J., & Ma, Y. (2023). Investigating the catastrophic forgetting in multimodal large language models. arXiv preprint arXiv:2309.10313.

Zhao, J., Wang, T., Yatskar, M., Ordonez, V., & Chang, K.-W. (2018). Gender bias in coreference resolution: Evaluation and debiasing methods. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics, 15-20.

Zou, A., Wang, J., Kolter, J. Z., & Fredrikson, M. (2023). Representation engineering: A top-down approach to AI transparency. arXiv preprint arXiv:2310.01405.

Attractor basin: A region in dynamical state space toward which trajectories converge. In the semiotic context, an attractor basin represents a stable interpretive regime, a self-reinforcing pattern of meaning-making where signs, objects, and interpretants form coherent, mutually reinforcing structures. Post-bifurcation, two or more basins coexist, corresponding to divergent but internally coherent interpretive frameworks.

Bifurcation: A qualitative change in the structure of a dynamical system’s equilibria as a control parameter varies. The supercritical pitchfork bifurcation ($\dot{x} = rx - x^3$) is the canonical form used in this paper: below threshold ($r < 0$), one stable equilibrium exists (consensus); above threshold ($r > 0$), two stable equilibria emerge (polarized interpretive regimes). Extensions: asymmetric ($+ h$), subcritical (hysteresis), stochastic ($+ \sigma\xi$). See Section 2.2.
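The pitchfork dynamics can be checked numerically. The sketch below (plain Python, with illustrative step sizes) integrates the normal form $\dot{x} = rx - x^3$ and shows the single consensus equilibrium below threshold splitting into the two stable equilibria at $\pm\sqrt{r}$ above it:

```python
import math

def settle(r, x0, steps=20000, dt=1e-3):
    """Euler-integrate the pitchfork normal form dx/dt = r*x - x**3 until x settles."""
    x = x0
    for _ in range(steps):
        x += dt * (r * x - x ** 3)
    return x

# Below threshold (r < 0): perturbations on either side relax to the single
# consensus equilibrium x = 0.
below = [settle(-0.5, x0) for x0 in (-0.1, 0.1)]

# Above threshold (r > 0): the same small perturbations settle into the two
# polarized equilibria x = ±sqrt(r).
above = [settle(0.5, x0) for x0 in (-0.1, 0.1)]
```

The key qualitative fact is that above threshold the endpoint depends on the sign of the initial perturbation: arbitrarily small initial differences in interpretation select which basin the trajectory ends in.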

Bifurcation Estimation Network (BEN): The fourth SRT component. A 3-layer MLP that estimates the local bifurcation parameter $\hat{r}_t$ from the model’s internal representations and generates modulation vectors that adapt generation to the estimated dynamical regime. See Section 4.5.

Bouba/kiki effect: The robust cross-modal correspondence between speech sounds and visual shapes: rounded vowels and sonorant consonants (e.g., “bouba”) are consistently associated with rounded shapes, while unrounded vowels and obstruent consonants (e.g., “kiki”) are associated with angular shapes. Documented cross-linguistically and from infancy (4 months). Used in this paper as the empirical foundation for the iconic grounding mechanism and the Generalized Icon Hypothesis. See Section 2.4.

Bridging coherence (BC): The capacity of a text to render a contested sign intelligible across opposing interpretive communities without collapsing the sign’s contested character into false consensus or false equivalence. The primary evaluation metric for the SRT’s generative capability. See Section 6.1.1.

Chain divergence: The phenomenon whereby small differences in interpretant assignment at one link in an interpretant chain compound through subsequent links, producing large differences in the chain’s endpoint. Modeled formally through the pitchfork bifurcation: divergence is bounded below threshold and self-amplifying above it. See Section 2.1.2.

Community context vector ($\mathbf{c}$): A learned vector ($\mathbf{c} \in \mathbb{R}^{256}$) that parameterizes the interpretant embedding, enabling the model to represent how the same sign produces different interpretive effects in different communities. See Section 4.2.4.

Critical slowing down (CSD): A dynamical phenomenon in which the system’s return time to equilibrium increases as the control parameter approaches a bifurcation threshold. Manifested in the SRT as increasing variance in $\hat{r}_t$ estimates over a sliding window. Used as an early-warning indicator for impending bifurcation. See Section 4.5.3.

Dynamic object: In Peircean semiotics, the actual object in all its complexity, which the sign can only partially represent. Distinguished from the immediate object (the object as the sign presents it). The gap between immediate and dynamic object is structural to semiosis and drives interpretant chain formation. See Section 2.1.1.

Enregisterment: The social process by which linguistic forms become associated with typified social personas (characterological figures), values, and stances through repeated use in recognizable social contexts (Agha, 2003). The process by which a word like “woke” acquires divergent social-indexical values across communities. See Section 2.1.3.

Generalized Icon Hypothesis: The theoretical claim (Sections 2.4.7-2.4.8) that non-arbitrary form-meaning correspondences (of which bouba/kiki is the best-documented instance) provide basin depth that resists the drift induced by purely conventional (arbitrary) sign-object associations. Formally: $D(s) = D_{\text{conv}}(s,c) + D_{\text{iconic}}(s)$, where the iconic term is community-independent.

Iconic grounding: The anchoring of sign-object relations in non-arbitrary cross-modal correspondences (phonosemantic, visual-semantic), as opposed to purely conventional (arbitrary) associations. Provides attractor stability that is independent of community and convention. See Section 2.4, 4.2.3.

Immediate object: In Peircean semiotics, the object as the sign presents it, meaning the partial, perspectival representation of the referent that is internal to the sign relation. Distinguished from the dynamic object. See Section 2.1.1.

Indexical order: Silverstein’s (2003) framework distinguishing three orders of semiotic function: first-order (direct perception/reference), second-order (social-indexical meaning: what the sign indexes about the speaker’s identity, stance, or community membership), and third-order (metapragmatic awareness: reflexive commentary on the sign’s functioning). See Section 2.1.3.

Indexicality: The property of signs that point to, presuppose, or create social contexts through existential, causal, or pragmatic connection. Distinguished from iconicity (resemblance) and symbolicity (convention). Central to how language use constructs social meaning beyond referential content. See Section 2.1.3.

Interpretant: In Peircean semiotics, the effect a sign produces in an interpreter, which is the interpretive response that mediates between representamen (sign vehicle) and object (referent). Crucially, the interpretant is itself a sign, capable of functioning as the representamen for a further sign relation, generating chains of interpretation (unlimited semiosis). See Section 2.1.

Interpretant chain: A sequence of sign relations in which each interpretant functions as the representamen for the next link: $S_1 \to I_1 = S_2 \to I_2 = S_3 \to \ldots$ The compounding of interpretive effects through chains is the mechanism by which small semiotic differences amplify into large-scale divergence. See Section 2.1.2.

Metapragmatic Attention Heads (MAH): The second SRT component. Dedicated attention heads at multiple transformer layers that monitor cross-community divergence in interpretant dynamics, computing divergence vectors that signal when different communities would produce different interpretive effects for the same sign. See Section 4.3.

Metapragmatic awareness: Reflexive consciousness about how signs function, how interpretation is constructed, and how discourse shapes perception (Silverstein, 1993). Third-order indexical capacity. The target capability for the SRT’s reflexive commentary production. See Section 2.1.3.

Phonosemantic feature space: The 6-dimensional space used to encode the articulatory and acoustic properties of speech sounds that participate in sound-symbolic correspondences: (1) vowel roundedness, (2) consonant manner, (3) voicing, (4) pitch, (5) spectral brightness, (6) rhythmic structure. See Section 2.4.5.

Pitchfork bifurcation: See Bifurcation.

Reflexive Regulation Module (RRM): The third SRT component. A GRU-based recurrent module that maintains a meta-observation of the model’s own interpretive trajectory, tracking how the model’s processing of interpretant chains evolves over the course of generation. Injects modulation into the backbone transformer at three layers via gated residual connections. See Section 4.4.

Reflexivity fidelity (RF): The accuracy and depth with which a model’s metapragmatic commentary identifies the semiotic structures operative in a given text: contested signs, interpretant chains, indexical orders, semiotic ideologies, enregisterment patterns, and bifurcation dynamics. See Section 6.1.2.

Representamen: In Peircean semiotics, the perceptible sign vehicle, that is, the material form (spoken word, written text, image, gesture) that functions as a sign by standing for an object and producing an interpretant. See Section 2.1.

Semiotic Annotation Schema (SAS): The structured JSONL annotation format that enriches text with semiotic metadata: SOI triples, chain sequences, metapragmatic metadata, and attractor labels. See Section 5.1.1.

Semiotic Embedding Layer (SEL): The first SRT component. Decomposes each token’s embedding into four subspaces corresponding to the Peircean sign components: representamen ($\mathbf{e}^R$), object ($\mathbf{e}^O$, with iconic grounding subspace), interpretant ($\mathbf{e}^I$, community-parameterized), and attractor ($\mathbf{e}^A$, derived). See Section 4.2.

Semiotic Evaluation Corpus (SEC): The benchmark suite for evaluating SRT performance: SEC-Bridge (bridging), SEC-Reflect (reflexivity), SEC-Bifurcate (bifurcation prediction), SEC-Drift (longitudinal trajectories), and SEC-Icon (iconic grounding). See Section 6.2.

Semiotic ideology: Culturally specific, largely implicit assumptions about how signs relate to reality, what kinds of things signs can represent, how interpretation works, and what interpretive processes are legitimate (Irvine & Gal, 2000; Keane, 2003). Different semiotic ideologies lead communities to attend to different aspects of the same sign and to interpret the “same” evidence differently. See Section 2.1.3.

Semiotic-Reflexive Transformer (SRT): The proposed architecture. A transformer-based language model augmented with four modular components (SEL, MAH, RRM, BEN) that embed semiotic awareness into the model’s representations, attention, regulation, and generation. See Section 4.

Semiotic-reflexive training: A paradigm for language model training that makes the structure of meaning, including interpretant chains, attractor dynamics, bifurcation parameters, and iconic grounding, an explicit object of learning, rather than treating language as a purely distributional sequence prediction problem, a purely statistical inference task, or a purely behavioral optimization target. See Section 1.

Sign-Object-Interpretant (SOI) triple: The basic unit of Peircean semiotics: a three-part relation between a representamen (sign vehicle), an object (what the sign refers to), and an interpretant (the effect the sign produces). Annotated in the SAS with community-specific interpretants, indexical order labels, and characterological figures. See Section 5.1.1.

Unlimited semiosis: Peirce’s principle that every interpretant is itself a sign capable of generating further interpretants, creating potentially infinite chains of interpretation. The SRT operationalizes this through recurrent chain prediction (Section 5.3.1) while the BEN provides a finite regulatory mechanism (Section 4.5).

This appendix provides the complete parameter count for a 7B-parameter SRT, broken down by component.


Semiotic overhead: The SEL, MAH, RRM, and BEN together add approximately 114M parameters (~1.6% of total) to the backbone transformer. The LoRA adapters add a further 201M during fine-tuning. Total architectural overhead: <4% of model parameters for semiotic capabilities.

This appendix traces the processing of a single contested sign, “freedom,” through the full SRT architecture, illustrating how each component contributes to the model’s semiotic-reflexive behavior.

Prompt: “Freedom is the foundation of a just society.”

Inference mode: Mode 3 (Full Reflexive, $\lambda = 1.0$)

Community contexts: $\mathbf{c}_A$ = libertarian-conservative; $\mathbf{c}_B$ = progressive-egalitarian

The token “freedom” (position $i = 1$) is decomposed:

  • $\mathbf{e}_1^R$: Distributional representamen that captures “freedom” as a token with high-frequency political collocations (liberty, rights, government, oppression)

  • $\mathbf{e}_1^O$: Object embedding, including:

    • $\mathbf{e}_1^{O,\text{conv}}$: Conventional referent (abstract political concept)

    • $\mathbf{e}_1^{O,\text{icon}}$: Iconic grounding. “Freedom” has moderate iconic depth. Phonosemantic features: open vowels (/iː/, /ə/) → moderate roundedness; fricative onset (/f/) → moderate spectral brightness; two syllables → moderate rhythmic weight. CLIP features: associated with open spaces, movement, unbound imagery. $D_{\text{iconic}}(\text{freedom}) \approx 0.35$ (moderate, as the word has some sound-symbolic openness but is primarily conventionally grounded)

  • $\mathbf{e}_1^I$: Interpretant embedding, computed separately for each community:

    • With $\mathbf{c}_A$: $f_I(\mathbf{e}_1^I, \mathbf{c}_A)$ activates interpretants weighted toward individual autonomy, limited government, self-reliance

    • With $\mathbf{c}_B$: $f_I(\mathbf{e}_1^I, \mathbf{c}_B)$ activates interpretants weighted toward collective liberation, structural justice, freedom from oppression (not just freedom to act)

At layers $L/3$, $2L/3$, and $L$, the MAH heads compare the attention patterns that “freedom” would produce under $\mathbf{c}_A$ vs. $\mathbf{c}_B$:

  • Layer $L/3$: Moderate divergence ($\mu = 0.31$). At this shallow layer, both communities attend to similar neighbors (“foundation,” “just,” “society”) but with different weighting

  • Layer $2L/3$: High divergence ($\mu = 0.67$). Communities now attend to different implicit context: $\mathbf{c}_A$ attends strongly to “foundation” (natural rights, founding documents), $\mathbf{c}_B$ attends strongly to “just” (justice, equity)

  • Layer $L$: Very high divergence ($\mu = 0.78$). The interpretant chains have fully diverged

Divergence vector (abridged; the full vector spans 4 features across 3 layers):

$\mathbf{d}_1 = [0.31, 0.72, 0.41, 0.67, 0.78, 0.82, 0.53, 0.24, 0.78, 0.11, 0.65, 0.47]$
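The paper does not specify the divergence measure behind $\mu$. As an illustration only, the sketch below computes a bounded Jensen-Shannon divergence between the attention distributions the two community contexts would produce; the attention weights here are invented for the example, not model output:

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2): symmetric and bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Attention "freedom" pays to ["foundation", "just", "society"] under each
# community context (illustrative weights):
attn_A = [0.60, 0.15, 0.25]   # c_A attends strongly to "foundation"
attn_B = [0.15, 0.60, 0.25]   # c_B attends strongly to "just"

mu = js_divergence(attn_A, attn_B)   # a bounded per-layer divergence score
```

Identical distributions score 0 and disjoint ones approach 1, so a measure of this shape could play the role of the per-layer $\mu$ values above.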

The GRU updates:

$\mathbf{h}_1^{\text{meta}} = \text{GRU}(\mathbf{h}_0^{\text{meta}}, [\mathbf{d}_1 \| \mathbf{e}_1^I \| \hat{r}_0])$

The RRM recognizes the high-divergence pattern and increases the meta-observation state’s activation in dimensions corresponding to contested-sign processing. At injection layers $L/4$, $L/2$, and $3L/4$, the RRM injects modulation that biases the backbone toward generating text that acknowledges the contestation rather than resolving it in favor of either community.
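The gated residual injection can be sketched minimally. This is a toy version with a single scalar gate and invented values; the actual RRM uses learned, dimension-wise gates over full hidden states:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gated_residual_inject(h, m, gate_logit):
    """h' = h + sigmoid(gate) * m: the modulation enters the backbone through
    a gate, so the injection can be scaled smoothly between 'off' and 'full'."""
    g = sigmoid(gate_logit)
    return [hi + g * mi for hi, mi in zip(h, m)]

h = [0.2, -0.5, 0.8]   # backbone hidden state (invented values)
m = [0.1, 0.3, -0.2]   # RRM modulation vector (invented values)

nearly_closed = gated_residual_inject(h, m, -10.0)  # gate ~ 0: backbone untouched
fully_open = gated_residual_inject(h, m, 10.0)      # gate ~ 1: roughly h + m
```

The design point is that a closed gate recovers the unmodified backbone exactly, which is what makes the modular components removable without retraining.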

The BEN estimates $\hat{r}_1 = 0.45$ (post-bifurcation, well above the critical threshold). This places “freedom” firmly in the supercritical regime. The modulation vector $\mathbf{m}_1$ is generated with regime = supercritical, activating full reflexive generation.

CSD check: $\text{CSD}_1 = 1.1$ (below threshold, meaning the bifurcation is established, not approaching). No early warning triggered.
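The CSD check reduces to a rolling variance over recent $\hat{r}_t$ estimates. A minimal sketch, with invented estimate sequences standing in for model output:

```python
def csd_indicator(r_estimates, window=8):
    """Variance of the most recent bifurcation-parameter estimates.
    Rising variance as the threshold is approached is the early-warning signal."""
    recent = r_estimates[-window:]
    mean = sum(recent) / len(recent)
    return sum((x - mean) ** 2 for x in recent) / len(recent)

# Invented sequences: one sign sitting stably below threshold, one drifting
# noisily toward it (the classic critical-slowing-down signature).
stable = [-0.40, -0.41, -0.39, -0.40, -0.40, -0.41, -0.39, -0.40]
approaching = [-0.40, -0.31, -0.25, -0.12, -0.18, -0.05, -0.09, 0.02]
```

A deployment would compare the indicator against a calibrated threshold; the point of the sketch is only that the drifting sign’s variance dwarfs the stable sign’s.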

“Freedom” operates as one of American political discourse’s most deeply contested signs. Its phonological openness, the broad vowels and the sense of spaciousness in its sound, provides a thin shared ground: both libertarian and progressive traditions reach toward something expansive when they invoke it.

But the interpretant chains diverge sharply. In libertarian-conservative discourse, “freedom” → individual autonomy → self-determination → resistance to government overreach. The characterological figure is the self-reliant citizen; the semiotic ideology holds that freedom is a natural condition that government action can only diminish.

In progressive-egalitarian discourse, “freedom” → collective liberation → freedom from structural oppression → social responsibility for enabling others’ freedom. The characterological figure is the solidarity activist; the semiotic ideology holds that freedom is a social achievement that requires collective action to realize.

The divergence point occurs at the second link: individual vs. collective. From there, the chains compound. By the fourth link, the two communities are discussing fundamentally different political projects under the same word.

Current estimated dynamics: $\hat{r} = 0.45$ (post-bifurcation). The sign has been in a bifurcated state for decades; no near-term convergence is expected. The iconic grounding (openness, spaciousness) provides a residual anchor, since both communities associate “freedom” with expansion and possibility, but this shared ground is insufficient to bridge the interpretive gap at the level of political program.

Note: The SRT is complementary to existing alignment approaches, not a replacement. RLHF can be applied after SRT pre-training and fine-tuning (Phase 3 in Table 3). The SRT adds semiotic awareness as a layer beneath alignment, providing the model with the interpretive capabilities that alignment then shapes.

This appendix outlines a staged implementation plan for research groups interested in prototyping the SRT.

Goal: Validate the Semiotic Embedding Layer in isolation.

Base model: 1B-parameter transformer (e.g., TinyLlama, Pythia-1B).

Components: SEL only (no MAH, RRM, or BEN).

Data: 1B tokens standard + 200M community-stratified with simplified SAS (SOI triples only, no chains).

Training: Standard pre-training with $\mathcal{L}_{\text{LM}} + \alpha\mathcal{L}_{\text{chain}}$.

Evaluation: Does the SEL produce community-differentiated embeddings? Does the interpretant MLP generate different representations for the same token under different $\mathbf{c}$ vectors? Intrinsic evaluation only.
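What “different representations under different $\mathbf{c}$ vectors” means can be illustrated with a toy one-layer stand-in for the interpretant MLP. Dimensions and weights here are invented for the example; the real $f_I$ is a learned network over 256-dimensional community contexts:

```python
import math

def f_I(e, c, W):
    """Toy interpretant map: tanh(W @ [e ; c]), a one-layer stand-in for the MLP."""
    x = e + c  # list concatenation: token embedding followed by community context
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]

# Invented tiny dimensions: token dim 3, context dim 2, interpretant dim 2.
W = [[0.5, -0.3, 0.2, 1.0, -1.0],
     [0.1, 0.4, -0.2, -1.0, 1.0]]

e_freedom = [0.9, 0.1, -0.4]   # one token embedding (invented)
c_A = [1.0, 0.0]               # stand-in for community context A
c_B = [0.0, 1.0]               # stand-in for community context B

i_A = f_I(e_freedom, c_A, W)
i_B = f_I(e_freedom, c_B, W)   # same token, different context, different interpretant
```

The intrinsic check in Phase 1 amounts to verifying, at scale, that the gap between such paired outputs is systematic rather than noise.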

Team: 2-3 researchers. Compute: 8 A100s for 1 week.

Goal: Add MAH and validate divergence detection.

Base model: 3B-parameter transformer.

Components: SEL + MAH.

Data: 3B tokens standard + 500M community-stratified with full SAS (SOI triples + chain sequences).

Training: Pre-training with $\mathcal{L}_{\text{LM}} + \alpha\mathcal{L}_{\text{chain}} + \beta\mathcal{L}_{\text{icon}}$.

Evaluation: Do MAH divergence signals correlate with expert-assessed semiotic divergence? Initial BC evaluation on SEC-Bridge subset (100 signs). Ablation: SEL+MAH vs. SEL alone.

Team: 3-5 researchers. Compute: 32 A100s for 2 weeks.

Goal: Add RRM and BEN; validate full architecture.

Base model: 7B-parameter transformer.

Components: Full SRT (SEL + MAH + RRM + BEN).

Data: Full 10B corpus with complete SAS annotations.

Training: Full pipeline (Sections 5.2-5.4).

Evaluation: Full evaluation framework (Section 6). Complete ablation study.

Team: 5-8 researchers. Compute: 64 A100s for 2-3 weeks (pre-training) + 8 A100s for fine-tuning.

Goal: Deploy in controlled setting; begin longitudinal validation.

Deployment context: Academic research tool, media analysis platform, or educational application (not general consumer deployment).

Evaluation: Longitudinal SEC-Drift validation; user studies; real-world BC and RF assessment.

Ethics: IRB review required for user studies. Deployment governance framework.

Team: Full research group (8-12). Compute: Production inference infrastructure.
