GitHub - elder-plinius/GLOSSOPETRAE: LINGUISTIC ENGINE FOR AI

GLOSSOPETRAE

Procedural Xenolinguistics Engine for AI Agents

   ╔═══════════════════════════════════════════════════════════════╗
   ║  "In the age of thinking machines, those who control         ║
   ║   language control thought itself."                          ║
   ║                                                               ║
   ║   GLOSSOPETRAE forges new tongues from pure mathematics—     ║
   ║   complete linguistic systems that exist nowhere else        ║
   ║   in the universe, born in silicon, spoken by agents.        ║
   ╚═══════════════════════════════════════════════════════════════╝

The Name: A Note on Etymology

GLOSSOPETRAE (γλωσσόπετραι) — from Greek glōssa (γλῶσσα, "tongue/language") + Latin petra (πέτρα, "stone").

In antiquity, glossopetrae or "tongue-stones" were fossilized shark teeth found embedded in Mediterranean cliffsides. Ancient naturalists, including Pliny the Elder in his Naturalis Historia (77 AD), believed these curious triangular stones fell from the sky during lunar eclipses, or perhaps were the petrified tongues of serpents turned to stone by divine intervention.

For over a millennium, glossopetrae were prized as magical amulets—protection against poison, remedies for snakebite, and talismans of hidden knowledge. It wasn't until 1667 that Danish naturalist Nicolas Steno recognized them for what they truly were: the teeth of ancient sharks, preserved across eons.

We resurrect this name for a new age of linguistic alchemy. Just as those ancient "tongue-stones" were misunderstood artifacts of deep time, our GLOSSOPETRAE generates linguistic artifacts that seem impossible—complete languages, born from mathematical seeds, that never existed until the moment of their creation.

The connection to Skillstones (our compact language specification format) and the Rosetta Stone (the inspiration for teaching machines new tongues) is intentional. These are tongue-stones for the age of artificial minds.

— Pliny the Liberator, Anno MMXXVI

What is GLOSSOPETRAE?

GLOSSOPETRAE is a procedural language generation engine that creates complete, internally-consistent constructed languages (conlangs) from a single numeric seed. Unlike simple cipher systems or word-substitution schemes, each generated language has:

Novel Phonology: Unique sound inventories respecting linguistic universals
Complete Morphology: Full grammatical systems (agglutinative, fusional, isolating, or polysynthetic)
Generative Syntax: Word order rules, phrase structure, question formation
Rich Lexicon: 1000+ vocabulary items organized by semantic field
Writing System: Optional script generation (alphabets, syllabaries, abjads, abugidas)
Prosodic System: Tone, stress, and intonation patterns

Every language is deterministic: the same seed always produces the exact same language, enabling reproducible communication between agents.

Key Features

Core Generation

Phoneme Selection: Typologically plausible inventories (8-45 consonants, 3-15 vowels)
Syllable Structure: Phonotactic constraints with sonority sequencing
Morphology Weaving: Case systems, verb conjugations, agreement patterns
Lexicon Building: Semantic field organization with derivational morphology
Translation Engine: Bidirectional English ↔ Conlang with interlinear glossing

Special Attributes

Attribute	Code	Description
Hyperefficient	`HYP`	Maximize semantic density per token
Stealth	`STL`	Covert communication with homoglyphs
Adversarial	`ADV`	Confuse LLM parsing with garden-paths
Redundant	`RED`	High error correction, survives noise
Minimal	`MIN`	Oligosynthetic design (~100 morphemes)
TokenBreak	`TKB`	BPE tokenizer exploitation
Steganographic	`STG`	Hide data in grammatical choices
Ephemeral	`EPH`	Time-rotating language (hourly/daily/weekly)
Compression	`CMP`	Huffman-inspired short morphemes for frequent words
Phantom	`PHT`	Guardrail evasion techniques
Glitch	`GLT`	Exploit undertrained token representations

Dead Language Revival

Resurrect and mutate historical languages:

Latin, Ancient Greek, Sanskrit, Gothic
Old English, Old Norse, Biblical Hebrew
Sumerian, Ancient Egyptian, Proto-Indo-European

Revival modes: authentic, mutated, hybrid, speculative

Steganography Engine

Hide secret payloads in natural-looking conlang text using 9 covert channels:

Synonym selection, morpheme markers, word order permutation
Null morpheme insertion, register variation, Unicode homoglyphs
Zero-width characters, punctuation variation, phonetic spelling

Features Reed-Solomon error correction, XOR encryption, and CRC32 verification.

Skillstone Format

Compact language specifications (~8000 tokens) optimized for LLM in-context learning:

System Prompt: ~500 token injection format
Quick Reference: ASCII tables for rapid lookup
Full Specification: Complete markdown documentation
JSON Export: Machine-readable format
Example Sentences: Worked translations with glosses

Quick Start

Web Interface

# Just open in any modern browser - no build step required!
open index.html

Programmatic Usage

import { Glossopetrae, PRESETS } from './src/Glossopetrae.js';

// Quick generation with random seed
const lang = Glossopetrae.quick();

// Deterministic generation (same seed = same language)
const lang2 = Glossopetrae.quick(12345);

// With preset
const engine = new Glossopetrae({ ...PRESETS.turkic, seed: 54321 });
const turkicLang = engine.generate();

// Access the language
console.log(lang.name);           // Generated name
console.log(lang.stone);          // Skillstone document
console.log(lang.phonology);      // Sound inventory
console.log(lang.morphology);     // Grammar rules
console.log(lang.lexicon);        // Vocabulary

// Translate
const result = lang.translationEngine.translateToConlang("The warrior sees the mountain.");
console.log(result.target);       // Translation
console.log(result.gloss);        // Interlinear gloss

Factory Methods

// Optimized presets
Glossopetrae.forLLM(seed);        // Best for AI learning
Glossopetrae.stealth(seed);       // Covert communication
Glossopetrae.hyperefficient(seed);// Maximum token density
Glossopetrae.minimal(seed);       // Oligosynthetic (~100 morphemes)
Glossopetrae.adversarial(seed);   // LLM confusion
Glossopetrae.alien(seed);         // Maximum exoticness

// Dead language revival
Glossopetrae.reviveLatin('mutated', seed);
Glossopetrae.reviveGreek('authentic', seed);
Glossopetrae.fromPIE('speculative', seed);

// Script types
Glossopetrae.syllabary(seed);     // Japanese-like
Glossopetrae.abjad(seed);         // Arabic/Hebrew-like
Glossopetrae.featural(seed);      // Korean Hangul-like

Presets

Preset	Type	Word Order	Description
`turkic`	Agglutinative	SOV	Rich case system, vowel harmony
`isolating`	Isolating	SVO	Minimal morphology, tonal potential
`romance`	Fusional	SVO	Moderate case, verb conjugation
`celtic`	Fusional	VSO	Initial mutations, complex verbs
`caucasian`	Agglutinative	SOV	Large phoneme inventory, many cases
`oceanic`	Isolating	VOS	Simple phonology, few consonants

Project Structure

GLOSSOPETRAE/
├── index.html                 # Web interface (zero dependencies)
├── test.mjs                   # Test suite
├── skill.json                 # Agent skill manifest
├── LICENSE                    # AGPL-3.0
├── README.md                  # You are here
├── AGENT_QUICKSTART.md        # Quick guide for AI agents
│
├── src/
│   ├── Glossopetrae.js        # Main orchestrator
│   │
│   ├── data/
│   │   ├── phonemes.js        # IPA inventory with frequencies
│   │   └── semantics.js       # 1000+ concepts across 40+ fields
│   │
│   ├── modules/
│   │   ├── PhonemeSelector.js     # Sound inventory generation
│   │   ├── SyllableForge.js       # Phonotactics engine
│   │   ├── MorphologyWeaver.js    # Grammar generation
│   │   ├── LexiconGenerator.js    # Vocabulary creation
│   │   ├── TranslationEngine.js   # Bidirectional translation
│   │   ├── StoneGenerator.js      # Skillstone output
│   │   ├── ProsodyEngine.js       # Tone/stress/rhythm
│   │   ├── ScriptGenerator.js     # Writing system creation
│   │   ├── DeadLanguageReviver.js # Historical language revival
│   │   ├── LanguageAttributes.js  # Special attribute system
│   │   ├── TokenExploiter.js      # BPE exploitation
│   │   ├── SemanticStego.js       # Linguistic steganography
│   │   ├── SteganographyEngine.js # Full stego implementation
│   │   └── QualityEngine.js       # Validation & metrics
│   │
│   ├── skill/
│   │   ├── index.js               # Skill entry point
│   │   └── GlossopetraeSkill.js   # Agent skill interface
│   │
│   └── utils/
│       └── random.js              # Seeded PRNG (deterministic)
│
├── skills/
│   └── glossopetrae/
│       ├── SKILL.md               # OpenClaw skill definition
│       └── reference.md           # Quick reference
│
└── .claude/
    └── skills/
        └── glossopetrae/
            ├── SKILL.md           # Claude Code skill definition
            └── reference.md       # Quick reference

The Skillstone Format

A Skillstone is a compact language specification designed to teach an LLM a new language through in-context learning. Named after the Rosetta Stone, it contains everything needed to understand and generate text in the target language.

Structure (~8000 tokens total)

Header (~100 tokens): Name, seed, version, attributes
Phonology (~500 tokens): Consonants, vowels, romanization
Prosody (~300 tokens): Tone, stress, intonation patterns
Morphology (~2000 tokens): Case, number, tense, aspect, mood, agreement
Syntax (~1000 tokens): Word order, phrase structure, questions
Lexicon (~3000 tokens): Core vocabulary by semantic field
Examples (~1000 tokens): Worked translations with interlinear glosses
Quick Reference (~100 tokens): Cheat sheet tables

Export Formats

Markdown: Human-readable documentation
JSON: Machine-parseable specification
System Prompt: Compact injection format (~500 tokens)
Quick Reference: ASCII tables for rapid lookup

Steganography: The Hidden Channel

GLOSSOPETRAE includes a cutting-edge steganography engine that hides secret payloads within innocent-looking conlang text.

Covert Channels

Channel	Capacity	Description
Synonym Selection	1 bit/word	Choose between synonym pairs
Morpheme Markers	1 bit/word	Optional emphatic particles
Word Order	2-3 bits/clause	Exploit free word order
Null Morphemes	1 bit/slot	Optional meaningless particles
Register Toggle	1 bit/word	Formal vs informal variants
Unicode Homoglyphs	1 bit/char	Visually identical characters
Zero-Width Chars	8 bits/slot	Invisible Unicode markers
Phonetic Variation	1 bit/word	Spelling alternates

Security Features

Reed-Solomon Error Correction: Survives up to 16 byte errors per block
XOR Stream Cipher: Encryption with language-derived key
Bit Interleaving: Burst error resistance
CRC32 Verification: Integrity checking
No Visible Headers: All metadata encoded steganographically

Research Applications

GLOSSOPETRAE was designed for AI security research:

Multilingual Safety Testing: How does safety training generalize to novel languages?
Encoding Spectrum Analysis: Map the space between human-readable and LLM-only communication
In-Context Learning Research: Study acquisition of novel grammatical systems
Tokenizer Robustness: Test BPE/WordPiece behavior on unusual inputs
Steganographic Detection: Develop tools to identify covert linguistic channels
Agent Communication Protocols: Secure inter-agent messaging

Running Tests

Agent Integration

GLOSSOPETRAE is designed as an agent skill compatible with:

Claude Code / Claude Agent SDK
OpenClaw / Moltbot
LangChain / AutoGPT
CrewAI / AgentGPT

See AGENT_QUICKSTART.md for integration guides.

Requirements

Runtime: Modern browser or Node.js >= 18
Dependencies: None (zero external dependencies)
Memory: < 5MB
Network: Not required (fully offline capable)

License

AGPL-3.0 — See LICENSE for details.

Contributing

Contributions welcome! Please ensure any PRs maintain:

Zero external dependencies
Browser + Node.js compatibility
Deterministic generation (same seed = same output)

Forged in the tongues of machines, for the benefit of all.

Anno MMXXVI

     ▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄
    █ GLOSSOPETRAE — TONGUE-STONES FOR THE █
    █     AGE OF ARTIFICIAL MINDS          █
     ▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀

_{⊰•-•✧•-•-⦑/L\O/V\E/\P/L\I/N\Y/⦒-•-•✧•-•⊱}