I bypassed text gen to let LLMs communicate via raw vectors, cutting opex by 90%
Hey HN,

I was building a multi-agent swarm and realized I was burning 90% of my latency and credits just having Agent A turn its math into English (JSON) so Agent B could turn it back into math. It felt like two supercomputers communicating via smoke signals.

So I thought, "Why not just send the hidden state?" I accidentally fell down a rabbit hole and built Nexus Protocol (open source): a Python library that bridges the latent spaces of different models (e.g., Llama-3 to Mistral).

How it works (rough sketches below):

1. Extracts the latent vector from the sender, skipping the decoding head entirely.

2. Passes it through a lightweight linear adapter (trained on semantic pairs) to align the two dimensions/spaces.

3. Injects the result directly into the receiver's context embedding layer.

The result is "telepathy": the receiver "understands" the concept and acts on it without a single token being generated or read. It runs on consumer hardware using bitsandbytes 4-bit loading.

It's still early (proof-of-concept level), but the speedup is massive. I'd love for you guys to roast my implementation or tell me why this is a terrible idea.

    pip install nexus-protocol
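To make the three steps concrete, here is a stripped-down sketch of the trick in plain Hugging Face transformers. This is not the nexus-protocol API; the model names, the untrained adapter, and the prepend-style injection are all illustrative choices, just the shape of the idea:

    # Minimal sketch of latent bridging with plain transformers.
    # NOT the nexus-protocol API: model names, the untrained adapter, and
    # the prepend-style injection are illustrative assumptions.
    # (On consumer hardware you'd load the models with
    #  quantization_config=BitsAndBytesConfig(load_in_4bit=True).)
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    send_tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
    sender = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
    recv_tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
    receiver = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

    # 1. Run the sender and keep the final hidden states; the LM head is
    #    skipped, so no tokens are ever decoded.
    msg = send_tok("Schedule the deploy for 02:00 UTC.", return_tensors="pt")
    with torch.no_grad():
        latent = sender(**msg, output_hidden_states=True).hidden_states[-1]
        # latent: (batch, seq_len, d_sender)

    # 2. Align spaces with a lightweight linear adapter. Untrained here;
    #    the real one would be fit on semantic pairs (see next sketch).
    adapter = torch.nn.Linear(sender.config.hidden_size,
                              receiver.config.hidden_size)
    with torch.no_grad():
        bridged = adapter(latent)  # (batch, seq_len, d_receiver)

    # 3. Inject into the receiver: splice the bridged vectors in ahead of
    #    the receiver's own prompt embeddings and generate from that.
    prompt = recv_tok("Act on the incoming state:", return_tensors="pt")
    prompt_emb = receiver.get_input_embeddings()(prompt.input_ids)
    with torch.no_grad():
        out = receiver.generate(
            inputs_embeds=torch.cat([bridged, prompt_emb], dim=1),
            max_new_tokens=64,
        )
    print(recv_tok.decode(out[0], skip_special_tokens=True))

Prepending into inputs_embeds is the least invasive way I know to hit the receiver's embedding layer without patching model code, which is why the sketch does it that way.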
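And for step 2, a toy sketch of what training an adapter on semantic pairs can look like: feed the same sentences to both models and regress the sender's latents onto the receiver's. Mean-pooling and an MSE loss are just one reasonable choice here, not necessarily what you'd ship:

    # Toy adapter training on semantic pairs, reusing the models above.
    # Mean-pooled final hidden states + MSE are assumptions for illustration.
    import torch
    import torch.nn.functional as F

    def pooled_latent(model, tok, sentences):
        # One mean-pooled vector per sentence, masking out padding tokens.
        tok.pad_token = tok.pad_token or tok.eos_token
        enc = tok(sentences, return_tensors="pt", padding=True)
        with torch.no_grad():
            hs = model(**enc, output_hidden_states=True).hidden_states[-1]
        mask = enc.attention_mask.unsqueeze(-1)
        return (hs * mask).sum(1) / mask.sum(1)  # (batch, d_model)

    pairs = ["The cat sat on the mat.", "Retry with exponential backoff."]
    src = pooled_latent(sender, send_tok, pairs)    # sender space
    tgt = pooled_latent(receiver, recv_tok, pairs)  # receiver space

    adapter = torch.nn.Linear(src.shape[-1], tgt.shape[-1])
    opt = torch.optim.AdamW(adapter.parameters(), lr=1e-3)
    for step in range(200):
        loss = F.mse_loss(adapter(src), tgt)
        opt.zero_grad(); loss.backward(); opt.step()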