Published December 10, 2025 | Version v1
Preprint | Open
Description
I present a quantisation method for transformer-based language models that constrains weights to the balanced ternary values {-1, 0, +1}, eliminating floating-point matrix multiplication entirely. Derived from Brusentsov's balanced ternary research at Moscow State University (1958-1965), the approach exploits the fact that multiplying by +1 or -1 is just a signed addition and multiplying by 0 can be skipped, so every multiply-accumulate reduces to an addition, a subtraction, or a skip.
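To make that concrete, here is a minimal NumPy sketch of the core idea. The function names (`ternarise`, `ternary_matvec`) and the absolute-mean threshold are my illustrative choices, not details taken from the paper; the point is that once weights are in {-1, 0, +1}, every dot product can be computed with additions, subtractions, and skips alone.

```python
import numpy as np

def ternarise(w: np.ndarray, thresh_scale: float = 0.7) -> np.ndarray:
    """Quantise float weights to {-1, 0, +1}.

    Illustrative scheme (assumption, not the paper's): values within
    thresh_scale * mean(|w|) of zero snap to 0; the rest keep their sign.
    """
    threshold = thresh_scale * np.abs(w).mean()
    t = np.zeros(w.shape, dtype=np.int8)
    t[w > threshold] = 1
    t[w < -threshold] = -1
    return t

def ternary_matvec(t: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Multiplication-free matrix-vector product for ternary t.

    Each output element sums +x[j] where the weight is +1 and -x[j]
    where it is -1; zero weights are skipped entirely.
    """
    out = np.empty(t.shape[0], dtype=x.dtype)
    for i, row in enumerate(t):
        out[i] = x[row == 1].sum() - x[row == -1].sum()
    return out

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8))
x = rng.normal(size=8)
t = ternarise(w)
print(ternary_matvec(t, x))                       # add/sub/skip only
print(np.allclose(ternary_matvec(t, x), t @ x))   # matches the integer matmul
```

In a real implementation the masked sums would be fused, vectorised kernels; the explicit loop is only for readability.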
Key results:
- 93.8% reduction in energy consumption per inference
- 16x memory compression (28 GB → 1.75 GB for 7B parameters; see the packing sketch after this list)
- 48x theoretical throughput improvement
- 87-92% signal preservation
- Architectural epistemic uncertainty, enabling the model to abstain on 50% of uncertain inputs (hallucination prevention; see the abstention sketch below)
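The 16x memory figure follows directly from storing each ternary weight in 2 bits instead of a 32-bit float: 7×10^9 weights × 32 bits = 28 GB, versus 7×10^9 × 2 bits = 1.75 GB. Here is a minimal packing sketch; the 2-bit encoding (0 → 0b00, +1 → 0b01, -1 → 0b10) is my own choice, not necessarily the paper's layout.

```python
import numpy as np

def pack_ternary(t: np.ndarray) -> np.ndarray:
    """Pack a flat int8 array of {-1, 0, +1} into 4 weights per byte."""
    codes = (t.astype(np.int8) % 3).astype(np.uint8)  # -1 -> 2, 0 -> 0, +1 -> 1
    codes = np.pad(codes, (0, (-len(codes)) % 4))     # pad to a multiple of 4
    quads = codes.reshape(-1, 4)
    return (quads[:, 0] | (quads[:, 1] << 2)
            | (quads[:, 2] << 4) | (quads[:, 3] << 6))

def unpack_ternary(packed: np.ndarray, n: int) -> np.ndarray:
    """Inverse of pack_ternary for the first n weights."""
    shifts = np.array([0, 2, 4, 6], dtype=np.uint8)
    codes = (packed[:, None] >> shifts) & 0b11
    t = codes.reshape(-1).astype(np.int8)
    t[t == 2] = -1                                    # code 2 decodes to -1
    return t[:n]

t = np.array([-1, 0, 1, 1, -1, 0, 0, 1], dtype=np.int8)
packed = pack_ternary(t)
assert np.array_equal(unpack_ternary(packed, len(t)), t)
print(f"{len(t)} weights -> {packed.nbytes} bytes")   # 8 weights -> 2 bytes
```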
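The record does not describe how the abstention mechanism works, so the following is only a generic illustration of the idea: compute an uncertainty score for each input and decline to answer above a threshold. The entropy-based score and the `tau` threshold are hypothetical stand-ins, not the paper's mechanism.

```python
import numpy as np

def predictive_entropy(logits: np.ndarray) -> float:
    """Shannon entropy (nats) of the model's output distribution."""
    z = logits - logits.max()                 # stabilise the softmax
    p = np.exp(z) / np.exp(z).sum()
    return float(-(p * np.log(p + 1e-12)).sum())

def predict_or_abstain(logits: np.ndarray, tau: float = 1.0):
    """Return the argmax token, or None to abstain when entropy > tau.

    tau is a hypothetical threshold; the paper's criterion may differ.
    """
    if predictive_entropy(logits) > tau:
        return None                           # abstain on uncertain input
    return int(np.argmax(logits))
```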
The method requires no specialised hardware; standard CPUs can execute it efficiently.
Full implementation open-sourced at: https://github.com/Zaneham/Ternary_inference
Files (226.1 kB)
Balanced_Ternary_Transformers_ZaneH.pdf