Balanced Ternary Transformers: Eliminating Multiplication and Enabling Epistemic Uncertainty


Published December 10, 2025 | Version v1

Preprint

  • Auckland University of Technology

Description

I present a quantisation method for transformer-based language models that constrains weights to balanced ternary values {-1, 0, +1}, eliminating floating-point matrix multiplication entirely. Derived from Brusentsov's balanced ternary research at Moscow State University (1958-1965), this approach replaces multiply-accumulate operations with addition, subtraction, and skip operations.
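
To make the replacement concrete, here is a minimal sketch of a multiplication-free ternary matrix-vector product. The function name, array shapes, and use of NumPy are illustrative assumptions, not the repository's actual API:

```python
import numpy as np

def ternary_matvec(W, x):
    """Matrix-vector product for weights in {-1, 0, +1}, with no multiplies.

    Illustrative sketch: +1 entries contribute an addition, -1 entries a
    subtraction, and 0 entries are skipped entirely.
    """
    out = np.zeros(W.shape[0], dtype=x.dtype)
    for i, row in enumerate(W):
        out[i] = x[row == 1].sum() - x[row == -1].sum()  # add / subtract / skip
    return out

# A 2x4 ternary weight matrix applied to a float activation vector.
W = np.array([[ 1, 0, -1, 1],
              [-1, 1,  0, 0]], dtype=np.int8)
x = np.array([0.5, -2.0, 1.5, 3.0], dtype=np.float32)
print(ternary_matvec(W, x))      # [ 2.  -2.5]
print(W.astype(np.float32) @ x)  # same result via ordinary matmul
```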

Key results:

  • 93.8% reduction in energy consumption per inference
  • 16x memory compression (28GB → 1.75GB for 7B parameters; arithmetic sketched after this list)
  • 48x theoretical throughput improvement
  • 87-92% signal preservation
  • Architectural epistemic uncertainty enabling 50% abstention on uncertain inputs (hallucination prevention)
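
The 16x compression figure is consistent with a 2-bit encoding per ternary weight replacing a 32-bit float; the exact storage format is my assumption, as the abstract does not state it:

$$
7\times10^{9}\ \text{weights}\times 4\ \text{B} = 28\ \text{GB (FP32)},\qquad
7\times10^{9}\times\tfrac{2}{8}\ \text{B} = 1.75\ \text{GB},\qquad
\frac{28}{1.75} = 16\times
$$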

The method requires no specialised hardware; standard CPUs can execute it efficiently.
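
One plausible layout, sketched below under the same 2-bit-per-weight assumption, stores four weights per byte and decodes them with nothing more than shifts, masks, and adds, which is why commodity CPU instructions suffice. The helper names are hypothetical, and the sketch assumes the weight count is a multiple of four:

```python
import numpy as np

def pack_ternary(w):
    """Pack ternary weights {-1, 0, +1} into 2 bits each (4 weights per byte)."""
    codes = (w + 1).astype(np.uint8).reshape(-1, 4)  # map -1,0,+1 -> 0,1,2
    return (codes[:, 0] | codes[:, 1] << 2 |
            codes[:, 2] << 4 | codes[:, 3] << 6).astype(np.uint8)

def unpack_ternary(packed, n):
    """Recover n ternary weights from the packed byte array."""
    shifts = np.array([0, 2, 4, 6], dtype=np.uint8)
    codes = (packed[:, None] >> shifts) & 0b11  # pull out each 2-bit field
    return codes.reshape(-1).astype(np.int8)[:n] - 1

w = np.array([1, 0, -1, 1, -1, 1, 0, 0], dtype=np.int8)
packed = pack_ternary(w)
assert np.array_equal(unpack_ternary(packed, w.size), w)
print(f"{w.size} weights stored in {packed.nbytes} bytes")  # 8 weights in 2 bytes
```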

Full implementation open-sourced at: https://github.com/Zaneham/Ternary_inference

Files (226.1 kB)

Balanced_Ternary_Transformers_ZaneH.pdf
