IEEE FP8 Formats for Machine Learning (Draft) [pdf] (github.com)

Formats, plural.
sign:explicit leading mantissa bit:mantissa bits:exponent bits
1:0:(P-1):(8-P), where P ∈ [1,7] is the precision in bits (including the implicit leading bit)
P ∈ [3,5] appears to be the most useful range:
binary8p3 -> 1:0:2:5
binary8p4 -> 1:0:3:4
binary8p5 -> 1:0:4:3
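A minimal sketch of the layout formula, reproducing the three formats listed above. The names `layout` and `split` are hypothetical, and `split` assumes the fields sit in IEEE 754 order (sign, then exponent, then mantissa, from the most significant bit down); that bit ordering is an assumption, not something stated here.

```python
def layout(p: int) -> tuple[int, int, int, int]:
    """Bit widths (sign, explicit leading bit, mantissa, exponent) for binary8pP."""
    if not 1 <= p <= 7:
        raise ValueError("P must be in [1, 7]")
    return (1, 0, p - 1, 8 - p)


def split(byte: int, p: int) -> tuple[int, int, int]:
    """Split an 8-bit code into (sign, exponent field, mantissa field).

    ASSUMPTION: IEEE-style bit order, sign | exponent | mantissa (MSB to LSB).
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("not an 8-bit code")
    _, _, m, e = layout(p)
    sign = (byte >> 7) & 1
    exponent = (byte >> m) & ((1 << e) - 1)  # e bits just below the sign bit
    mantissa = byte & ((1 << m) - 1)         # the low m bits (empty when P = 1)
    return sign, exponent, mantissa


for p in (3, 4, 5):
    # prints "binary8p3 -> 1:0:2:5", then the p4 and p5 lines
    print(f"binary8p{p} ->", ":".join(map(str, layout(p))))
```

Every P in [1,7] adds up to 8 bits: the one mantissa bit gained per step of P is one exponent bit lost.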
Edge cases:
- P = 8 would leave no exponent bits at all, just a sign bit and 7 explicit mantissa bits of precision
- P = 0 would be unsigned: the formula would call for -1 mantissa bits, so the sign bit goes instead, leaving 8 exponent bits with only the implicit leading bit and no explicit mantissa bits
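A small sketch checking the edge cases against the formula. The function name `edge_layout` is hypothetical. P = 8 falls straight out of 1:0:(P-1):(8-P), while P = 0 is special-cased to follow the unsigned reading, since the formula alone would give a negative mantissa width.

```python
def edge_layout(p: int) -> tuple[int, int, int]:
    """(sign, explicit mantissa, exponent) bit counts for P in [0, 8].

    Hypothetical extension of the P in [1, 7] formula to the two edge cases.
    """
    if p == 0:
        # P - 1 would be -1 mantissa bits; drop the sign bit instead,
        # leaving an unsigned format made entirely of exponent bits.
        return (0, 0, 8)
    if not 1 <= p <= 8:
        raise ValueError("P must be in [0, 8]")
    # For P = 8 the formula itself yields 7 mantissa bits and 0 exponent bits.
    return (1, p - 1, 8 - p)


for p in (0, 8):
    print(p, edge_layout(p))
```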
Also, note that:
- There is only one zero (encoded as 0x00); there is no negative zero
- There is only one NaN (encoded as 0x80, the bit pattern a sign-magnitude layout would otherwise use for -0)
-0, qNaNs, and sNaNs of the world, unite to raise an exception to this travesty!
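A sketch of what the single zero and single NaN mean for code handling these bytes. `classify` and `negate` are hypothetical helpers, and `negate` assumes every other code is plain sign-magnitude (flip bit 7 to negate), which isn't stated above. The point it shows: 0x80 is exactly where an IEEE-style format keeps -0, so negating zero has to be special-cased or it would turn into NaN.

```python
ZERO = 0x00  # the only zero encoding (no negative zero)
NAN = 0x80   # the only NaN encoding, reusing the would-be -0 bit pattern


def classify(byte: int) -> str:
    """Label an 8-bit code as 'zero', 'nan', or by the sign of any other code."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("not an 8-bit code")
    if byte == ZERO:
        return "zero"
    if byte == NAN:
        return "nan"
    return "negative" if byte & 0x80 else "positive"


def negate(byte: int) -> int:
    """Negate a code; zero and NaN have no signed twin, so they map to themselves.

    ASSUMPTION: all other codes are sign-magnitude, so flipping bit 7 negates.
    """
    if byte in (ZERO, NAN):
        return byte
    return byte ^ 0x80
```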