Tesla Dojo Whitepaper

tesla-cdn.thron.com

13 points by edison112358 4 years ago · 2 comments


childintime 4 years ago

I'd like to mention a thought I had some time ago about using a byte-wide FP format for ML training: instead of describing the byte in a sign/mantissa/exponent format, it might be advantageous to map its 256 possible values, via a lookup table, to ideally chosen real values. The curve implemented could be a sigmoid, for example. This would reduce quantization effects, likely yielding not just better convergence, but consistently better convergence.

It might be necessary to adjust the curve to facilitate the reverse lookup and reduce the time and silicon needed.
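The idea above can be sketched in a few lines. This is one hypothetical realization, not anything from the Dojo whitepaper: the 256 byte values are mapped through an inverse-sigmoid table (so the encode direction is a sigmoid, as the comment suggests), packing most code points near zero where weights and gradients cluster. `MAX_VAL` and `SCALE` are illustrative tuning knobs.

```python
import math

# Hypothetical byte-wide FP format defined by a 256-entry lookup table
# rather than sign/exponent/mantissa fields.
MAX_VAL = 8.0   # largest representable magnitude (illustrative)
SCALE = 2.0     # controls how strongly density concentrates near zero
LIM = math.tanh(MAX_VAL / SCALE)  # chosen so the table spans exactly +/- MAX_VAL

# Forward table: byte -> value. atanh of evenly spaced points places most
# of the 256 code points near zero.
CODES = [SCALE * math.atanh(LIM * (2 * i / 255 - 1)) for i in range(256)]

def decode(b: int) -> float:
    """Byte -> float: a single table read in hardware."""
    return CODES[b]

def encode(x: float) -> int:
    """Float -> byte: evaluate the sigmoid directly instead of searching
    the table -- the cheap "reverse lookup" the comment mentions."""
    u = math.tanh(x / SCALE) / LIM
    return min(255, max(0, round((u + 1) / 2 * 255)))

# Resolution is ~0.016 near zero but ~2.5 at the top of the range:
for x in (0.01, 0.5, 5.0):
    b = encode(x)
    print(f"{x:5.2f} -> byte {b:3d} -> {decode(b):+.4f}")
```

The design choice here is that encoding needs no table search at all, only one `tanh` evaluation, which is the kind of curve adjustment the comment anticipates for saving time and silicon.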

francoisp 4 years ago

Interesting read. I wonder whether this is just a bandwidth optimization that lets them throw more hardware at the problem, or an actual shift in perspective (cf. no NaN/Inf; values instead clamp to maxval). Could this introduce artifacts? Will math libraries need to code around it, or will it enable some new insight?
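The clamping behavior the comment refers to can be sketched as follows. This is a minimal illustration of saturating conversion in general, not the whitepaper's actual format; `MAX_VAL` is a hypothetical limit for a narrow format.

```python
import math

MAX_VAL = 448.0  # hypothetical largest finite value of a narrow format

def to_saturating(x: float) -> float:
    """Map an IEEE float into the narrow format's range: values that
    would overflow to +/-Inf instead clamp to the largest finite value."""
    if math.isnan(x):
        raise ValueError("format has no NaN; a real design must pick a policy")
    return max(-MAX_VAL, min(MAX_VAL, x))

print(to_saturating(1e6))    # overflows -> clamps to 448.0
print(to_saturating(-1e6))   # -448.0
print(to_saturating(3.5))    # in range -> unchanged: 3.5

# The artifact risk: overflow is now silent. The `x != x` and isinf()
# checks that libraries use to detect blown-up runs never fire, so code
# would have to watch for values pinned at +/- MAX_VAL instead.
```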
