LLM.int8(): 8-Bit Matrix Multiplication for Transformers at Scale

arxiv.org

7 points by ofirpress 4 years ago · 1 comment

ofirpress (OP) 4 years ago

Cool new efficient inference method that halves memory use without degrading performance for large language models!

More from the author about this at: https://twitter.com/Tim_Dettmers/status/1559892888326049792
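For readers curious how the method works: LLM.int8() quantizes activations and weights to 8-bit integers using vector-wise absmax scaling, and keeps the few "outlier" feature dimensions (the paper uses a magnitude threshold around 6.0) in higher precision. Below is a minimal NumPy sketch of that idea, assuming small dense matrices; it is an illustration of the technique, not the authors' optimized CUDA implementation, and the function name and threshold default are my own.

```python
import numpy as np

def int8_matmul(X, W, threshold=6.0):
    """Mixed-precision int8 matmul sketch in the spirit of LLM.int8().

    Columns of X (feature dimensions) containing any value with
    |x| >= threshold are treated as outliers and multiplied in
    floating point; the rest go through vector-wise absmax int8
    quantization with int32 accumulation.
    """
    outlier_cols = np.any(np.abs(X) >= threshold, axis=0)
    out = np.zeros((X.shape[0], W.shape[1]))

    # Regular part: per-row scales for X, per-column scales for W.
    Xs, Ws = X[:, ~outlier_cols], W[~outlier_cols, :]
    if Xs.size:
        cx = np.abs(Xs).max(axis=1, keepdims=True) / 127.0
        cw = np.abs(Ws).max(axis=0, keepdims=True) / 127.0
        cx = np.where(cx == 0, 1.0, cx)
        cw = np.where(cw == 0, 1.0, cw)
        Xq = np.round(Xs / cx).astype(np.int8)
        Wq = np.round(Ws / cw).astype(np.int8)
        # int8 multiply, accumulate in int32, then dequantize.
        acc = Xq.astype(np.int32) @ Wq.astype(np.int32)
        out = acc * cx * cw

    # Outlier part: ordinary floating-point matmul.
    if outlier_cols.any():
        out = out + X[:, outlier_cols] @ W[outlier_cols, :]
    return out
```

On random Gaussian matrices this stays within a few percent of the exact float result, which is the intuition behind the "no performance degradation" claim: quantization error is small for the bulk of the features, and the dimensions where it would be large are exactly the ones kept in full precision.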
