LLM.int8(): 8-Bit Matrix Multiplication for Transformers at Scale
Paper: https://arxiv.org/abs/2208.07339
Cool new efficient inference method that cuts memory use in half and does not degrade performance for large language models!
More from the author about this at: https://twitter.com/Tim_Dettmers/status/1559892888326049792
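For context, LLM.int8() is implemented in the bitsandbytes library and integrated into Hugging Face transformers. A minimal sketch of loading a model with 8-bit weights (assuming bitsandbytes and accelerate are installed; the model name is just an illustrative choice):

```python
# Sketch: 8-bit inference via the bitsandbytes integration in transformers.
# Assumes: pip install transformers accelerate bitsandbytes, and a CUDA GPU.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-1.3b"  # illustrative example; any causal LM works

tokenizer = AutoTokenizer.from_pretrained(model_name)
# load_in_8bit=True quantizes the linear-layer weights to int8 on load,
# roughly halving GPU memory; device_map="auto" places layers via accelerate.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    load_in_8bit=True,
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The generation call is unchanged from the fp16 path; only the loading step differs, which is what makes the method easy to adopt.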