7x speed improvement for LLaMA in less than 10 lines of code (github.com)
2 points by hack_ml 2 years ago · 1 comment

brucethemoose2 2 years ago
Is that 5s per token?