TinyChat: Large Language Model on the Edge

2 points by enduku 2 years ago · 1 comment


enduku (OP) 2 years ago

TinyChat is an efficient, lightweight, Python-native serving framework for LLMs quantized to 4 bits with AWQ. It delivers a 2.3x generation speedup on an RTX 4090.
