TinyChat: Large Language Model on the Edge
hanlab.mit.eduTinyChat is an efficient, lightweight, Python-native serving framework for 4-bit LLMs by AWQ. It delivers 2.3x generation speed up on RTX4090.
Code: https://github.com/mit-han-lab/llm-awq/tree/main/tinychat