LLM Microserving: a new RISC-style approach to designing LLM serving APIs
blog.mlc.ai
Scale LLM serving with programmable cross-engine serving patterns, all in a few lines of Python