LoRAX: Open-Source Serving for 100s of Fine-Tuned LLMs in Production

predibase.com

8 points by magdyks 2 years ago · 3 comments

magdyks (OP) 2 years ago

A great framework for serving many fine-tuned LLMs in production by quickly swapping adapters on top of the same base model (e.g. Llama-2-70b).
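
To make the adapter-swapping idea concrete, here is a minimal sketch of how a single shared base weight can serve several LoRA-style adapters, with the adapter chosen per request. It is an illustration of the general low-rank adapter technique only, not LoRAX's implementation or API: the class `MultiAdapterLayer`, its methods, and the adapter name `"customer-a"` are all hypothetical, and plain Python lists stand in for GPU tensors.

```python
# Hypothetical illustration of per-request adapter selection over a shared
# base weight. This is NOT LoRAX's code or API.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class MultiAdapterLayer:
    """One base weight matrix plus any number of low-rank adapters."""

    def __init__(self, base_weight):
        self.base_weight = base_weight  # shared by every request
        self.adapters = {}              # adapter_id -> (A, B)

    def add_adapter(self, adapter_id, a, b):
        # A has shape (r x d_in), B has shape (d_out x r), with small rank r.
        self.adapters[adapter_id] = (a, b)

    def forward(self, x, adapter_id=None):
        # The base projection is computed the same way for all requests.
        y = matvec(self.base_weight, x)
        if adapter_id is None:
            return y
        # Adapter contribution: B @ (A @ x), added on top of the base output.
        a, b = self.adapters[adapter_id]
        delta = matvec(b, matvec(a, x))
        return [yi + di for yi, di in zip(y, delta)]


layer = MultiAdapterLayer(base_weight=[[1, 0], [0, 1]])
layer.add_adapter("customer-a", a=[[1, 1]], b=[[2], [0]])

print(layer.forward([3, 4]))                # base model only
print(layer.forward([3, 4], "customer-a"))  # base model + adapter
```

Because the adapter matrices are small compared with the base weight, keeping many of them next to one copy of the base model costs far less memory than keeping one full fine-tuned model per use case.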

abhaym 2 years ago

Whoa this looks pretty cool. One question though: is there increased latency when you have multiple adapters on a single base model?

  • tgaddair 2 years ago

    Hey, LoRAX dev here. This is one thing we spent a lot of effort optimizing. The TL;DR is that in most cases latency stays within 80% of the baseline latency with 0 adapters, even with as many as 128 adapters at once under heavy request load. Check out the Results section of the blog post for more details, and let me know if you have any questions!
