Always get the best LLM performance for your $?

3 points by romain_batlle 6 months ago · 0 comments

Hey, I built an inference router that makes LLM providers compete in real time on speed, latency, and price to serve each call. It works for both open and closed models; for closed models the price is fixed, so providers only "compete" on speed and latency.
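
To make the "compete" part concrete, here is a minimal illustrative sketch of call-time provider selection: score each provider on its live price, latency, and throughput quotes and route to the best one. The provider names, weights, and numbers are made up; this is not the actual MakeHub scoring logic.

```python
from dataclasses import dataclass

@dataclass
class ProviderQuote:
    name: str
    price_per_mtok: float   # USD per million tokens
    latency_ms: float       # time to first token
    tokens_per_sec: float   # generation throughput

def pick_provider(quotes: list[ProviderQuote],
                  w_price: float = 0.4,
                  w_latency: float = 0.3,
                  w_speed: float = 0.3) -> ProviderQuote:
    """Pick the provider with the best weighted score at this moment."""
    # Normalize each metric across the current quotes so units don't dominate.
    max_price = max(q.price_per_mtok for q in quotes)
    max_latency = max(q.latency_ms for q in quotes)
    max_speed = max(q.tokens_per_sec for q in quotes)

    def score(q: ProviderQuote) -> float:
        # Lower price and latency are better; higher throughput is better.
        return (w_price * (1 - q.price_per_mtok / max_price)
                + w_latency * (1 - q.latency_ms / max_latency)
                + w_speed * (q.tokens_per_sec / max_speed))

    return max(quotes, key=score)

quotes = [
    ProviderQuote("provider_a", price_per_mtok=0.60, latency_ms=320, tokens_per_sec=95),
    ProviderQuote("provider_b", price_per_mtok=0.55, latency_ms=780, tokens_per_sec=42),
]
print(pick_provider(quotes).name)  # whichever provider scores best right now
```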

Spent quite some time normalizing APIs, handling tool calls, and managing prompt caching, but the end result is very cool: you always get the absolute best value for your $ at the exact moment of inference.

Currently runs perfectly on forks of Roo and Cline, and on any OpenAI-compatible BYOK app (so pretty much everywhere).
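
If you already use the OpenAI SDK, hooking up an OpenAI-compatible router is basically a base-URL swap. This is just a sketch: the base URL, API key placeholder, and model id are assumptions, not documented values; check makehub.ai for the real ones.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.makehub.ai/v1",  # hypothetical endpoint
    api_key="YOUR_MAKEHUB_API_KEY",        # placeholder key
)

resp = client.chat.completions.create(
    model="llama-3.1-70b-instruct",        # example model id, assumed
    messages=[{"role": "user", "content": "Say hi in one sentence."}],
)
print(resp.choices[0].message.content)
```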

Feedback is very much welcome! Please tear it apart: [https://makehub.ai](https://makehub.ai/)
