High performance client for Baseten.co

github.com

7 points by mich5632 6 months ago · 1 comment

mich5632OP 6 months ago

We wrote a Rust PyO3 client for OpenAI-embeddings-compatible servers (openai.com, Infinity, TEI, vLLM, SGLang). Most server-side ML infrastructure auto-scales with the workload, so for embedding workloads the server is no longer the bottleneck; the bottleneck has shifted to the client. In Python, the client is blocked by the global interpreter lock (GIL). With the performance package, we release the GIL during requests, so you have resources available to query your VectorDB again.
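To illustrate the mechanism (not the actual client code): in PyO3, a blocking operation can be wrapped in `Python::allow_threads`, which drops the GIL for the duration of the closure so other Python threads (e.g. a VectorDB query) keep running while the HTTP request is in flight. The function name, module name, and request shape below are illustrative assumptions, not the package's real API.

```rust
// Minimal sketch: release the GIL while a blocking embeddings request runs.
// `embed`, `performance_client`, and the request body are hypothetical.
use pyo3::prelude::*;

#[pyfunction]
fn embed(py: Python<'_>, base_url: String, api_key: String, text: String) -> PyResult<String> {
    // allow_threads releases the GIL for the duration of the closure,
    // so other Python threads are not blocked by this network call.
    py.allow_threads(move || {
        let client = reqwest::blocking::Client::new();
        let body = serde_json::json!({
            "model": "text-embedding-3-small",
            "input": text,
        });
        client
            .post(format!("{base_url}/v1/embeddings"))
            .bearer_auth(api_key)
            .json(&body)
            .send()
            .and_then(|resp| resp.text())
            .map_err(|e| pyo3::exceptions::PyRuntimeError::new_err(e.to_string()))
    })
}

#[pymodule]
fn performance_client(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(embed, m)?)?;
    Ok(())
}
```

From Python, multiple threads can then call such a function concurrently and overlap embedding requests with other work, since the GIL is held only briefly on entry and exit rather than for the whole request.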
