Infinity embedding engine added to KubeAI
Just merged and released the [Infinity support PR](https://github.com/substratusai/kubeai/pull/197) in KubeAI, adding Infinity as an embedding engine. This means you can now serve embeddings on your own Kubernetes clusters through an OpenAI-compatible API.
Infinity is a high-performance, low-latency embedding engine: https://github.com/michaelfeil/infinity

KubeAI is a Kubernetes Operator for running OSS ML serving engines: https://github.com/substratusai/kubeai
How to use this?
Deploy on any K8s cluster by running:

```
helm repo add kubeai https://www.kubeai.org
helm install kubeai kubeai/kubeai --wait --timeout 10m

cat > model-values.yaml << EOF
catalog:
  bge-embed-text-cpu:
    enabled: true
    features: ["TextEmbedding"]
    owner: baai
    url: "hf://BAAI/bge-small-en-v1.5"
    engine: Infinity
    resourceProfile: cpu:1
    minReplicas: 1
EOF

helm install kubeai-models kubeai/models -f ./model-values.yaml
```
Forward the kubeai service to localhost:

```
kubectl port-forward svc/kubeai 8000:80
```
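With the port-forward running, a quick way to check that the model was registered is to list models through the OpenAI-compatible API. A minimal sketch, assuming the service is reachable on localhost:8000:

```
from openai import OpenAI

# Assumes the port-forward above: KubeAI is reachable on localhost:8000.
client = OpenAI(api_key="ignored", base_url="http://localhost:8000/openai/v1")

# The catalog entry should show up as "bge-embed-text-cpu".
for model in client.models.list():
    print(model.id)
```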
Afterwards, you can use the OpenAI Python client to get embeddings:

```
from openai import OpenAI

# Assumes port-forward of kubeai service to localhost:8000.
client = OpenAI(api_key="ignored",
                base_url="http://localhost:8000/openai/v1")

response = client.embeddings.create(
    input="Your text goes here.",
    model="bge-embed-text-cpu",
)
print(response)
```
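Since the response follows the OpenAI embeddings schema, the vectors live in `response.data[i].embedding`. As a usage sketch (the example texts below are made up), here is how you could embed two strings in one batched request and compare them with cosine similarity:

```
from openai import OpenAI

client = OpenAI(api_key="ignored", base_url="http://localhost:8000/openai/v1")

# Embed two example sentences in a single batched request.
response = client.embeddings.create(
    input=["KubeAI runs OSS ML serving engines on Kubernetes.",
           "Infinity is a fast embedding server."],
    model="bge-embed-text-cpu",
)
a, b = (item.embedding for item in response.data)

# Cosine similarity, computed without extra dependencies.
dot = sum(x * y for x, y in zip(a, b))
norm_a = sum(x * x for x in a) ** 0.5
norm_b = sum(x * x for x in b) ** 0.5
print("cosine similarity:", dot / (norm_a * norm_b))
```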
What’s next?

- Support for autoscaling based on metrics reported by Infinity.