Show HN: Saving Money Deploying Open Source AI at Scale with Kubernetes

opensauced.pizza

3 points by jpmcb 2 years ago · 2 comments · 1 min read

Hi HN: I wanted to share this piece I wrote on how I saved our small startup tens of thousands of dollars every month by lifting and shifting our AI data pipelines from OpenAI's API to a vLLM deployment on top of Kubernetes, running on a few nodes with T4 GPUs.
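For the curious, a lift-and-shift like this can be sketched as a standard Kubernetes Deployment that requests a GPU per replica. This is a minimal illustrative manifest, not the author's actual config: the image tag, model name, and node-selector label are all assumptions (the node selector shown is GKE's T4 label; other clouds use different keys).

```
# Illustrative sketch only -- image, model, and labels are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm
spec:
  replicas: 2
  selector:
    matchLabels:
      app: vllm
  template:
    metadata:
      labels:
        app: vllm
    spec:
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest
          args: ["--model", "mistralai/Mistral-7B-Instruct-v0.2"]
          ports:
            - containerPort: 8000   # vLLM's OpenAI-compatible HTTP server
          resources:
            limits:
              nvidia.com/gpu: 1     # one T4 per replica
      nodeSelector:
        cloud.google.com/gke-accelerator: nvidia-tesla-t4
```

A Service in front of these pods then gives the batching micro-services a single stable endpoint.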

I haven't seen a lot on the "AI-DevOps" or infrastructure side of actually running an at-scale AI service. Many of the AI inference engines that offer an OpenAI-compatible API (like vLLM, llama.cpp, etc.) make it very approachable and cost effective. Today, this vLLM AI service handles all of our batching micro-services, which scrape for content to generate text on more than 40,000 repos on GitHub.
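The OpenAI-compatible API is what makes the migration cheap: an existing integration only needs its base URL pointed at the self-hosted server. A minimal sketch of what that request looks like, where the endpoint URL and model name are illustrative assumptions, not values from the post:

```python
# Sketch: building a request against an OpenAI-compatible endpoint.
# vLLM (and llama.cpp's server) expose the same /v1/chat/completions
# route as api.openai.com, so only the base URL changes.


def chat_request(base_url: str, model: str, user_message: str) -> tuple[str, dict]:
    """Return the URL and JSON body for an OpenAI-compatible chat call."""
    url = f"{base_url.rstrip('/')}/v1/chat/completions"
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return url, body


# Hypothetical in-cluster endpoint and model name:
url, body = chat_request("http://vllm.internal:8000", "mistral-7b", "Summarize this repo")
```

The same payload sent to `https://api.openai.com` works unchanged against the self-hosted server, which is why no application code beyond configuration has to change.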

I'm happy to answer any / all questions you might have!

brianllamar 2 years ago

This was a good read. Seeing the story of AI infrastructure is a breath of fresh air. Too much witchcraft and hand waving in the AI space at the moment.

  • jpmcb (OP) 2 years ago

    > Too much witchcraft and hand waving in the AI space at the moment.

    Yeah +1: I found most frameworks (like langchain, llamaindex) to be a bit too magical for my taste, whereas the well-understood and well-structured OpenAI API makes building on top of inference much easier. Things are moving really, really fast, but I'm excited for where they're headed.
