LLM Stack for 2024 – Initial Survey (medium.com)

3 points by yujian 2 years ago · 5 comments

stevekaram 2 years ago

I think one of the biggest struggles small startups and practitioners face is the lack of a good option between "I wonder if this works" and "ready for prime time." Running locally on consumer hardware is an option, but it's cost-prohibitive for a team. Cloud providers are full of complications and hidden costs. Tools like Friendli and Bento are good but ambiguous about cost, and they get difficult to price end-to-end once you need the full stack of options. Hugging Face inference endpoints, along with cloud DBs like Zilliz, still seem like the best option around.

That said, it's no wonder people just pay extra for the simplicity of a slightly smarter endpoint like OpenAI. Sure, over time the costs are insane and you lack any flexibility to create a truly targeted solution, but it feels like an all-in-one easy fix.

yujianOP 2 years ago

Hi everyone, I put together this survey of tools for the LLM stack in 2024. The URL is the friend link for the Medium article. I'd love feedback from you guys on any tools I've missed.

If you're a Medium member and want to support my writing, feel free to use the regular link - https://medium.com/plain-simple-software/the-llm-app-stack-2...

cybereporter 2 years ago

This is great! Out of curiosity, what's the difference between choosing a dedicated vector database vs. a traditional database with vector indices (e.g. pgvector with Postgres)?

  • yujianOP 2 years ago

    oh yeah, this is a great question - I get it a lot when I give talks about RAG stuff

    the way I see it, if you have a small amount of data (<10,000 vectors), it's all the same and you should stick with the technology you're most familiar with

    once you get more than that, you may want to consider a vector database

    the reason vector databases exist is that vector search is a highly compute-intensive task. in a regular database setting you almost never have to run compute; the database is mostly doing exact matches

    vector search, however, is predicated on finding similar vectors, and since exact matches are unlikely, that similarity computation is the thing you end up having to optimize
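
    to make that concrete, here's a rough numpy sketch of what exact, brute-force search looks like (sizes and names invented for illustration):

        import numpy as np

        rng = np.random.default_rng(0)
        # 100k stored embeddings, normalized so a dot product is cosine similarity
        vectors = rng.normal(size=(100_000, 384)).astype(np.float32)
        vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

        def top_k(query, k=5):
            # exact search: one dot product per stored vector, on every query
            query = query / np.linalg.norm(query)
            scores = vectors @ query               # O(n * d) compute each time
            return np.argsort(scores)[-k:][::-1]   # indices of the k most similar

        print(top_k(rng.normal(size=384).astype(np.float32)))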

    if you're building on a SQL/NoSQL database, that means managing the indexing, the distance metric computation, and the load balancing yourself

    pgvector manages much of that for you, but because of how SQL is structured it can't do it very efficiently; postgres wasn't built for this workload, so an extra system has to be bolted on top
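
    for a sense of what that route looks like, here's a minimal pgvector sketch (assuming postgres with the extension available; the connection string and table are made up):

        import psycopg2

        conn = psycopg2.connect("dbname=app user=app")  # hypothetical DSN
        cur = conn.cursor()

        cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
        cur.execute("CREATE TABLE IF NOT EXISTS docs "
                    "(id serial PRIMARY KEY, embedding vector(384))")
        # without an index this ORDER BY is a full scan; ivfflat speeds it up
        # by approximating, and tuning `lists` is on you
        cur.execute("CREATE INDEX IF NOT EXISTS docs_embedding_idx ON docs "
                    "USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100)")
        conn.commit()

        query_vec = "[" + ",".join(["0.1"] * 384) + "]"  # pgvector's text format
        cur.execute("SELECT id FROM docs ORDER BY embedding <=> %s::vector LIMIT 5",
                    (query_vec,))
        print(cur.fetchall())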

    as many experienced software engineers will tell you, adding complexity doesn't necessarily make something better, and it adds more points of failure

    purpose-built vector databases like the ones in the article (e.g. milvus, chroma, weaviate) are designed around this compute challenge, which pays off more and more as the amount of data you have expands
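
    contrast the pgvector sketch above with something like chroma, where the indexing and distance metrics live inside the database; a minimal sketch (ids and vectors invented):

        import chromadb

        client = chromadb.Client()              # in-memory client, fine for a demo
        collection = client.create_collection("docs")

        # the database owns the index; you just hand it ids and vectors
        collection.add(
            ids=["a", "b", "c"],
            embeddings=[[0.1, 0.2, 0.3], [0.2, 0.1, 0.3], [0.9, 0.8, 0.7]],
        )

        result = collection.query(query_embeddings=[[0.1, 0.2, 0.3]], n_results=2)
        print(result["ids"])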

    • stevekaram 2 years ago

      I'd also add that a huge use for LLMs and vectors in the enterprise is building queries against production data. Keeping the vector DB external to your RDBMS or other production data store is a unique chance to amplify performance without extra latching and other performance hits against the same database you count on for day-to-day business. Think of it as an external, super-smart index.
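
      As a rough sketch of that pattern (all names hypothetical): the similarity search runs entirely in the external vector store, and only the winning ids ever touch the production database.

          import chromadb
          import psycopg2

          vec = chromadb.Client().get_or_create_collection("orders_embeddings")
          pg = psycopg2.connect("dbname=prod user=app")  # the day-to-day RDBMS

          def semantic_lookup(query_embedding, k=5):
              # step 1: nearest-neighbor search, entirely outside the RDBMS
              hits = vec.query(query_embeddings=[query_embedding], n_results=k)
              ids = [int(i) for i in hits["ids"][0]]
              # step 2: a cheap primary-key fetch against production data
              cur = pg.cursor()
              cur.execute("SELECT id, summary FROM orders WHERE id = ANY(%s)", (ids,))
              return cur.fetchall()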
