The Case Against Vector Databases

Learn why vector databases are not the silver bullet in AI and whether you should use one in your project.

Key takeaways:

There's an overabundance of vector databases, with over 20 available on the market. As an example, LangChain offers 60 different vector store options to pick from!
Vector databases come with hidden costs.
Vector search is an optimization, and premature optimization is "the root of all evil".
Keyword search cannot be replaced with vector search; they are different capabilities.
Vector search is inherently limited compared to generative LLMs.
AI agents don't rely on vector search as much. For example, AutoGPT ditched the use of vector databases.

What's all the fuss about?

Vector search is not a new concept. Facebook released Faiss, a library for similarity search, back in 2017.

After the meteoric rise of ChatGPT, investors' attention turned to vector databases, supposedly the "picks and shovels" of AI.

Lots of sponsored content muddies the picture for people new to AI. Companies hire evangelists or developer advocates to promote the use of vector databases, but these advocates have no incentive to explain when to use one, or whether you actually need one at all.

Is introducing an additional separate database worth the trouble?
How does keyword search compare to vector search?
How efficient is brute-force vector search using NumPy's np.dot?
Which implementation will provide the most value to the users?

These are just some of the many questions that engineers need to answer before deciding on an implementation. View the slides to better understand what the main caveats are.
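To make the brute-force question above concrete, here is a minimal sketch of exact nearest-neighbor search with np.dot. It is not taken from the slides: the function name brute_force_search, the corpus size (20,000 vectors), and the embedding dimension (384) are assumptions chosen for illustration, and the random vectors stand in for real embeddings. Timings depend heavily on hardware, so none are quoted here.

```python
import time

import numpy as np


def normalize(x):
    """L2-normalize along the last axis so that a dot product equals cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def brute_force_search(query, corpus, k=5):
    """Hypothetical helper: exact top-k search by scoring every corpus row.

    Assumes `corpus` has shape (n, d), `query` has shape (d,),
    and both are already L2-normalized.
    """
    scores = np.dot(corpus, query)               # one similarity score per row
    k = min(k, scores.shape[0])
    top = np.argpartition(-scores, k - 1)[:k]    # k best rows, in no particular order
    top = top[np.argsort(-scores[top])]          # order those k rows by descending score
    return top, scores[top]


# Random data standing in for real embeddings (assumed sizes).
rng = np.random.default_rng(0)
corpus = normalize(rng.standard_normal((20_000, 384)).astype(np.float32))
query = corpus[42]  # reuse a stored vector so the best match is known in advance

start = time.perf_counter()
idx, scores = brute_force_search(query, corpus, k=5)
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"top-5 indices: {idx.tolist()}")
print(f"search took {elapsed_ms:.2f} ms on this machine")
```

Running a sketch like this against your own embedding count is a quick way to check whether a dedicated vector database is solving a problem you actually have.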

About the author

Dariusz Semba

Dariusz is a seasoned AI engineer and entrepreneur. He specializes in LLMs, information retrieval, and search. He is currently building brewedby.ai, which uses agentic AI to curate personal daily digests.