RAG for PDFs: Why You Don't Need a Vector Database - eno PDF Reader Blog

what is RAG?

RAG stands for Retrieval-Augmented Generation. The term was coined early in this AI cycle for a technique that reduces hallucinations in LLM outputs.

LLMs take a sequence of words as input and generate an output by predicting what comes next. For instance, given the input “The best pet is a ”, the LLM might generate the output “The best pet is a cat”.

This, of course, is a hallucination. To eliminate it, we could first retrieve some trusted information and use that to augment our input, resulting in a better generated response. The augmented input might become “Cats are lazy. Dogs love you. The best pet is a ”, causing the LLM to predict “dog” as the final word.
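To make the augmentation step concrete, here is a minimal sketch of it in Python. The function name `augment_prompt` is made up for illustration, and the "retrieval" is just a hard-coded list of facts. A real system would fetch them from somewhere.

```python
def augment_prompt(retrieved_facts, user_input):
    """Prepend retrieved facts to the user's input, separated by spaces."""
    context = " ".join(fact.strip() for fact in retrieved_facts)
    # With nothing retrieved, the input is passed through unchanged.
    return f"{context} {user_input}" if context else user_input


# Hard-coded stand-in for whatever the retrieval step returned.
facts = ["Cats are lazy.", "Dogs love you."]
prompt = augment_prompt(facts, "The best pet is a ")
print(repr(prompt))  # 'Cats are lazy. Dogs love you. The best pet is a '
```

The string that comes out is what gets sent to the LLM in place of the original input.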

The confusion in the term RAG centers around how the retrieval is done. In a real-world scenario, you have no idea what the user will ask, and you have vast amounts of information that might be relevant to their query. To complicate things further, this information is often trapped in poorly structured PDFs that are difficult for a program to crawl through.

the vector database approach

One approach is to break all your PDFs into chunks and store them as embeddings in a vector database. A simple way to picture this is a 2D graph where each chunk is represented by a point. The statements above about cats and dogs might each be in their own chunk, and their points will sit very close together. A chunk with a statement about elephants might be farther away. When the question is converted into a point, it lands close to the cat and dog chunks, so they are retrieved and used to augment the input. In reality, embeddings have far more than two dimensions, so they can model very sophisticated affinities between chunks.
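The 2D picture can be sketched in a few lines of Python. The coordinates below are hand-picked for illustration and did not come from any embedding model, and `nearest_chunks` is a made-up helper that stands in for the lookup a vector database would do.

```python
import math

# Hypothetical 2D "embeddings", chosen by hand so that the pet
# statements sit close together and the elephant one sits far away.
CHUNKS = {
    "Cats are lazy.": (1.0, 1.0),
    "Dogs love you.": (1.5, 1.0),
    "Elephants never forget.": (6.0, 5.5),
}


def nearest_chunks(query_point, chunks, k=2):
    """Return the k chunk texts whose points are closest to query_point."""
    ranked = sorted(chunks, key=lambda text: math.dist(query_point, chunks[text]))
    return ranked[:k]


# Pretend this is where the question "The best pet is a " landed.
query = (1.1, 1.0)
print(nearest_chunks(query, CHUNKS))  # ['Cats are lazy.', 'Dogs love you.']
```

A real setup would swap the hand-picked points for vectors produced by an embedding model, and would typically use an approximate nearest-neighbor index instead of sorting every chunk.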

Embeddings quickly became a popular retrieval method for LLMs, and so the term RAG became all but inseparable from vector databases. It even became a noun: “you need to build a RAG for your PDFs if you want good results”.

RAG without a vector database

There are other ways to do retrieval, though, and one in particular that is much better than a vector database: make the LLM do it. Instead of chunking all my documents, running them through an embedding model, and loading them into a vector database, I can ask the LLM for keywords to search for in my documents directly. In the case of the user’s input about pets, the LLM will have no problem supplying keywords like “dog, cat, iguana, (etc.)”. I can then simply run a text search for every keyword it gave me and use the hits from those searches to augment my input.
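Here is a rough sketch of that loop in Python, under a few assumptions. `suggest_keywords` is a placeholder that returns a canned list where a real LLM call would go, `pages` is assumed to be plain text already extracted from the PDF, and matching a trailing "s" is a crude way to catch plurals. None of this is meant as the definitive implementation.

```python
import re


def suggest_keywords(question):
    """Placeholder for an LLM call; returns a canned answer for this demo."""
    return ["dog", "cat", "iguana"]


def keyword_search(pages, keywords):
    """Return (page_number, sentence) pairs for sentences containing a keyword."""
    if not keywords:
        return []
    alternatives = "|".join(re.escape(keyword) for keyword in keywords)
    # Whole-word match, case-insensitive, with an optional plural "s".
    pattern = re.compile(rf"\b(?:{alternatives})s?\b", re.IGNORECASE)
    hits = []
    for page_number, text in enumerate(pages, start=1):
        # Naive sentence split on ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            if pattern.search(sentence):
                hits.append((page_number, sentence.strip()))
    return hits


# Assumed input: one string of extracted text per PDF page.
pages = [
    "Cats are lazy. Elephants never forget.",
    "Dogs love you. The weather is nice.",
]
question = "The best pet is a "
hits = keyword_search(pages, suggest_keywords(question))
prompt = " ".join(sentence for _, sentence in hits) + " " + question
print(hits)  # [(1, 'Cats are lazy.'), (2, 'Dogs love you.')]
```

Keeping the page number with each hit makes it easy to show the user where an answer came from.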

If you squint, this approach is kind of like an inversion of the embeddings in a vector database. The LLM’s weights contain an understanding of the English language, which encodes the affinities between words like cat, dog, iguana, etc. When we ask the LLM for keywords to search, its response does the job of a vector database lookup. We just skipped the need to have an entirely separate embedding model and convert all our data into vectors.

The irony is that while we were building complicated RAG pipelines, the LLMs had the power to retrieve all along. All we had to do was ask.
