The State of Agentic Graph RAG


Standard Retrieval-Augmented Generation (RAG) has become the default way to build LLM apps over private data. The recipe is familiar: chunk documents, embed them, pull the top-k nearest chunks at query time, and hand them to the model. For straightforward lookups, this works.

But the moment you ask for reasoning over complex data—legal discovery, root-cause analysis, strategic intelligence—you often hit a wall. Not because the model can’t write a good answer, but because retrieval keeps feeding it the wrong ingredients.

That wall is an epistemological problem: vector RAG confuses semantic closeness with evidential relevance.

Embeddings are a compression. They’re great at capturing “meaning-ish” similarity, but they don’t preserve the kind of structure you actually need for many real questions: dependencies, causality, “what led to what,” “what changed,” “what connects these two things.”

Vector RAG quietly assumes that semantic similarity is a good proxy for relevance. For complex tasks, it often isn’t. Three failure modes show up again and again:

Global questions don’t land. Ask, “What are the major themes across 10,000 documents?” and a similarity retriever will return a handful of chunks that sound theme-related. It has no native notion of aggregation or coverage. You get fragments where you needed a map.

Multi-hop questions miss the bridge. Ask, “What’s the connection between the director of Movie X and the founder of Company Y?” If the connecting facts live in different documents without shared surface cues, similarity search can fail to surface the glue. You can patch this with iterative “retrieve → read → refine query → retrieve,” but it’s slow, expensive, and tends to wander.

Logic and direction get flattened. The problem isn’t that embeddings are “bad.” It’s that top-k similarity retrieval doesn’t care much about asymmetric facts and constraints. “A acquired B” and “B acquired A” are not the same statement; similarity search isn’t designed to treat them as opposites.
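The flattening is easiest to see with an order-insensitive toy model. The sketch below uses bag-of-words cosine similarity in place of a real embedding model (the function name and example sentences are invented for illustration); real embeddings are subtler, but the directional distinction is still only weakly encoded:

```python
from collections import Counter
import math

def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity over bag-of-words vectors (word order is ignored)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b)

# Two opposite statements score as near-identical, because nothing
# in the representation preserves who acquired whom.
sim = bow_cosine("Acme acquired BetaCorp", "BetaCorp acquired Acme")
```

A graph, by contrast, stores the acquisition as a directed edge, so the asymmetry survives by construction.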

If your workload is mostly “find me the passage that mentions X,” vector RAG is fine. The trouble starts when your queries are “show me how X relates to Y, through Z, and justify it.”

Graph RAG tackles the missing piece: explicit relationships.

Instead of treating your corpus as a sea of chunks, you build a graph of entities (nodes) connected by relations (edges), with provenance back to the source text. Retrieval can then ask a different question than “what sounds similar?”—it can ask “what’s connected, how, and through what evidence?”

A typical Graph RAG pipeline has three parts:

  • Indexing: Chunk documents, extract entities and relations (often with an LLM), then resolve duplicates (“J. Smith,” “John Smith,” “Mr. Smith” → one node). Many systems also cluster the graph (Leiden/Louvain) into communities so you can retrieve at different zoom levels.

  • Retrieval: Pull relevant subgraphs: local neighborhoods (1–2 hops), global communities (theme-level), or hybrids that use vectors for entry points and graphs for expansion.

  • Generation: Feed the model structured context—triples, paths, community summaries, or “evidence packs” tied to sources—rather than a pile of unrelated chunks.
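The indexing step above can be sketched in a few lines. The alias table and chunk IDs below are invented stand-ins for real entity resolution and real provenance; the point is the shape of the data, with every edge carrying a pointer back to its source chunk:

```python
from dataclasses import dataclass
from collections import defaultdict

@dataclass(frozen=True)
class Triple:
    head: str
    relation: str
    tail: str
    source_chunk: str  # provenance: which chunk produced this edge

# Hand-written alias table standing in for a real resolution step.
ALIASES = {"j. smith": "john smith", "mr. smith": "john smith"}

def canonical(name: str) -> str:
    return ALIASES.get(name.lower(), name.lower())

def build_graph(triples):
    """Adjacency map: entity -> [(relation, neighbour, source_chunk), ...]"""
    adj = defaultdict(list)
    for t in triples:
        adj[canonical(t.head)].append((t.relation, canonical(t.tail), t.source_chunk))
    return adj

graph = build_graph([
    Triple("J. Smith", "founded", "Acme", "doc1#3"),
    Triple("Mr. Smith", "advised", "BetaCorp", "doc2#7"),
])
# Both mentions collapse onto one node, and each edge keeps its evidence pointer.
```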

Done well, Graph RAG doesn’t replace semantic search. It gives you a second lever: structure.

A few papers set the tone for what people now mean when they say “Graph RAG.”

Microsoft’s “From Local to Global” paper is the foundational work on GraphRAG. It made the “global questions” argument unavoidable. The core move is to treat global queries as query-focused summarization: extract an entity graph, cluster it into communities, write summaries for those communities ahead of time, then answer a question by pulling the right summaries (and batching them if needed).

HippoRAG is one of the clearest examples of using the graph as a retrieval engine, not just as a storage format. Instead of looping retrieval prompts until something sticks, it leans on a classic graph algorithm to move signal across the corpus in one shot. HippoRAG posits that the LLM is analogous to the neocortex (static semantic processing) while the knowledge graph acts as the hippocampus (dynamic associative memory and indexing). Take that “biologically inspired” framing with a grain of salt; the underlying mechanism is genuinely clever.

Rather than enforcing a strict ontology, HippoRAG uses Open Information Extraction to create schemaless, free-form triples — capturing nuanced relationships without predefined constraints. The retrieval engine is Personalized PageRank (PPR): given query entities as seed nodes, PPR simulates random walks from those seeds across the graph, distributing probability mass to structurally connected nodes regardless of semantic similarity. This is powerful because PPR naturally propagates across multiple hops in a single computation. There’s no iterative “retrieve-read-retrieve-read” cycle.
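A minimal power-iteration sketch of PPR shows the multi-hop propagation. The restart probability and iteration count here are illustrative, and dangling-node mass simply leaks (a production implementation would redistribute it):

```python
def personalized_pagerank(adj, seeds, alpha=0.15, iters=50):
    """Random walk with restart: at each step, jump back to a seed node
    with probability alpha, otherwise follow an outgoing edge."""
    nodes = list(adj)
    restart = {n: (1 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iters):
        nxt = {n: alpha * restart[n] for n in nodes}
        for n in nodes:
            out = adj[n]
            if not out:
                continue  # dangling node: its mass leaks in this sketch
            share = (1 - alpha) * rank[n] / len(out)
            for m in out:
                nxt[m] += share
        rank = nxt
    return rank

# Chain A -> B -> C: seeding on A pushes probability mass two hops away
# in one computation, with no retrieve-read-retrieve loop.
adj = {"A": ["B"], "B": ["C"], "C": []}
scores = personalized_pagerank(adj, {"A"})
```

Nodes reachable from the seeds get scored even when they share no vocabulary with the query, which is exactly the multi-hop behavior iterative prompting struggles to reproduce cheaply.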

LightRAG (with a widely used open-source implementation) is the pragmatic counterweight: less ceremony, more throughput. The dual-level idea (retrieve both entity-level specifics and higher-level clusters) keeps the “local/global” spirit, but with an eye on cost and incremental updates.

The systems above still feel like pipelines: extract → index → retrieve → generate. The retriever has knobs, but it doesn’t really think.

Agentic Graph RAG is the point where retrieval stops being a single call and becomes a policy: an agent explores, keeps state, decides what to pull next, and knows when it’s time to stop. The knowledge graph is no longer just a data structure; it’s the search space.

The early blueprint is Think-on-Graph. It treats graph retrieval as iterative search: start from query entities, expand neighbors, prune aggressively, loop until you either have enough support or you realize you’re lost. Even if you wouldn’t implement it exactly today, it captured the key shift: retrieval is an action sequence with feedback, not a one-shot nearest-neighbor lookup.
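The explore-prune-loop pattern can be sketched as a policy over candidate paths. The four callables below (`expand`, `score`, `enough`, and the seed selection) are assumptions standing in for LLM judgments and graph lookups, not Think-on-Graph's actual interfaces:

```python
def explore_graph(question, seeds, expand, score, enough,
                  max_steps=5, beam=3):
    """Iterative graph search, Think-on-Graph style (sketch).
    expand(node)        -> [(relation, neighbour), ...] candidate edges
    score(question, p)  -> how promising a candidate path looks
    enough(question, ps)-> True when the paths support an answer
    """
    frontier = [[s] for s in seeds]
    for _ in range(max_steps):
        if enough(question, frontier):
            return frontier  # stop: evidence suffices
        candidates = [path + [nb] for path in frontier
                      for _, nb in expand(path[-1])]
        # Prune aggressively: keep only the most promising paths.
        frontier = sorted(candidates, key=lambda p: score(question, p),
                          reverse=True)[:beam]
        if not frontier:
            break  # lost: nothing left worth expanding
    return frontier
```

The two exits matter as much as the loop body: the agent stops when it has enough support, or admits it is lost when the frontier empties, rather than retrieving forever.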

Once you allow multi-step exploration, you inherit a very human problem: you can follow a plausible path that turns out to be wrong.

That’s what Plan-on-Graph tries to address. Instead of committing early, it decomposes the task into sub-objectives, keeps a working memory of what’s been explored, and adds explicit reflection that can trigger self-correction—including backtracking and trying alternative routes. Planning here isn’t “let’s make a plan” theater; it’s how you prevent the agent from mistaking momentum for progress.


Hybrid approaches are sometimes described as “vector + graph,” but that label undersells what the interesting ones are doing. These systems are still agents. They don’t just combine two retrieval methods; they orchestrate them—deciding when to enter via text, when to expand via structure, and how to prune what they’ve gathered.

A good mental model is: vectors help the agent find footholds; graphs help the agent move with intent. The agent loops: propose a direction → retrieve evidence → judge whether it helps → adjust the next move.

That shows up clearly in a few representative designs:

  • Think-on-Graph 2.0 is a tight coupling story. The agent expands candidate graph paths, then uses those candidate paths to constrain and condition text retrieval (often by restricting the search space to chunks linked to the entities involved). The retrieved text, in turn, changes which graph branches survive the next pruning step. The key point is not “hybrid.” It’s that the agent uses each modality to discipline the other.

  • KG²RAG feels simpler, but it’s still an agentic pattern: start with seed chunks from semantic retrieval, expand outward through entity links to pull in semantically distant but structurally relevant context, then organize what remains into a coherent narrative. That last step—turning retrieved debris into an ordered “evidence story”—is part of the agent’s job, not a cosmetic add-on.

  • GraphSearch makes the agentic nature explicit by splitting retrieval into two channels—semantic and relational—and weaving them through a multi-module loop that includes evidence checking. It’s not just retrieving; it’s constantly asking, “Does what I have actually support the chain I’m building?” If not, the agent expands or reframes the search.
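The foothold-then-expand pattern common to these designs can be sketched in the KG²RAG spirit: semantic seeds, then structural expansion through entity links. The `vector_search` callable and the two link maps are assumptions, placeholders for a real embedding index and entity-linking step:

```python
def hybrid_retrieve(query, vector_search, chunk_entities, entity_chunks,
                    hops=1):
    """Vector entry points, graph expansion (sketch).
    vector_search(query, k) -> seed chunk ids (assumed interface)
    chunk_entities: chunk id -> entities mentioned in it
    entity_chunks:  entity   -> chunks that mention it
    """
    seeds = vector_search(query, k=3)
    selected, frontier = set(seeds), set(seeds)
    for _ in range(hops):
        # Structure pulls in semantically distant but linked context.
        entities = {e for c in frontier for e in chunk_entities.get(c, [])}
        expanded = {c for e in entities for c in entity_chunks.get(e, [])}
        frontier = expanded - selected
        selected |= frontier
    return selected
```

What this sketch leaves out is the agentic part: in the real systems, each hop is followed by a judgment ("does this help the chain I'm building?") that decides whether to expand further, reframe, or stop.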

What’s common across these is not the presence of a graph and embeddings. It’s that the system chooses—and keeps choosing—based on what it sees.

There’s a good criticism of Graph RAG that practitioners don’t always say out loud: building the graph can be the most expensive part of the system, and the resulting structure is not always the structure your question needs.

That’s the premise of LogicRAG, from the paper “You Don’t Need Pre-built Graphs for RAG.” The move is provocative but practical: don’t build a corpus graph offline. Build a query-specific reasoning structure at inference time.

They model the question as a dependency graph over subproblems:

  • decompose the query into subquestions and infer dependencies (a DAG),

  • solve them in an order that respects those dependencies,

  • keep the process from ballooning via two kinds of pruning: compress context as you go and merge redundant subproblems.
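The dependency-ordered solving step can be sketched with the standard library's topological sorter. Here `solve_one` is an assumed stand-in for an LLM call that answers one subquestion given its prerequisites' answers, and the pruning steps (context compression, merging redundant subproblems) are omitted:

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

def solve_with_dag(deps, solve_one):
    """LogicRAG-style sketch: answer subquestions in dependency order.
    deps: subquestion -> set of prerequisite subquestions (the DAG)
    solve_one(q, context): answers q given its prerequisites' answers
    """
    answers = {}
    # static_order() yields each node only after all its predecessors.
    for q in TopologicalSorter(deps).static_order():
        context = {d: answers[d] for d in deps.get(q, ())}
        answers[q] = solve_one(q, context)
    return answers
```

A cycle in the inferred dependencies raises `graphlib.CycleError` here, which is a feature: a query decomposition that depends on itself is a planning bug worth surfacing early.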

This isn’t “anti-graph.” It’s a reminder that “graph” can mean “the structure I need right now,” not necessarily “a persistent database I paid to build last week.”

A plausible end state is mixed: persistent graphs for stable, high-value domains, and query-time reasoning graphs for everything else.

Graph RAG fixes real problems—and creates new ones. The easiest way to get burned is to treat the graph as a neutral mirror of your corpus. It isn’t. It’s a model of your corpus, built by extraction choices, resolution heuristics, and summarization steps. Those choices shape what becomes “retrievable.”

The first place systems quietly break is entity resolution. Over-merge and you contaminate the graph: one common name collapses multiple people into a single node and your retrieval becomes confidently wrong. Under-merge and the graph fragments: the same entity appears in five guises and the agent never sees the full picture. Teams often spend weeks optimizing retrieval only to discover the real bottleneck was identity.

The second failure mode is structural debt from extraction. A hallucinated sentence is a nuisance; a hallucinated edge becomes a shortcut to the wrong neighborhood. If your graph doesn’t carry provenance (which chunk produced this edge, under what rule/prompt, with what confidence), you can’t debug it—and you can’t keep the agent from building on sand.

Then there’s summary drift. Community summaries are useful as navigation, but they’re one step removed from evidence, and they age badly when the corpus changes. The practical move is to treat summaries like signposts: use them to decide where to look, then pull grounded supporting text when you’re ready to answer.

None of this kills Graph RAG. It just means the “graph” is not the easy part. It’s the part that determines whether your system fails silently or fails visibly.

What’s changed over the last 18 months isn’t that graphs suddenly became fashionable again. It’s that retrieval stopped looking like search and started looking like a reasoning process.

You can see the direction in the best recent systems: the retriever has memory, the loop has checkpoints, and “I need more evidence” is treated as a normal state rather than a bug. Hybrid setups are winning—not because “two is better than one,” but because they let agents enter messy corpora through language and then move through them using structure.

The next phase is less about inventing a new acronym and more about making these systems trustworthy under pressure: identity that doesn’t collapse, extraction that doesn’t quietly poison the graph, summaries that don’t become folklore, and agents that can admit when they don’t have enough.

Vector RAG made private text usable. Graph RAG makes relationships usable. Agentic Graph RAG is the attempt to make search itself behave like a careful investigator rather than a fast autocomplete.
