The seismic impact of DeepSeek R1 in early 2025 brought forward our earlier predictions about LLM reasoning and decision-making capabilities by more than half a year. Enhancing LLM reasoning has become one of the hottest research directions in the field. This naturally raises the question: how should RAG adapt as LLM reasoning evolves? That question is the primary motivation for this blog article.
Reasoning was not actually introduced by R1/o1. LLM-based reasoning had already seen widespread application in 2024’s Agent systems; the crucial distinction lies in R1’s implementation of reasoning chains. Current mainstream Agent frameworks typically consist of four modules (Plan, Memory, Action, Tool) and several design patterns, with ReAct being the most prominent. Let’s examine the technical approaches to implementing reasoning:
- Prompt Engineering for Agent Frameworks (e.g., ReAct): Even pre-R1, LLMs possessed inherent reasoning capabilities. ReAct directly leverages LLMs for Reasoning + Action cycles. The reasoning phase generates analytical steps and contextual explanations to guide subsequent actions, while the action phase produces tool invocation requests (search engines, APIs, database queries). This iterative process builds on previous observations, creating tight dependencies between steps. Function Calling faces similar challenges but requires the LLM to be trained on specific function-call data. Notably, inadequate emphasis on RAG in many Agent frameworks has limited ReAct’s effectiveness, often reducing Agents to basic task planners rather than fully utilizing LLM potential. As noted in Anthropic’s 2024 blog [1], workflows often outperformed Agents in practical scenarios.
- Model Architecture Improvements (R1’s Approach): R1 demonstrates the value of reinforcement learning in training models to synthesize and learn from CoT (Chain of Thought) reasoning trajectories. CoT enhances computational capacity — the foundation of reasoning — by decomposing complex problems into intermediate steps. Longer CoT chains allow more meticulous reasoning at each step, progressively approaching correct solutions.
- Architectural Extensions via RAG and Agent Mechanisms: This approach emphasizes RAG as critical infrastructure. Unlike the first approach’s reliance on LLMs and simple tools, RAG provides a data foundation for targeted reasoning. As described in [2], reasoning essentially constitutes search in solution space. A key RAG-based implementation combines heuristic search with incremental fine-tuning. For instance, while ReAct decomposes problems into logical sub-problems, its local-information dependency often leads to suboptimal solutions. Recent works ([3], [4]) address this by integrating Monte Carlo Tree Search (MCTS) with RAG, using reward models/functions to evaluate path quality. Though computationally intensive and challenging to generalize, vertical applications are emerging — e.g., Text2SQL solutions using natural language-SQL mapping knowledge bases with custom reward rules.
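To make the first approach concrete, here is a minimal sketch of a ReAct-style Reason + Action loop. The `fake_llm` and `search_tool` functions are hypothetical deterministic stubs for illustration; real deployments would call an actual LLM and retriever, and this is not ReAct or RAGFlow reference code.

```python
# Minimal ReAct-style loop: alternate Reason -> Act -> Observe until the
# model emits a final answer. All components are hypothetical stubs.

def fake_llm(history: str) -> str:
    # Deterministic stub: ask for a search first, then answer from the observation.
    if "Observation:" not in history:
        return "Thought: I need external data.\nAction: search[LLM reasoning]"
    return "Thought: I have enough context.\nFinal Answer: reasoning = search in solution space"

def search_tool(query: str) -> str:
    # Stand-in for a search engine or RAG retriever.
    return f"Top result for '{query}': reasoning is search in solution space."

def react(question: str, max_steps: int = 5) -> str:
    history = f"Question: {question}"
    for _ in range(max_steps):
        step = fake_llm(history)
        history += "\n" + step
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        if "Action: search[" in step:
            query = step.split("Action: search[", 1)[1].rstrip("]")
            history += f"\nObservation: {search_tool(query)}"  # feed tool output back
    return "No answer within step budget."

print(react("What is reasoning?"))
```

Note how each step depends on the observation produced by the previous one, which is exactly the tight step-to-step coupling described above.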
The latest RAGFlow version focuses exclusively on Approach 3, aiming to develop universal solutions that enhance basic RAG with reasoning capabilities. This enables RAG systems to provide R1/o1-level reasoning on proprietary user data. However, reinforcement learning-based methods (heuristic search + incremental fine-tuning) currently fall outside RAGFlow’s 2025 Spring roadmap.
Before analyzing industrial and academic approaches, we must address a fundamental question: does simply integrating R1/o1 with basic RAG confer reasoning capabilities? The answer is no. While R1/o1 can generate reasoning chains over RAG-retrieved data, the thinking process remains model-centric, following a simple "user queries → search for materials → think" pattern. Whether the retrieved materials are sufficient for sound reasoning remains questionable, since the reasoning model can only operate on the materials it is given.
Let’s now examine and synthesize key developments.
The first notable work comes from Reference [5] (O1 Embedder), which attempts to train an embedding model with reasoning capabilities that simultaneously generates high-quality thought content and enables precise retrieval. The methodology involves training on datasets containing queries, LLM-generated reasoning content, and relevant documents. The resulting model produces both embeddings and generated text through a decoder component. While this exploration holds significance, divorcing reasoning capabilities from LLMs essentially abandons the core advantage of large language models.
Next, Search o1 (Reference [6]) explicitly aims to enhance reasoning capabilities within RAG frameworks. Its workflow features two coordinated threads:
1. Reasoning Chain
- Initialization: Establishes problem context and requirements through task decomposition
- Stepwise Generation:
  - Problem decomposition into sub-questions
  - Logical deduction using internal knowledge and retrieved information
  - Generation of coherent reasoning sequences
- Knowledge Gap Detection: Identifies missing information requiring external retrieval
- External Retrieval Trigger: Launches searches based on detected gaps
- Knowledge Integration: Condenses retrieved knowledge into the reasoning chain
2. Reason-in-Documents
- Operates parallel to the main reasoning thread
- Addresses RAG pitfalls such as information redundancy and coherence disruption
- Performs deep document analysis to extract context-relevant information
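The interplay of the two threads can be sketched as follows. The gap detector, retriever, and condensation rule below are invented stubs for illustration, not Search o1's actual implementation: the main chain detects a missing fact, retrieval returns a noisy document, and a reason-in-documents step keeps only the relevant sentence before it re-enters the chain.

```python
# Sketch of the two coordinated Search-o1-style threads: knowledge-gap
# detection triggers retrieval, and a "reason-in-documents" step condenses
# raw results before integrating them into the reasoning chain.

def detect_gap(chain):
    # Stub gap detector: flags a missing fact until it appears in the chain.
    needed = "R1 release date"
    return needed if not any(needed in s for s in chain) else None

def retrieve(query: str) -> str:
    # Stand-in retriever returning a noisy, redundant document.
    return ("DeepSeek R1 release date: January 2025. (Plus unrelated boilerplate "
            "and repeated text that would disrupt chain coherence.)")

def reason_in_documents(doc: str, query: str) -> str:
    # Condense: keep only the sentence relevant to the query.
    return next(s.strip() for s in doc.split(".") if query in s) + "."

def search_o1_step(chain):
    gap = detect_gap(chain)
    if gap:  # external retrieval trigger + knowledge integration
        chain.append(reason_in_documents(retrieve(gap), gap))
    return chain

chain = ["Sub-question: when did R1 ship?"]
print(search_o1_step(chain)[-1])  # "DeepSeek R1 release date: January 2025."
```

Once the fact is present in the chain, `detect_gap` returns `None` and no further retrieval is triggered, which is the (simplistic) termination behavior of this sketch.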
While Search o1 presents a comprehensive engineering-focused academic solution, two critical unresolved challenges emerge:
- Quality Assessment: No clear mechanism to evaluate reasoning quality
- Termination Criteria: No defined protocol for stopping iterations
These limitations notwithstanding, the practical value of this engineering-oriented approach remains evident. Compared to naive RAG+R1 integration, Search o1’s core innovation lies in its iterative reasoning process that progressively refines questions to obtain high-quality answers. The workflow diagram (not shown here) demonstrates this iterative refinement mechanism.
Next, we examine Microsoft’s PIKE-RAG (Reference [7]), which similarly relies on LLMs to analyze user queries and decompose tasks into sub-problems. Its distinctive feature lies in leveraging GraphRAG for refinement: conducting searches on a knowledge graph for sub-problems and aggregating multi-hop answers from their solutions. PIKE-RAG implements knowledge-aware task decomposition, ensuring generated sub-questions align with the knowledge base’s structure. When specific knowledge structures exist, the system produces atomic questions matching those patterns.
Concretely, PIKE-RAG iteratively constructs reasoning chains:
- Each iteration retrieves knowledge snippets for the current sub-problem
- Updates the reasoning chain with retrieved knowledge
- Selects the most relevant atomic question for further decomposition
The process terminates when sufficient knowledge is accumulated or further decomposition becomes unnecessary. PIKE-RAG exemplifies engineering-focused implementations akin to Search o1.
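A toy version of this iteration is sketched below. The knowledge base of "atomic questions" mapped to snippets and the matching rule are invented for illustration; PIKE-RAG's actual decomposition is LLM-driven and graph-based.

```python
# Toy PIKE-RAG-style loop: sub-questions are matched against atomic
# questions pre-attached to knowledge snippets, and the reasoning chain
# grows until no unanswered atomic question remains.

KB = {  # atomic question -> knowledge snippet (invented example data)
    "who founded company X": "Company X was founded by Alice.",
    "where does Alice live": "Alice lives in Paris.",
}

def pick_atomic(sub_questions, answered):
    # Select the first unanswered sub-question with an atomic match in the KB.
    for q in sub_questions:
        if q in KB and q not in answered:
            return q
    return None

def build_chain(sub_questions):
    chain, answered = [], set()
    while (q := pick_atomic(sub_questions, answered)) is not None:
        chain.append(KB[q])   # retrieve the snippet for the current sub-problem
        answered.add(q)       # terminate once nothing useful remains
    return chain

# A multi-hop question decomposed into two atomic sub-questions:
print(build_chain(["who founded company X", "where does Alice live"]))
```

The key PIKE-RAG idea this mimics is that sub-questions are only useful when they align with the knowledge base's structure; a sub-question with no atomic match contributes nothing to the chain.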
Next, the Agentic Reasoning framework (Reference [8]) employs agent architectures for deep research, enabling LLMs to dynamically invoke external tools like humans during reasoning. Key components include three integrated agents:
- MindMap Agent: Constructs dynamically updated knowledge graphs by extracting entities and logical relationships from reasoning chains. Supports reasoning by retrieving contextual information when clarifying logic or querying specifics.
- Web Search Agent: Executes search requests generated during reasoning, integrating retrieved web content into the reasoning chain.
- Coding Agent: Generates and executes Python code for computational tasks, returning results in natural language for chain integration.

While sharing similarities with Search o1, Agentic Reasoning distinguishes itself through the computational capabilities of its Coding Agent.
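The three-agent routing can be sketched as a simple dispatcher. The tag format and agent implementations below are hypothetical; in the real framework the LLM emits tool requests during reasoning and each one is routed to the appropriate agent.

```python
# Sketch of three-agent dispatch: tagged requests route to a mind-map
# lookup, a web search, or code execution. All agents are toy stubs.

def mindmap_agent(query: str) -> str:
    graph = {"R1": "R1 relates-to reinforcement learning"}  # toy knowledge graph
    return graph.get(query, "no entry")

def web_search_agent(query: str) -> str:
    return f"web result for '{query}'"  # stand-in for a real search call

def coding_agent(expr: str) -> str:
    # Execute a small arithmetic task and return the result as text.
    return str(eval(expr, {"__builtins__": {}}))  # restricted eval, sketch only

AGENTS = {"MIND": mindmap_agent, "WEB": web_search_agent, "CODE": coding_agent}

def dispatch(request: str) -> str:
    tag, _, payload = request.partition(":")
    return AGENTS[tag](payload.strip())

print(dispatch("CODE: 6 * 7"))   # "42"
print(dispatch("MIND: R1"))      # "R1 relates-to reinforcement learning"
```

A production Coding Agent would sandbox execution properly; the restricted `eval` here only gestures at the idea.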
Finally, LevelRAG (Reference [9]) follows comparable principles with terminological variations:
- High-Level Searcher: Handles strategic thinking and task decomposition
- Low-Level Searcher: Executes concrete search operations

The framework’s schematic diagram (not reproduced here) provides an intuitive visualization, aligning conceptually with the preceding architectures.
These works collectively demonstrate three evolutionary trends:
- Iterative Reasoning Enhancement: Progressive refinement through cyclic decomposition-retrieval-integration processes
- Hybrid Architecture Design: Tight coupling of symbolic systems (knowledge graphs) with neural approaches (LLMs)
- Tool Augmentation: Strategic integration of computational, search, and analytical capabilities
While implementation details vary, all frameworks confront shared challenges in evaluation metrics design and computational efficiency optimization — critical hurdles for real-world deployment.
The next contribution, OmniThink from Zhejiang University and Alibaba’s Tongyi Lab (Reference [10]), focuses on reasoning-powered report generation rather than Q&A systems. This framework simulates human cognitive processes to produce high-quality long-form content using RAG as its foundation. Its workflow comprises three phases:
- Iterative Expansion & Reflection: Constructs an Information Tree (hierarchical knowledge representation) and Conceptual Pool (dynamic knowledge repository containing core insights extracted from the tree).
- Outline Generation: Derives article structure from the conceptual pool.
- Content Production: Generates full-text using semantic similarity retrieval to integrate relevant information.
Starting with a topic, OmniThink:
- Retrieves initial web results (via Bing/Google) to build the information tree’s root node
- Analyzes leaf nodes for expansion needs, generating child nodes representing subtopics
- Refines newly retrieved information through filtering and synthesis, updating the conceptual pool

Iteration continues until a termination criterion is met: sufficient information coverage or maximum search depth.
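The expansion loop above can be sketched as a bounded recursive tree build. The retrieval and refinement functions below are deterministic stubs invented for illustration, with maximum depth as the only termination criterion.

```python
# Toy OmniThink-style expansion: grow an information tree from a topic,
# pushing refined insights into a conceptual pool, until max depth.

MAX_DEPTH = 2

def retrieve_subtopics(topic: str) -> list:
    # Stand-in for Bing/Google retrieval proposing subtopics to expand.
    return [f"{topic}/sub{i}" for i in (1, 2)]

def refine(topic: str) -> str:
    # Stand-in for filtering/synthesis producing a core insight.
    return f"insight({topic})"

def expand(topic: str, depth: int, pool: list) -> dict:
    pool.append(refine(topic))             # update the conceptual pool
    if depth >= MAX_DEPTH:                 # termination: maximum search depth
        return {topic: []}
    children = [expand(t, depth + 1, pool) for t in retrieve_subtopics(topic)]
    return {topic: children}               # information-tree node

pool = []
tree = expand("RAG reasoning", depth=0, pool=pool)
print(len(pool))  # 1 root + 2 children + 4 grandchildren = 7 insights
```

A real implementation would also check information coverage (the other termination criterion) and decide per leaf whether expansion is worthwhile, rather than expanding every node.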
These implementations (Search o1, PIKE-RAG, Agentic Reasoning, LevelRAG, OmniThink) share common traits:
- Purely engineering-driven without novel algorithmic contributions
- LLM-powered iterative question/topic generation
- Unresolved challenges in quality evaluation and termination protocols
RAG-Gym (Reference [11]) introduces reinforcement learning by framing QA tasks as nested Markov Decision Processes (MDPs):
- Outer-layer MDP: Manages retrieval-environment interactions
- Inner-layer MDP: Controls LLM token generation
- Reward model: Evaluates prediction accuracy to guide agent decisions

Training leverages annotated high-quality decision trajectories. Similarly, DeepRAG (Reference [12]) models RAG reasoning as an MDP but employs imitation learning rather than RL to address:
- Unnecessary sub-task decomposition
- Lack of retrieval decision intelligence
Both frameworks face implementation challenges due to dependency on LLM fine-tuning, representing exploratory rather than production-ready solutions.
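The outer-loop idea behind these frameworks can be illustrated with a tiny reward-guided step function: at each state the agent proposes candidate actions, a reward model scores them, and the best one is taken. The state shape, actions, and scoring rule below are invented and much simpler than RAG-Gym's process supervision.

```python
# Minimal sketch of a reward-guided decision step in an outer-layer MDP:
# propose actions, score them with a (toy) reward model, take the best.

def propose_actions(state: dict) -> list:
    return ["retrieve", "answer"]

def reward_model(state: dict, action: str) -> float:
    # Invented heuristic: answering without evidence scores poorly.
    if action == "answer":
        return 1.0 if state["evidence"] else 0.1
    return 0.5  # retrieving is moderately valuable either way

def step(state: dict) -> str:
    best = max(propose_actions(state), key=lambda a: reward_model(state, a))
    if best == "retrieve":
        state["evidence"].append("retrieved passage")  # environment transition
    return best

state = {"evidence": []}
print(step(state))  # "retrieve" (no evidence yet)
print(step(state))  # "answer" (evidence now present)
```

In RAG-Gym the reward model is learned from annotated trajectories rather than hand-written, and the inner-layer MDP governs token generation, which this sketch omits entirely.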
RAGFlow 0.17 Implementation:
The open-source framework synthesizes strengths from Search o1, PIKE-RAG, Agentic Reasoning, and LevelRAG through:
1. Iterative Reasoning Chain Generation: Produces sub-questions triggering search requests at each step
2. Multi-source Retrieval:
  - Internal data repositories (simple RAG)
  - Web search (via user-provided API keys) for contextual commonsense
  - Knowledge graphs enhanced with LLM-generated “anticipated questions” per text chunk
3. Termination Logic:
  - LLM-determined stopping condition
  - Threshold-based fallback termination
4. GraphRAG Integration: Stores chunk-specific potential questions to guide effective query formulation
5. Final Synthesis: Aggregates complete reasoning chains for conclusive outputs
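The overall loop can be sketched as follows. Every component below is a hypothetical stub, not RAGFlow 0.17 internals: the sub-question generator, the three retrieval sources, and the LLM stopping check are stand-ins, while the iteration cap plays the role of the threshold-based fallback.

```python
# Sketch of a Deep Research loop: generate a sub-question, retrieve from
# multiple sources, and stop when the LLM judges the chain sufficient or
# when the iteration cap (fallback termination) is hit.

MAX_ITERS = 8  # threshold-based fallback termination

def next_subquestion(chain: list) -> str:
    return f"sub-question #{len(chain) + 1}"  # stand-in for LLM generation

def multi_source_retrieve(q: str) -> list:
    # Internal KB, web search, and knowledge-graph "anticipated questions".
    return [f"kb:{q}", f"web:{q}", f"kg:{q}"]

def llm_says_done(chain: list) -> bool:
    return len(chain) >= 3  # stub for the LLM-determined stopping condition

def deep_research(question: str) -> str:
    chain = []
    for _ in range(MAX_ITERS):
        q = next_subquestion(chain)
        chain.append(" | ".join(multi_source_retrieve(q)))  # integrate results
        if llm_says_done(chain):
            break
    return f"answer to '{question}' from {len(chain)} reasoning steps"

print(deep_research("What drove R1's impact?"))
```

The dual termination condition matters in practice: an LLM-only stopping check can loop indefinitely on hard questions, so the hard iteration cap bounds both latency and cost.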
The workflow (diagram referenced but not shown) achieves comprehensive Deep Research capabilities through this engineered integration.
While academic prototypes emphasize novel architectures, industrial implementations like RAGFlow prioritize composability and operational viability, strategically balancing retrieval breadth with computational pragmatism.
Below is a reasoning dialogue obtained by connecting DeepSeek V3 through RAGFlow; the output looks very similar to what DeepSeek R1 produces.
Observations from Implementing Reasoning-Enhanced LLMs
In our practical experience with reasoning-enhanced large language models, we present the following insights:
Performance Parity with Operational Trade-offs
RAGFlow’s reasoning-powered RAG (termed Deep Research) currently demonstrates comparable effectiveness when connected to standard LLMs (e.g., V3) versus specialized reasoning LLMs like R1. However, R1’s inherently prolonged reasoning process becomes significantly extended when coupled with iterative thinking mechanisms, making direct R1 integration inadvisable for latency-sensitive scenarios.
Contextual LLM Selection
Not all RAG implementations require R1-level reasoning. We observe suboptimal deployments where users employ R1 for basic tasks like keyword extraction or knowledge graph construction. This unnecessarily inflates system latency, as these operations require minimal reasoning and are better served by standard LLMs.
Enterprise Adoption Barriers
While R1 holds immense potential, its current lack of dedicated reasoning chain APIs prevents enterprises from:
- Generating proprietary reasoning chains using internal data
- Achieving human-like interleaved reasoning and retrieval workflows

RAGFlow bridges this gap by enabling organizations to harness reasoning capabilities over their own data while awaiting API availability. We actively seek community feedback comparing outcomes from standard vs. reasoning LLM integrations.
Strategic API Development
We urge DeepSeek and similar providers to prioritize reasoning chain API releases. Such interfaces would empower enterprises to combine LLMs’ analytical power with proprietary data, unlocking transformative commercial applications.
RAGFlow’s Deep Research already delivers tangible business impact across industries:
- Healthcare: Diagnostic report generation via medical record analysis
- Business Intelligence: Decision support through operational/financial data synthesis
- Legal Compliance: Adjudicative assistance using regulatory frameworks and case law
This evolution propels RAG beyond basic Q&A systems into decision support architectures. As Anthropic’s 2024 analysis [1] noted, workflows still dominate practical implementations due to underdeveloped agent capabilities. However, spring 2025 marks an inflection point where reasoning-enhanced LLMs minimize human orchestration, accelerating AI ubiquity.
The synergy of reasoning + search forms the cornerstone of enterprise AI services. RAGFlow remains committed to advancing RAG infrastructure, as evidenced by our rapid reasoning feature deployment. We invite continued collaboration:
🌟 Support our mission on GitHub: https://github.com/infiniflow/ragflow
Bibliography
1. Anthropic. Building Effective Agents. https://www.anthropic.com/research/building-effective-agents
2. On the Emergence of Thinking in LLMs I: Searching for the Right Intuition. https://arxiv.org/abs/2502.06773
3. MCTS-KBQA: Monte Carlo Tree Search for Knowledge Base Question Answering. https://arxiv.org/abs/2502.13428
4. KBQA-o1: Agentic Knowledge Base Question Answering with Monte Carlo Tree Search. https://arxiv.org/abs/2501.18922
5. O1 Embedder: Let Retrievers Think Before Action. https://arxiv.org/abs/2502.07555
6. Search-o1: Agentic Search-Enhanced Large Reasoning Models. https://arxiv.org/abs/2501.05366
7. PIKE-RAG: sPecIalized KnowledgE and Rationale Augmented Generation. https://arxiv.org/abs/2501.11551
8. Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research. https://arxiv.org/abs/2502.04644
9. LevelRAG: Enhancing Retrieval-Augmented Generation with Multi-hop Logic Planning over Rewriting Augmented Searchers. https://arxiv.org/abs/2502.18139
10. OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking. https://arxiv.org/abs/2501.09751
11. RAG-Gym: Optimizing Reasoning and Search Agents with Process Supervision. https://arxiv.org/abs/2502.13957
12. DeepRAG: Thinking to Retrieval Step by Step for Large Language Models. https://arxiv.org/abs/2502.01142