GitHub - Poll-The-People/awesome-rag: a collection of awesome things related to Retrieval-Augmented Generation

Awesome Retrieval‑Augmented Generation (RAG)

Proudly sponsored by CustomGPT.ai • Join the Slack community

CustomGPT.ai is a no-code platform for building enterprise-grade RAG applications with citation-backed answers and no hallucinations. It offers SOC-2 Type II security, GDPR compliance, and support for over 1400 document formats and 92 languages.

Retrieval‑Augmented Generation (RAG) equips language models with fresh, domain‑specific knowledge by fetching external context at inference time. This list is a one‑stop catalogue of every major RAG‑related resource—tools, papers, benchmarks, tutorials, and more.
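
As a rough illustration of "fetching external context at inference time", the sketch below ranks a tiny in-memory corpus by word overlap with the question and pastes the best matches into a prompt. The corpus, the `tokenize`, `retrieve`, and `build_prompt` helpers, and the overlap scoring are all illustrative assumptions; real systems would use embedding or keyword retrieval from the tools listed below and send the prompt to an LLM.

```python
import re

# Hypothetical toy corpus standing in for an external knowledge source.
CORPUS = [
    "RAG fetches external context at inference time.",
    "Vector databases store embeddings for similarity search.",
    "Chunking splits long documents into smaller passages.",
]

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, corpus, k=2):
    """Return the k passages sharing the most words with the question."""
    q = tokenize(question)
    ranked = sorted(corpus, key=lambda p: len(q & tokenize(p)), reverse=True)
    return ranked[:k]

def build_prompt(question, passages):
    """Place retrieved passages ahead of the question for the LLM."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

if __name__ == "__main__":
    question = "How do vector databases support similarity search?"
    print(build_prompt(question, retrieve(question, CORPUS)))
```

Word overlap is used here only to keep the example dependency-free; swapping `retrieve` for a vector-database query leaves the rest of the flow unchanged.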

Only very short descriptions are provided when essential for clarity. PRs welcome!

Open Source Tools

CustomGPT.ai - Open-source SDK for building custom RAG applications with enterprise-grade features
TrustGraph - Open-source enterprise-grade complete AI solution stack for data sovereignty
RAGFlow - Open-source RAG engine based on deep document understanding
R2R (RAG to Riches) - Advanced AI retrieval system with production-ready features
FastRAG - Research framework for efficient retrieval augmented generation
FlashRAG - Python toolkit for RAG research with 36+ datasets and 17+ algorithms
Verba - Open-source RAG application out of the box
Kotaemon - Clean, customizable RAG UI for document-based Q&A
Cognita - Open-source RAG framework for modular applications
GraphRAG - Microsoft's approach to RAG using knowledge graphs
Nano-GraphRAG - Compact GraphRAG solution with core capabilities

LangChain — Python/JS agents & chains
LangChain4j — JVM
LlamaIndex — Data loaders & indices
Haystack — Modular pipelines
Semantic Kernel — .NET & Python
DSPy — Declarative pipelines
Guidance — Prompt DSL
Flowise — No‑code builder
reag — Reasoning Augmented Generation
Danswer — Internal Q&A search
Neum — Creation and synchronization of vector embeddings at large scale
GPTCache — Embedding‑aware cache
Mastra - The TypeScript AI agent framework. Assistants, RAG, observability. Supports any LLM: GPT-4, Claude, Gemini, Llama
Letta (MemGPT) — Stateful apps
Swiftide - Fast, streaming indexing, query, and agentic LLM applications in Rust
LangGraph — Agentic DAGs
Ragna — RAG orchestration framework
SimplyRetrieve - Lightweight chat AI platform featuring custom knowledge.

Embedding Models & Libraries

Proprietary Tools

CustomGPT.ai RAG API — Enterprise agents, hallucination free.
Pinecone - Fully managed vector database service
LangSmith - Platform for building and evaluating LLM applications
OpenAI Assistants & Retrieval
Vectara — GenAI API
Cohere RAG
AWS Knowledge Bases for Bedrock
Azure AI Search + RAG
Google Vertex AI Search & RAG
IBM watsonx.ai Retrieval
NVIDIA NeMo Retriever
Anthropic Claude Retrieval
Databricks DBRX RAG
Elastic Search Labs RAG blueprints

Vendor Examples

Amazon Kendra - Intelligent enterprise search with RAG
Amazon Bedrock Knowledge Bases
Azure AI Search
Google Vertex AI Search
LangChain × OpenAI Quickstart
LangChain × Elasticsearch Blueprint
LlamaIndex × Vespa Guide
Qdrant Hybrid Search miniCOIL
AWS Bedrock RAG Sample
Azure RAG Jumpstart
GCP Vertex RAG Agent Builder

Other Tools

LangFuse: Open-source tool for tracking LLM metrics, observability, and prompt management.
Ragas: Framework that helps evaluate RAG pipelines.
LangSmith: A platform for building production-grade LLM applications that lets you closely monitor and evaluate your application.
Hugging Face Evaluate: Tool for computing metrics like BLEU and ROUGE to assess text quality.
Weights & Biases: Tracks experiments, logs metrics, and visualizes performance.

Vector DBs & Search Engines

Pick a vector db - GUIDE

Weaviate - Open-source vector database with GraphQL interface
Qdrant - High-performance vector similarity search engine
Milvus - Open-source vector database for scalable similarity search
Chroma - Open-source embedding database for LLM applications
Pinecone - The vector database
Elasticsearch (vector) - distributed search and analytics engine
OpenSearch - Open source distributed and RESTful search engine
Vespa - AI + Data, online
PGVector - PostgreSQL extension for vector similarity search
Redis Stack Search - Searching and querying Redis data using the Redis Query Engine
ClickHouse Vectors
Oracle AI Vector Search
TiDB Vector - semantic similarity searches across various data types
ScaNN (Scalable Nearest Neighbors) - Method for efficient vector similarity search at scale
Lantern.dev - open-source Postgres vector database
Azure Cosmos DB: Globally distributed, multi-model database service with integrated vector search.
Couchbase: A distributed NoSQL cloud database.
LlamaIndex: Employs a straightforward in-memory vector store for rapid experimentation.
Neo4j: Graph database management system.
Redis Stack: An in-memory data structure store used as a database, cache, and message broker.
SurrealDB: A scalable multi-model database optimized for time-series data.

Research Papers and Surveys

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks - Original RAG paper by Patrick Lewis et al.
REALM: Retrieval-Augmented Language Model Pre-Training - Google's foundational retrieval-augmented language model
Dense Passage Retrieval for Open-Domain Question Answering - Facebook's DPR system for dense retrieval
Retrieval-Augmented Generation for Large Language Models: A Survey - Comprehensive survey covering Naive RAG, Advanced RAG, and Modular RAG
A Comprehensive Survey of Retrieval-Augmented Generation (RAG) - 2024 survey tracing RAG evolution from foundational concepts to current state
Retrieval-Augmented Generation for AI-Generated Content: A Survey - Comprehensive review of RAG techniques for AIGC scenarios
Evaluation of Retrieval-Augmented Generation: A Survey - Comprehensive overview of RAG evaluation methodologies
(2020) Retrieval‑Augmented Generation for Knowledge‑Intensive NLP Tasks - Lewis et al., RAG baseline
(2020) REALM - Guu et al. — Retriever‑augmented pre‑training
(2022) Atlas - Izacard & Grave — Few‑shot RAG
(2022) RETRO - Borgeaud et al. — Large‑scale retrieval cache
(2024) Benchmarking LLMs in RAG - Chen et al.
(2024) Reliable, Adaptable & Attributable LMs with Retrieval - Asai et al.
(2024) GraphRAG - Microsoft Research
(2024) RAG‑Fusion - Meta
(2025) Look‑ahead Retrieval - OpenAI

More - RAG Research Papers Collection - Curated list from ICML, ICLR, ACL

RAG Survey 2022

A Survey on Retrieval-Augmented Text Generation

RAG Survey 2023

RAG Survey 2024

RAG Approaches and Architectures

Fusion-in-Decoder (FiD)
RETRO (Retrieval-Enhanced Transformer) - DeepMind's approach with trillions of tokens
Atlas: Few-shot Learning with Retrieval Augmented Language Models - Meta's Atlas model for few-shot learning
ColBERT: Efficient Late Interaction Retrieval - Multi-vector dense retrieval with late interaction
Cache-Augmented Generation (CAG) – Pre-loads pertinent documents into the model’s context and retains the key-value (KV) cache from earlier inferences.
Agentic RAG – “Retrieval agents” that autonomously decide how and when to retrieve information.
Corrective RAG (CRAG) – Adds a refinement step to fix or polish retrieved content before it is woven into the LLM’s answer.
Retrieval-Augmented Fine-Tuning (RAFT) – Fine-tunes language models specifically to boost both retrieval quality and generation performance.
Self-Reflective RAG – Systems that monitor their own outputs and dynamically adjust retrieval strategies based on feedback.
RAG Fusion – Generates several variants of the query, retrieves for each, and merges the ranked results (typically with reciprocal rank fusion) to supply richer, more relevant context.
Temporal Augmented Retrieval (TAR) – Incorporates time-aware signals so retrieval favors the most temporally relevant data.
Plan-then-RAG (PlanRAG) – Creates a high-level plan first, then executes retrieval-augmented generation for complex tasks.
GraphRAG – Leverages knowledge graphs to structure context and enhance reasoning.
FLARE – Uses active, iterative retrieval to progressively improve answer quality.
Contextual Retrieval – Enriches document chunks with added context before retrieval, improving relevance from large knowledge bases.
GNN-RAG – Applies graph neural networks to retrieval for better reasoning in large-language-model workflows.
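
One common way to blend several ranked result lists, as in the RAG Fusion entry above, is reciprocal rank fusion (RRF). The following is a minimal sketch, assuming each retriever returns document IDs best-first. The document IDs are made up, and `k=60` is a conventional smoothing constant rather than a value taken from this list.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first ID lists; score(d) = sum of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

if __name__ == "__main__":
    dense = ["doc_a", "doc_b", "doc_c"]   # hypothetical dense-retriever output
    sparse = ["doc_b", "doc_d", "doc_a"]  # hypothetical keyword-retriever output
    print(reciprocal_rank_fusion([dense, sparse]))
```

Because RRF only looks at ranks, it needs no score normalization across retrievers, which is one reason it is a popular default.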

Frameworks

LangChain - Framework for building LLM applications with chaining capabilities
LlamaIndex - Framework for connecting custom data sources to LLMs
Haystack - End-to-end framework for building production-ready LLM applications
DSPy - Framework for programming language models with automatic optimization
Dify - Open-source LLM app development platform with RAG pipeline
Semantic Kernel - Microsoft's SDK for developing Generative AI applications
Flowise - Drag & drop UI to build customized LLM flows
Cognita: Open-source RAG framework for building modular and production-ready applications.
Verba: Open-source application for RAG out of the box.
Mastra: TypeScript framework for building AI applications.
Letta: Open source framework for building stateful LLM applications.
Swiftide: Rust framework for building modular, streaming LLM applications.
CocoIndex: ETL framework for indexing data for AI workloads such as RAG, with real-time incremental updates.

RAG Techniques and Methodologies

HyDE (Hypothetical Document Embeddings) - Uses LLMs to generate hypothetical documents for queries
FLARE (Forward-Looking Active REtrieval) - Iteratively retrieves relevant documents based on prediction confidence
Self-RAG - Trains LLMs to adaptively retrieve passages and self-critique
CRAG (Corrective Retrieval Augmented Generation) - Improves generation robustness with retrieval evaluator
RAG Techniques Repository - Curated collection of 30+ advanced RAG techniques with implementations
Design and Evaluation of RAG Solutions - Comprehensive guide following best practices
LangChain RAG Best Practices - Evaluation and comparison of different RAG architectures
RAG Triad Methodology - Context relevance, groundedness, and answer relevance framework
Agentic RAG
Corrective RAG (CRAG)
Cache‑Augmented Generation
Temporal‑Aware RAG - Uses time-aware signals so retrieval favors the most temporally relevant data
Plan‑then‑RAG - A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers
RePlug — Retriever‑aware generation
RETRO — Retro‑fitted retrieval
Streaming RAG — Low latency

Multimodal RAG

Multimodal RAG with CLIP - Text-Image retrieval using CLIP
SAM-RAG - Self-adaptive multimodal RAG framework
ColPali - Efficient document retrieval with vision language models
Building Multimodal RAG Systems

Graph-based RAG

Microsoft GraphRAG - Knowledge graph approach to RAG. Research: GraphRAG Paper
Knowledge Graph Integration for RAG
Neo4j GraphRAG - Building knowledge graphs for RAG

Retrieval Methods

Dense Retrieval

Sparse Retrieval

SPLADE: Sparse Lexical and Expansion Model - Neural sparse retrieval with term expansion
HNSW vs DiskANN

Hybrid Search

Hybrid Search: Combining Dense and Sparse Retrieval - Implementation guide for hybrid search systems
Dense‑Sparse‑Dense (DSD)
Advanced Reranking Techniques - Guide to implementing cross-encoder reranking.
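
Hybrid search, as covered in the guides above, combines dense and sparse scores for the same documents. One simple way to do that is to min-max normalize each score set and take a weighted sum. This is a sketch under stated assumptions: the score dictionaries are invented, both are assumed non-empty, and `alpha` is a hypothetical weight to tune per dataset.

```python
def min_max(scores):
    """Rescale a non-empty {doc_id: score} dict to the [0, 1] range."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

def hybrid_scores(dense, sparse, alpha=0.5):
    """Blend normalized scores: alpha * dense + (1 - alpha) * sparse."""
    dense_n, sparse_n = min_max(dense), min_max(sparse)
    docs = set(dense_n) | set(sparse_n)
    # A document missing from one retriever contributes 0 for that side.
    return {
        d: alpha * dense_n.get(d, 0.0) + (1 - alpha) * sparse_n.get(d, 0.0)
        for d in docs
    }

if __name__ == "__main__":
    dense = {"doc_a": 0.82, "doc_b": 0.75, "doc_c": 0.40}  # e.g. cosine similarity
    sparse = {"doc_b": 12.0, "doc_c": 9.0, "doc_d": 3.0}   # e.g. BM25 scores
    fused = hybrid_scores(dense, sparse, alpha=0.6)
    for doc_id in sorted(fused, key=fused.get, reverse=True):
        print(doc_id, round(fused[doc_id], 3))
```

The fused top results are the usual input to a cross-encoder reranker, which rescores a short candidate list more accurately than either retriever alone.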

More here: All RAG Reranking (GitHub)

Other Techniques

Prompting Strategies

RAG Prompt Engineering Guide (DAIR.AI) - Comprehensive guide to prompt engineering for RAG systems
LangChain RAG Prompt Hub - Collection of tested RAG prompt templates
Efficient Prompt Engineering for RAG - Strategies for optimizing prompts in RAG systems
Secure RAG applications using prompt engineering on Amazon Bedrock - Best practices for RAG prompts with security considerations

Zero‑Shot / Few‑Shot
Chain‑of‑Thought (CoT)
Meta Prompting
Generated Knowledge Prompting
ReAct
Reflexion
Automatic Prompt Engineer (APE)
Directional Stimulus Prompting (DSP)
Chain‑of‑Verification (CoVe)
Self‑Consistency
Prompt Compression
Dynamic / Adaptive Prompts
System → Retrieval → User triple‑prompt
GraphPrompt
Emerging RAG & Prompt Engineering Architectures for LLMs
How to Cut RAG Costs by 80% Using Prompt Compression

Chunking & Pre‑processing

11 Chunking Strategies for RAG — Simplified & Visualized - Comprehensive guide covering 11 chunking methods with visual comparisons
5 Levels of Text Splitting - Hierarchical approach to chunking from basic to advanced
Semantic Chunking with LlamaIndex - Implementation guide for semantic-based document splitting
Optimizing Retrieval-Augmented Generation with Advanced Chunking Techniques - Research on optimal chunk sizes for different use cases
CharacterTextSplitter — fixed‑size
RecursiveTextSplitter
SentenceSplitter (LlamaIndex)
Unstructured‑IO loaders
LoRA Chunking - Fused kernel chunk loss to include LoRA to reduce memory, support DeepSpeed ZeRO3
Semantic chunking video
Agentic chunking demo - The 5 Levels Of Text Splitting For Retrieval
Chunking Strategies for LLM Applications
Evaluating the Ideal Chunk Size for a RAG System using LlamaIndex
How to Chunk Text Data — A Comparative Analysis
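
The simplest strategy these guides start from is fixed-size chunking with overlap. The sketch below is a plain-Python illustration, not the API of any splitter listed above. The function name `chunk_text` and the default sizes are assumptions chosen for the example.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character windows that overlap."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

if __name__ == "__main__":
    sample = "Retrieval-augmented generation adds external context to prompts. " * 5
    for i, chunk in enumerate(chunk_text(sample, chunk_size=80, overlap=20)):
        print(i, repr(chunk))
```

Overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, at the cost of storing some text twice. Recursive and semantic splitters refine where the boundaries fall.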

Comparison Guides

Vector Database Comparison: Pinecone vs Weaviate vs Chroma - Comprehensive enterprise-focused comparison with performance metrics
Top Vector Database for RAG: Qdrant vs Weaviate vs Pinecone - Performance comparison of 6 vector databases for RAG workloads

Embeddings Models

Embedding Model Comparison: OpenAI vs Cohere vs Open Source - Comprehensive evaluation of commercial and open-source embedding models
Best Embedding Model — OpenAI / Cohere / Google / E5 / BGE - Detailed comparison of top embedding models with performance metrics
Matryoshka Embeddings for RAG - Implementing variable-size embeddings for efficiency
BGE M3 and SPLADE Implementation Guide - Guide to implementing sparse and dense embeddings
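
The "variable-size embeddings" idea behind Matryoshka embeddings can be sketched as truncating a vector to its first few dimensions and renormalizing it. This assumes the vectors come from a model trained so that prefixes remain meaningful. The vectors below are invented, and `truncate_embedding` is a hypothetical helper rather than part of any listed library.

```python
import math

def truncate_embedding(vector, dims):
    """Keep the first `dims` values and rescale to unit length."""
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)
    return [x / norm for x in head]

def cosine(a, b):
    """Cosine similarity of two equal-length unit vectors (a dot product)."""
    return sum(x * y for x, y in zip(a, b))

if __name__ == "__main__":
    query = [0.9, 0.1, 0.3, 0.2]  # made-up 4-dimensional embeddings
    doc = [0.8, 0.2, 0.1, 0.5]
    for dims in (2, 4):
        q, d = truncate_embedding(query, dims), truncate_embedding(doc, dims)
        print(dims, round(cosine(q, d), 4))
```

Shorter vectors shrink the index and speed up search; a common pattern is to shortlist with the truncated vectors and rescore the shortlist with the full ones.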

Instruction Tuning & Optimization

RA‑DIT
InstructRetro
FLARE / Active RAG
UltraFeedback — RLHF on RAG
DSI‑T — Decoder‑only retrieval

Finetuning and Training

Response Quality and Hallucination

RAGTruth: A Hallucination Corpus - Dataset with 18,000 RAG responses and hallucination annotations
Reducing Hallucination in Structured Outputs via RAG
WhyLabs AI Control Center - Platform for real-time guardrails and monitoring
Vectara Hallucination Score
Prompt‑Injection Defense
OpenAI Function Calling JSON Schema
Harmless RLHF pipelines
Chain-Of-Verification Reduces Hallucination in LLMs
How to Detect Hallucinations in LLMs
Measuring Hallucinations in RAG Systems

Security and Privacy Considerations

OWASP Top 10 for LLM Applications - Comprehensive security framework covering RAG vulnerabilities
CSA RAG Security Best Practices - Enterprise-grade security controls for RAG
Microsoft Presidio for PII Protection - Framework for detecting and anonymizing sensitive information
LLM Guard - Security toolkit for protecting LLM applications
Masking PII Data in RAG Pipeline
Hijacking Chatbots: Dangerous Methods Manipulating GPTs
Guardrails AI - Framework for implementing security guardrails
NVIDIA NeMo Guardrails - Comprehensive toolkit for building programmable guardrails
NeMo Guardrails: The Missing Manual
Safeguarding LLMs with Guardrails
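
To make the idea of masking PII in a RAG pipeline concrete, the sketch below replaces email addresses with a placeholder before chunks are embedded or indexed. It is a deliberately naive, regex-only illustration and does not use the API of any tool listed above. The pattern and the `mask_emails` and `prepare_chunks` helpers are assumptions; production systems usually rely on dedicated detectors.

```python
import re

# Deliberately simple pattern; it will miss unusual addresses and may
# swallow punctuation that directly follows an address.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def mask_emails(text, placeholder="[EMAIL]"):
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL_RE.sub(placeholder, text)

def prepare_chunks(chunks):
    """Mask every chunk before it is embedded or stored in an index."""
    return [mask_emails(chunk) for chunk in chunks]

if __name__ == "__main__":
    raw = ["Contact jane.doe@example.com for access", "No personal data here"]
    print(prepare_chunks(raw))
```

Masking at ingestion time means the sensitive value never reaches the vector store or the prompt, which is generally easier to reason about than filtering model output afterwards.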

Evaluation Metrics and Benchmarks

RAGAS (Retrieval-Augmented Generation Assessment) - Reference-free evaluation framework with component-level metrics
TruLens - Comprehensive evaluation and tracking for LLM applications
DeepEval - Open-source evaluation framework for LLMs
Arize Phoenix - Open-source observability platform
RAGBench - 100k examples across 5 industry domains
BeIR - Benchmark for zero-shot evaluation of information retrieval
MTEB - Massive Text Embedding Benchmark
ARES - Automated Evaluation of RAG Systems
RGB Benchmark - implementation for Benchmarking Large Language Models in Retrieval-Augmented Generation
LlamaIndex RAG eval - Evaluation and benchmarking are crucial in developing LLM applications
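
For the retrieval half of a RAG pipeline, two metrics that are simple to compute by hand are recall@k and reciprocal rank. The sketch below is illustrative only: the document IDs are invented, and the frameworks above implement these and richer metrics (such as groundedness) with their own APIs.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant IDs that appear in the top-k results."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)

def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is found."""
    relevant = set(relevant)
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

if __name__ == "__main__":
    retrieved = ["d3", "d1", "d7", "d2"]  # hypothetical ranked results
    relevant = {"d1", "d2"}               # hypothetical ground truth
    print(recall_at_k(retrieved, relevant, k=2))
    print(reciprocal_rank(retrieved, relevant))
```

Averaging `reciprocal_rank` over a set of queries gives mean reciprocal rank (MRR).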

Blogs

RAG Benchmark 2023

RAG Benchmark 2024

Advantages and Disadvantages

Performance, Cost & Observability

Vector Database Optimization - Techniques for efficient vector storage and retrieval
Hybrid Retrieval Strategies - Combining multiple retrieval methods for better performance
Chunking Optimization - Strategies for optimal text segmentation
LangFuse
LangSmith
Helicone — telemetry
WandB RAG guide
OpenLLMetry - Open-source observability for your LLM application, based on OpenTelemetry
Cost optimisation tips

Cost Calculators

RAG Cost Calculator - Tool for estimating and optimizing RAG pipeline costs
RAG Savings Calculator
RAG Cost Calculator

RAG Fine-tuning

RAFT (Retrieval Augmented Fine-Tuning) - Adapting Language Model to Domain Specific RAG
Fine-tuning vs RAG Guide - Comprehensive comparison and guidance
Direct Preference Optimization (DPO) for RAG - Alternative to RLHF for aligning RAG outputs

Knowledge‑Graph / Structured RAG

DBpedia - Structured knowledge from Wikipedia
Wikidata - Community-maintained knowledge base
ConceptNet - Large-scale commonsense knowledge graph
YAGO - High-quality knowledge base
Neo4j LLM Knowledge Graph Builder
Neo4j RAG blog
GraphRAG site
NebulaGraph Graph‑RAG article

Libraries and SDKs

Sentence Transformers - Python framework for sentence, text and image embeddings
LiteLLM - Python SDK for 100+ LLM APIs in OpenAI format
AI SDK - TypeScript toolkit for building AI applications
Hugging Face Transformers - State-of-the-art ML for PyTorch, TensorFlow, and JAX

Key Concepts

Hugging Face Transformers - RAG Documentation
RAG-Survey GitHub Repository - Curated collection of RAG papers with taxonomy

Educational Content

Courses and Tutorials

MAGMaR - The 1st Workshop on Multimodal Augmented Generation via MultimodAl Retrieval - Reno Kriz, Kenton Murray, Eugene Yang, Francis Ferraro, Kate Sanders, Cameron Carpenter, Benjamin Van Durme
Towards Knowledgeable Language Models - Zoey Sha Li, Manling Li, Michael JQ Zhang, Eunsol Choi, Mor Geva, Peter Hase
Modular RAG and RAG Flow - Yunfan Gao (2024), Tutorial: Blog I and Blog II
Stanford CS25: V3 I Retrieval Augmented Language Models - Douwe Kiela (2023), Lecture: Video
Building RAG-based LLM Applications for Production - Anyscale (2023), Tutorial: Blog
Multi-Vector Retriever for RAG on tables, text, and images - LangChain (2023), Tutorial: Blog
Retrieval-based Language Models and Applications - Asai et al. (2023), Tutorial: ACL Website and Video
Advanced RAG Techniques: an Illustrated Overview - Ivan Ilin (2023), Tutorial: Blog
Retrieval Augmented Language Modeling - Melissa Dell (2023), Lecture: Video

Blogs and Articles

RAG Implementation with LangChain and Weaviate - From theory to Python implementation
Advanced RAG Techniques: An Illustrated Overview
The RAGOps Stack: Critical Components
Knowledge Graphs for RAG
RAG Intuitively & Exhaustively Explained
RAG in Production: 9 Lessons
Reranking vs Embeddings on Cursor
Forget RAG, Think RAG‑Fusion
Hidden Costs of RAG

Newsletters & Forums

ragaboutit - A blog and newsletter focused specifically on RAG news, tutorials, and insights, making it a dedicated resource for staying up-to-date.
r/LangChain
r/rag - Reddit communities for practical discussions, troubleshooting, and sharing projects. These are valuable for seeing what challenges other developers are facing in real-time.

Talks and Conferences

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (NeurIPS 2020)
Self-RAG: Learning to Retrieve, Generate, and Critique (ICLR 2024)
RAG Research Papers Collection - Curated list from ICML, ICLR, ACL

Influential Researchers and Influencers

Patrick Lewis - Lead author of original RAG paper, AI Research Scientist at Cohere
Sebastian Riedel - Co-author of RAG paper, Professor at UCL and DeepMind
Douwe Kiela - Co-author of RAG paper, CEO of Contextual AI
Gautier Izacard - Author of FiD and Atlas papers, Meta AI
Kelvin Guu - Lead author of REALM paper, Google Research
Douwe Kiela — Modular RAG, Stanford
Matei Zaharia — DSPy, Databricks
Akari Asai — Dense retrieval research
Jerry Liu — LlamaIndex
Harrison Chase — LangChain
Andrej Karpathy — LLM systems
Jeff Dean — Google Research
Artem Yankov — Qdrant
Alden Do Rosario - RAG Influencer, CEO CustomGPT.ai

Latest Trends 2024-2025

RAG-as-a-Service market at $1.2B (2024)
Projected 49.1% CAGR through 2030
On-device RAG for privacy

Community Resources

r/LocalLLaMA - 493k members
r/MachineLearning - Active RAG discussions
r/RAG - Dedicated RAG subreddit

Discord

RAG TAGG Discord - 2,492 members
Vectara RAGTime Bot
RAGHub Community

GitHub Communities

RAGHub Repository
Microsoft GraphRAG

Existing Collections

Contributing

Contributions are welcome! Please read the contribution guidelines before submitting a pull request.

License

This collection is licensed under MIT.