PDF GPT Indexer - Fully Local RAG System
A fully local Retrieval-Augmented Generation (RAG) system for indexing and querying PDF documents.
No API keys required - everything runs locally on your machine!
🏗️ Architecture
This system uses:
- PDF Processing: PyMuPDF for text extraction
- Text Splitting: LangChain's RecursiveCharacterTextSplitter
- Embeddings: HuggingFace Sentence Transformers (local)
- Vector Store: FAISS for efficient similarity search
- LLM: Ollama (local LLM runner)
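To make the pipeline concrete, below is a minimal indexing sketch built from these components. It is illustrative only: the folder path, chunk sizes, and model name are assumptions, and the actual `indexer.py` may be structured differently.

```python
import pathlib

import fitz  # PyMuPDF
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# 1. Extract text from every PDF in the folder with PyMuPDF
texts = []
for pdf_path in pathlib.Path("./pdf").glob("*.pdf"):
    doc = fitz.open(pdf_path)
    texts.append("\n".join(page.get_text() for page in doc))

# 2. Split the raw text into overlapping chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.create_documents(texts)

# 3. Embed the chunks with a local Sentence Transformers model
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={"device": "cpu"},
)

# 4. Build a FAISS index and save it to disk for later querying
vector_store = FAISS.from_documents(chunks, embeddings)
vector_store.save_local("faiss_index")
```

At query time, `chatbot.py` loads this saved index, retrieves the most similar chunks, and passes them as context to the Ollama model.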
📋 Prerequisites
- Python 3.8 or higher
- At least 8GB RAM (16GB recommended)
- At least 5GB of free disk space (for models and dependencies)
- Internet connection for initial setup only
🚀 Installation
Step 1: Clone the Repository
```bash
git clone <repository-url>
cd pdfgptindexer-offline
```
Step 2: Install Python Dependencies
macOS
```bash
# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```
Linux
```bash
# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```
Windows
```bash
# Create virtual environment (recommended)
# Use CMD; PowerShell may not be supported
python -m venv venv
venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```
Note: If you encounter issues with faiss-cpu on Windows, you may need to install it separately:
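A plausible fix (adjust the version if requirements.txt pins one):

```bash
pip install faiss-cpu
```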
Step 3: Install Ollama
Ollama is an open-source tool that allows you to set up and run large language models (LLMs) and other AI models locally on your own computer.
🕵️♂️ Your data doesn't leave your computer
macOS
Option 1: Using Homebrew (Recommended)
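For example:

```bash
brew install ollama
```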
Option 2: Manual Installation
- Download from https://ollama.com/download
- Open the downloaded `.dmg` file
- Drag Ollama to Applications folder
- Launch Ollama from Applications
Starting Ollama on macOS:
- Ollama usually auto-starts after installation
- If not running, you can start it manually: `ollama serve`
- Or launch it from Applications → Ollama
Linux
Installation:
```bash
curl -fsSL https://ollama.com/install.sh | sh
```
Starting Ollama:
```bash
# Start Ollama service (runs in background)
ollama serve

# Or run as a systemd service (if installed as root)
sudo systemctl enable ollama
sudo systemctl start ollama
```
Windows
- Download the installer from https://ollama.com/download/windows
- Run the installer (`.exe` file)
- Ollama will start automatically after installation
- You can verify it's running by opening http://localhost:11434 in your browser
Starting Ollama on Windows:
- Ollama runs as a Windows service and starts automatically
- If needed, you can start it from the Start Menu → Ollama
Step 4: Download LLM Model
After Ollama is installed and running, download a model. Choose based on your needs:
For a faster, smaller model (the default), for example:
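```bash
ollama pull phi3
```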
For better quality (but larger in size), pull one of the larger models listed under Configuration, for example:
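```bash
ollama pull llama3.1
```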
Note: First download takes 5-15 minutes depending on your internet speed and model size. Models are cached locally after download.
Step 5: Configure Embedding Model (Optional)
Edit the .env file to configure the embedding model.
(The embedding model is downloaded automatically the first time you run the indexer.)
```
# For a faster, smaller model (default)
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2

# For better quality (but larger in size)
EMBEDDING_MODEL=intfloat/e5-large-v2
```
Step 6: Verify Installation
Test Ollama:
```bash
ollama run phi3 "Hello, how are you?"
```
📖 Usage
Step 1: Index PDF Files
Place your PDF files in a folder (e.g., `./pdf`), then run:
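A typical invocation (assuming the entry point is `indexer.py`, as referenced later in this README):

```bash
python indexer.py ./pdf faiss_index
```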
Options:
- First argument: path to the PDF folder (default: `./pdf`)
- Second argument: index output path (default: `faiss_index`)
What happens:
- Extracts text from all PDFs in the folder
- Splits text into chunks
- Generates embeddings using local model (first run downloads the embedding model)
- Creates FAISS vector index
- Saves index to disk
Note: First run may take several minutes as it downloads the embedding model (~80MB).
Step 2: Query the Indexed Documents
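Point the chatbot at the saved index and start asking questions. A typical invocation (assuming the entry point is `chatbot.py`, as referenced in the Configuration and Troubleshooting sections):

```bash
python chatbot.py faiss_index
```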
Options:
- First argument: path to the index (default: `faiss_index`)
🔧 Configuration
Use .env File
```
# .env file
OLLAMA_MODEL=phi3
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
TOP_K=3
```
Available Configuration Options:
OLLAMA_MODEL - Ollama LLM model name
- Options: `phi3`, `qwen2.5`, `llama3.1`, `llama3`, `mistral`, `deepseek-r1:7b`
- Default: `phi3`
EMBEDDING_MODEL - HuggingFace embedding model name
- Options:
  - `sentence-transformers/all-MiniLM-L6-v2` (default - fast, small, perfect for workshops)
  - `intfloat/e5-large-v2` (good balance, better quality)
  - `sentence-transformers/paraphrase-multilingual-mpnet-base-v2` (best quality)
  - `BAAI/bge-large-en-v1.5` (best for English-only)
  - `sentence-transformers/all-mpnet-base-v2` (good default)
- Default: `sentence-transformers/all-MiniLM-L6-v2`
TOP_K - Number of similar documents to retrieve
- Options: Any positive integer (typically 3-10)
- Default: `3`
- Higher values = more context but slower
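For reference, a minimal sketch of how these settings could be read at startup (assuming python-dotenv; the actual loading code in `indexer.py`/`chatbot.py` may differ):

```python
import os
from dotenv import load_dotenv

load_dotenv()  # read key=value pairs from the .env file into the environment

OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "phi3")
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "sentence-transformers/all-MiniLM-L6-v2")
TOP_K = int(os.getenv("TOP_K", "3"))
```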
After changing .env:
- For LLM changes: just restart `chatbot.py`
- For embedding changes: re-indexing is required (`rm -rf faiss_index && python indexer.py`)
🐛 Troubleshooting
Ollama Connection Issues
Problem: Error: ollama server not responding
Solutions:
- macOS: Check if Ollama is running: `ps aux | grep ollama`. If not, start it with `ollama serve` or launch it from Applications
- Linux: Start Ollama: `ollama serve` or `sudo systemctl start ollama`
- Windows: Check if the Ollama service is running in Services (services.msc)
Test Ollama connection:
```bash
curl http://localhost:11434/api/tags
```
Model Not Found
Problem: Error: Could not load model 'phi3'
Solution: Pull the model first:
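```bash
ollama pull phi3
```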
Check available models:
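```bash
ollama list
```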
Index Not Found
Problem: Error: Index not found at 'faiss_index'
Solution: Run the indexer first:
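```bash
python indexer.py ./pdf faiss_index
```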
Memory Issues
Problem: Out of memory errors during indexing
Solutions:
- Reduce chunk size in `indexer.py` (see the sketch after this list)
- Use a smaller embedding model
- Close other applications
- Process PDFs in smaller batches
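A minimal sketch of the chunk-size change, assuming `indexer.py` uses LangChain's RecursiveCharacterTextSplitter as described in the Architecture section (exact parameter values in the script may differ):

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Smaller chunks mean less text per embedding batch and lower peak memory use.
splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,    # e.g. 500 characters instead of a larger default
    chunk_overlap=50,  # keep some overlap so context isn't lost at chunk edges
)
chunks = splitter.split_text("...text extracted from your PDFs...")
```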
GPU Acceleration
If you have a GPU and want faster embeddings:
Edit `indexer.py`:
```python
model_kwargs={'device': 'cuda'}  # Change from 'cpu' to 'cuda'
```
Edit `chatbot.py` line 24:
```python
model_kwargs={'device': 'cuda'}  # Change from 'cpu' to 'cuda'
```
Note: Requires CUDA-compatible GPU and PyTorch with CUDA support.
Import Errors
Problem: ModuleNotFoundError or import errors
Solution: Make sure virtual environment is activated and dependencies are installed:
```bash
# Activate virtual environment
source venv/bin/activate    # macOS/Linux
venv\Scripts\activate       # Windows

# Reinstall dependencies
pip install -r requirements.txt
```
📝 License
See LICENSE file for details.
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
📚 Resources
💡 Tips
- The first indexing run downloads the embedding model (~80MB); subsequent runs are faster
- Larger PDFs take longer to index - be patient
- Keep your PDFs organized in folders for easier management
- The FAISS index can be reused - you don't need to re-index unless PDFs change
- Configure `TOP_K` in the `.env` file to control how many document chunks are retrieved (more = better context but slower)
- All model configuration is done via the `.env` file - no need to edit source code
Happy Searching! 🚀