Airgapped Offline RAG
This project by Vincent Koc implements a Retrieval-Augmented Generation (RAG) question-answering system for documents. It uses Llama 3, Mistral, and Gemini models for local inference via llama.cpp, LangChain for orchestration, ChromaDB for vector storage, and Streamlit for the user interface.
Setup
- Ensure Python 3.9 is installed. You can use pyenv:
  pyenv install 3.9.16
  pyenv local 3.9.16
  pyenv rehash
- Create a virtual environment and install dependencies.
- Download models: download the Llama 3 (8B) and Mistral (7B) models in GGUF format and place them in the models/ directory. TheBloke on Hugging Face has shared quantized builds of the models, and the models from unsloth have also been tested.
- Sentence transformer model: the Qdrant sentence transformer model is downloaded automatically on the first run. If running the airgapped RAG locally, run the codebase with internet access once first so the model can be downloaded.
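Once the models are in place, a quick sanity check that the GGUF files landed where the app expects them might look like this (a minimal sketch; the models/ path comes from the steps above, and the filenames are whatever you downloaded):

```python
from pathlib import Path


def find_gguf_models(models_dir: str = "models") -> list[str]:
    """List the GGUF model files available for local inference."""
    return sorted(p.name for p in Path(models_dir).glob("*.gguf"))


if __name__ == "__main__":
    models = find_gguf_models()
    if not models:
        print("No .gguf files found in models/ -- see the download step above.")
    for name in models:
        print(f"Found model: {name}")
```

If this prints nothing, re-check the download step before starting the app.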
Running the Application
Locally
Using Docker
make docker-build
make docker-run
Usage
- Upload PDF documents using the file uploader.
- Select the model you want to use (e.g., Mistral).
- Enter your question in the text input.
- Click "Generate Answer" to get a response based on the document content.
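Under the hood, "Generate Answer" follows the usual RAG pattern: score the stored document chunks against the question, retrieve the most relevant ones, and hand them to the local model as context. A toy sketch of that flow, using word-overlap scoring in place of the real sentence-transformer embeddings and ChromaDB lookup:

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercased word tokens with punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(question: str, chunk: str) -> float:
    """Toy relevance score: fraction of question words found in the chunk."""
    q = _tokens(question)
    return len(q & _tokens(chunk)) / max(len(q), 1)


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most relevant to the question."""
    return sorted(chunks, key=lambda c: score(question, c), reverse=True)[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble the prompt the local model receives."""
    joined = "\n\n".join(context)
    return f"Answer using only this context:\n\n{joined}\n\nQuestion: {question}"
```

The real system replaces `score`/`retrieve` with embedding similarity search in the vector store; the overall shape is the same.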
Configuration
Adjust settings in config.yaml to modify model paths, chunk sizes, and other parameters.
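A file of roughly this shape is typical. The key names and filenames below are illustrative only, not taken from the project; check the shipped config.yaml for the real schema:

```
# Illustrative only -- consult the repository's config.yaml for the real keys
models:
  mistral: models/mistral-7b-instruct.Q4_K_M.gguf   # hypothetical filename
  llama3: models/meta-llama-3-8b-instruct.Q4_K_M.gguf  # hypothetical filename
chunking:
  chunk_size: 512     # characters per chunk
  chunk_overlap: 64   # overlap between consecutive chunks
```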
Features
Supported Features
- Local inference with llama.cpp
- Model support for Llama 3 and Llama 3.1
- Model support for Mistral
- Model support for Gemini
- Support for quantized models
- Document upload and processing
- Question-Answering system
- Sentence Transformer Model for Vector Storage
- Integration with Streamlit for UI
- Support for streaming responses
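Several of the features above hinge on document processing: uploaded PDFs are split into overlapping chunks before being embedded into the vector store, so that sentences crossing a chunk boundary still appear intact in at least one chunk. A minimal sketch of such a splitter (illustrative only; the project's actual chunk sizes come from config.yaml):

```python
def split_text(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into fixed-size chunks that overlap by `overlap` characters."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap  # how far each chunk's start advances
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks
```

Larger overlaps reduce the chance of splitting an answer across chunks, at the cost of more stored (and embedded) text.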
Future Features
- Integration with additional models
- Support for non-PDF documents
- Support for image documents
- Support for visualizing embeddings
- Support for chat history
- Enhanced user interface
- Support for multi-modal documents
- Exporting and importing RAG configurations
- Exposing a REST API
- Observability
Contributing
Contributions are welcome! Please fork the repository and submit a pull request. For major changes, please open an issue first to discuss what you would like to change.
License
This project is licensed under the GNU General Public License v3.0 (GPLv3). See the LICENSE file for details.
This means:
- You can freely use, modify, and distribute this software.
- If you modify or extend this software, you must release your changes under the GPL.
- You must include the original copyright notice and the full text of the GPL license.
- There's no warranty for this free software.
For more information, visit GNU GPL v3.
Acknowledgments
- Thanks to TheBloke and unsloth for sharing the quantized models.
- This project uses various open-source libraries. See requirements.txt for details.

