Byte-Vision: AI-Powered Document Intelligence Platform
Status: Beta - Under active development
Byte-Vision is a privacy-first document intelligence platform that transforms static documents into an interactive, searchable knowledge base. Built on Elasticsearch with RAG (Retrieval-Augmented Generation) capabilities, it offers document parsing, OCR processing, and conversational AI interfaces, all running locally to ensure complete data privacy.
Key Features
- Universal Document Processing - Parse PDFs, text files, and CSVs with built-in OCR for image-based content
- AI-Enhanced Search - Semantic search powered by Elasticsearch and vector embeddings
- Conversational AI - Document-specific Q&A and free-form chat with local LLM integration
- Research Management - Automatically save and organize insights from document analysis
- Privacy-First - Runs entirely locally with no external data transmission
- Intuitive Interface - Full-featured UI that simplifies complex document operations
Quick Start
Prerequisites
Go 1.23+, Node.js 18+, Elasticsearch 8.x, and the Wails v2 CLI (see the full table in the Installation section below).
Installation
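A condensed sketch of the install-and-run flow; each step is covered in detail in the Installation section below:

```bash
git clone https://github.com/kbrisso/byte-vision.git
cd byte-vision

# Backend and frontend dependencies
go mod download && go mod tidy
cd frontend && npm install && cd ..

# Configure byte-vision-cfg.env (see Configuration), then start in development mode
wails dev
```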
For detailed setup instructions, see Installation Guide.
Table of Contents
- Interface Tour
- Installation
- Configuration
- Usage
- Troubleshooting
- Development
- Contributing
- Roadmap
- License
- Contact
Interface Tour
Document Search Screen
The main "Document Search" screen allows you to locate and analyze documents after they have been parsed and indexed in Elasticsearch.
Document Viewer
Click the "View" button to display the original parsed document.
Question and Answer Interface
Default View - History Tab
View previously saved question-answer history items for the selected document.
Question Entry Form
Enter your questions about the document using this interface.
Processing Stage
The system processes your question and searches through the document.
Results Display
View the AI-generated answers based on your document content.
Export to PDF
Export your question-answer sessions to PDF format for documentation.
Document Processing Features
Document Parsing and Chunking
Parse PDF, text, and CSV files for processing and analysis.
Parser Results
View the results of document parsing and chunking operations.
OCR Processing
Image Scan Setup
Configure OCR settings for processing scanned documents.
OCR Results
Review extracted text from image-based documents.
AI Inference Screen
Main Interface
Primary inference screen for general AI conversations.
Chat History
View previous conversations and responses.
Export Chat History
Export inference conversations to PDF format.
Installation
Prerequisites
| Component | Version | Purpose |
|---|---|---|
| Go | 1.23+ | Backend services |
| Node.js | 18+ | Frontend build system |
| Elasticsearch | 8.x | Document indexing and search |
| Wails | v2 | Desktop application framework |
System Requirements
- OS: Windows 10+, macOS 10.13+, or Linux
- RAM: 8GB minimum (16GB recommended)
- Storage: 5GB free space
- CPU: Multi-core processor recommended
Optional Dependencies
- CUDA: Enables GPU acceleration for AI models
- Docker: Containerize Elasticsearch for easier deployment
Development Setup
1. Clone and Install Dependencies
```bash
git clone https://github.com/kbrisso/byte-vision.git
cd byte-vision

# Install Go dependencies
go mod download && go mod tidy

# Install Wails CLI
go install github.com/wailsapp/wails/v2/cmd/wails@latest

# Install frontend dependencies
cd frontend && npm install && cd ..
```
2. Set Up Elasticsearch
Option A: Docker (Recommended)
```bash
docker run -d --name elasticsearch \
  -p 9200:9200 -p 9300:9300 \
  -e "discovery.type=single-node" \
  -e "xpack.security.enabled=false" \
  docker.elastic.co/elasticsearch/elasticsearch:8.11.0
```
Option B: Local Installation
- Download from Elasticsearch Downloads
- Extract and run:
```bash
# Windows
bin\elasticsearch.bat

# macOS/Linux
bin/elasticsearch
```
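Either way, you can confirm Elasticsearch is reachable before continuing:

```bash
curl http://localhost:9200
```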
3. Install LlamaCpp
Option A: Download Pre-built Binaries (Recommended)
- Visit LlamaCpp releases
- Download for your platform:
  - Windows: `llama-*-bin-win-x64.zip` (CPU) or `llama-*-bin-win-cuda-cu*.zip` (GPU)
  - Linux: `llama-*-bin-ubuntu-x64.tar.gz`
  - macOS: `brew install llama.cpp`
- Extract the binaries to the `llamacpp/` directory
Option B: Build from Source
```bash
git clone https://github.com/ggerganov/llama.cpp.git temp-llama
cd temp-llama && mkdir build && cd build
cmake .. -DLLAMA_CUDA=ON   # -DLLAMA_CUDA=ON enables GPU support; omit for CPU-only builds
cmake --build . --config Release
cp bin/llama-cli ../../llamacpp/   # copy into the project's llamacpp/ directory
cd ../.. && rm -rf temp-llama
```
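Whichever option you chose, you can confirm the binary runs by printing its help text:

```bash
./llamacpp/llama-cli --help
```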
4. Download AI Models
```bash
# Download example models
curl -L -o models/llama-2-7b-chat.Q4_K_M.gguf \
  https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf

curl -L -o models/all-MiniLM-L6-v2.gguf \
  https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2-gguf/resolve/main/all-MiniLM-L6-v2.gguf
```
5. Install xpdf-tools
Download and install xpdf-tools for PDF processing:
Option A: Download Pre-built Binaries (Recommended)
- Visit Xpdf downloads
- Download the appropriate version for your platform:
  - Windows: `xpdf-tools-win-*-setup.exe`
  - Linux: `xpdf-tools-linux-*-static.tar.gz`
  - macOS: `xpdf-tools-mac-*-setup.dmg`
- Extract or install to the `xpdf-tools/` directory in your project root
Option B: Package Manager Installation
```bash
# macOS
brew install xpdf

# Ubuntu/Debian
sudo apt-get install xpdf-utils

# Windows (using Chocolatey)
choco install xpdf-utils
```
6. Install Tesseract-OCR
Install Tesseract-OCR for optical character recognition:
Windows:
- Download from Tesseract releases
- Install the executable
- Add Tesseract to your system PATH:
  - Add `C:\Program Files\Tesseract-OCR` to your PATH environment variable
  - Or set a custom path in `byte-vision-cfg.env`: `TESSERACT_PATH=C:\path\to\tesseract.exe`
macOS:
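Install via Homebrew (standard formula name):

```bash
brew install tesseract
```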
Linux (Ubuntu/Debian):
```bash
sudo apt-get install tesseract-ocr
```
Verify Installation:
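Printing the version confirms Tesseract is on your PATH:

```bash
tesseract --version
```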
7. Configure Environment
Create `byte-vision-cfg.env` in the project root:

```env
# Elasticsearch Configuration
ELASTICSEARCH_URL=http://localhost:9200
ELASTICSEARCH_USERNAME=elastic
ELASTICSEARCH_PASSWORD=your_password

# LlamaCpp Configuration
LLAMA_CLI_PATH=./llamacpp/llama-cli
LLAMA_EMBEDDING_PATH=./llamacpp/llama-embedding

# Model Configuration
MODEL_PATH=./models
DEFAULT_INFERENCE_MODEL=llama-2-7b-chat.Q4_K_M.gguf
DEFAULT_EMBEDDING_MODEL=all-MiniLM-L6-v2.gguf

# Application Settings
MAX_CHUNK_SIZE=1000
CHUNK_OVERLAP=200
LOG_LEVEL=INFO
```
8. Run the Application
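Start the application in development mode with the Wails CLI:

```bash
wails dev
```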
The application will launch with hot reload enabled.
Production Build
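Build a distributable binary with the Wails CLI:

```bash
wails build
```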
The built application will be placed in the `build/` directory.
Configuration
The application uses environment variables defined in `byte-vision-cfg.env`:
| Variable | Description | Default |
|---|---|---|
| `ELASTICSEARCH_URL` | Elasticsearch server URL | `http://localhost:9200` |
| `ELASTICSEARCH_USERNAME` | Elasticsearch username | `elastic` |
| `ELASTICSEARCH_PASSWORD` | Elasticsearch password | - |
| `LLAMA_CLI_PATH` | Path to llama-cli executable | `./llamacpp/llama-cli` |
| `LLAMA_EMBEDDING_PATH` | Path to llama-embedding executable | `./llamacpp/llama-embedding` |
| `MODEL_PATH` | Directory containing AI models | `./models` |
| `DEFAULT_INFERENCE_MODEL` | Default model for inference | - |
| `DEFAULT_EMBEDDING_MODEL` | Default model for embeddings | - |
| `MAX_CHUNK_SIZE` | Maximum text chunk size | `1000` |
| `CHUNK_OVERLAP` | Overlap between chunks | `200` |
| `LOG_LEVEL` | Application log level | `INFO` |
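For example, with the defaults above (`MAX_CHUNK_SIZE=1000`, `CHUNK_OVERLAP=200`), consecutive chunks share 200 characters of text, so each new chunk starts roughly 800 characters after the previous one.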
Usage
First-Time Setup
- Start Elasticsearch: Ensure Elasticsearch is running
- Launch Byte-Vision: Run the application
- Configure Models: Go to Settings → LlamaCpp Settings and set paths
- Test Connection: Verify Elasticsearch connection in Settings
Document Management
- Upload Documents: Use the document parser to upload and process files
- Configure Chunking: Adjust text chunking settings for optimal search
- Index Documents: Process documents for embedding and search
Document-Specific Q&A
- Select a document from the search results
- Click "Ask Questions" to open the Q&A interface
- Enter your questions and receive AI-generated answers
- View answer sources and confidence scores
- Export Q&A sessions to PDF
AI Interactions
- Ask Questions: Use the document question modal to query your documents
- Export Results: Export chat history to PDF for documentation
- Compare Responses: Use the comparison feature to evaluate different model outputs
Free-Form Chat
- Access the AI Inference screen for general conversations
- Chat with your local LLM models
- Export conversation history
- Compare different model responses
Troubleshooting
Common Issues
Elasticsearch Connection Failed
Symptoms: Cannot connect to Elasticsearch service
Solutions:
- Verify Elasticsearch is running: `curl http://localhost:9200`
- Check that port 9200 is available
- Verify the configuration in `byte-vision-cfg.env`
- Check firewall settings
- For Docker: ensure the container is running
LlamaCpp Model Loading Error
Symptoms: Model fails to load or produces errors
Solutions:
- Verify the model file exists in the `models/` directory
- Check the model format (must be `.gguf`)
- Ensure sufficient RAM for the model size
- Verify `LLAMA_CLI_PATH` in the configuration
- Test LlamaCpp directly: `./llamacpp/llama-cli --model ./models/your-model.gguf --prompt "Hello"`
Frontend Build Errors
Symptoms: npm install or build failures
Solutions:
- Clear the npm cache and reinstall:
  ```bash
  cd frontend
  rm -rf node_modules package-lock.json
  npm cache clean --force
  npm install
  ```
- Check your Node.js version: `node --version`
- Update npm: `npm install -g npm@latest`
Port Already in Use
Symptoms: Application fails to start due to port conflicts
Solutions:
- Find the process using the port:
  ```bash
  # Windows
  netstat -ano | findstr :3000

  # macOS/Linux
  lsof -ti:3000
  ```
- Kill the process:
  ```bash
  # Windows
  taskkill /PID <PID> /F

  # macOS/Linux
  kill -9 <PID>
  ```
Performance Tips
- GPU Acceleration: Install CUDA/ROCm for faster model inference
- Model Selection: Use smaller quantized models for better performance
- Memory Management: Adjust Elasticsearch heap size for large document collections
- Chunking Optimization: Tune `MAX_CHUNK_SIZE` and `CHUNK_OVERLAP` for your use case
Debug Mode
Enable debug logging:
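One way, assuming the `LOG_LEVEL` setting from the Configuration section controls application verbosity, is to raise it in `byte-vision-cfg.env`; during development you can also run Wails with its debug flag:

```bash
# In byte-vision-cfg.env
LOG_LEVEL=DEBUG

# Or run the dev server with verbose Wails output
wails dev -debug
```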
Check the logs in the `./logs/` directory for detailed error information.
Development
Built With
Core Technologies
- Wails - Desktop application framework
- Go - Backend services and APIs
- React - Frontend user interface
- Elasticsearch - Document indexing and search
- Llama.cpp - Local AI model inference
Frontend Stack
- React Bootstrap - UI components
- Bootstrap 5 - CSS framework
- React PDF - PDF generation and viewing
- Vite - Build tooling
Backend Libraries
Project Structure
```
byte-vision/
├── build/                 # Built application files
├── document/              # Document storage
├── frontend/              # React frontend source
│   ├── src/
│   └── public/
├── llamacpp/              # LlamaCpp binaries
├── logs/                  # Application logs
├── models/                # AI model files (.gguf)
├── prompt-cache/          # Cached prompts
├── prompt-temp/           # Prompt templates
├── xpdf-tools/            # PDF processing tools
├── byte-vision-cfg.env    # Configuration file
├── wails.json             # Wails configuration
└── go.mod                 # Go dependencies
```
Logs and Debugging
- Application logs: `./logs/`
- Elasticsearch logs: check the Elasticsearch installation directory
- Debug mode: `wails dev -debug`
- Frontend logs: browser developer console
- Backend logs: terminal output during development
Contributing
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also open an issue with the tag "enhancement." Remember to give the project a star! Thanks again!
How to Contribute
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Development Guidelines
- Follow Go formatting standards (`go fmt`)
- Write tests for new features
- Update documentation for API changes
- Use semantic commit messages
- Ensure all tests pass before submitting
Roadmap
In Progress
- Settings persistence for llama-cli configuration
- Settings persistence for llama-embedding configuration
- Enhanced documentation and examples
Planned Features
- Additional document format support (DOCX, PPT, etc.)
- Advanced search filters and operators
- Batch document processing capabilities
- RESTful API for external integrations
- Docker deployment configuration
- User authentication and access control
- Cloud storage integration (S3, Google Drive, etc.)
- Multi-language support
- Advanced analytics and reporting
Long-term Vision
- Distributed processing for large document collections
- Plugin architecture for custom processors
- Integration with external AI services
- Mobile application companion
See open issues for detailed feature requests and bug reports.
License
This project is licensed under the terms of the MIT license.
Contact
Kevin Brisson - LinkedIn - kbrisso@gmail.com
Project Link: https://github.com/kbrisso/byte-vision
Star this project if you find it helpful!