VoiceVault - transform voice conversations into actionable insights using ASR and LLMs

Voice intelligence platform — transcribe, analyse, and chat with your voice conversations.

License: MIT

Started as a hackathon project, now developed as an open platform.

Demo

screencast.mp4

How It Works

Upload an audio or video file (or submit a URL), and VoiceVault queues it for transcription via your configured ASR provider. Once the transcript is ready, an LLM provider analyses the content and makes it available for interactive chat and summarisation — all from a clean dashboard.

```mermaid
graph LR
    A[Audio / Video] --> B[ASR Provider]
    B --> C[LLM Provider]
    C --> D[Dashboard]
    D --> E[Chat & Analysis]
```

Full architecture and diagrams
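As a concrete sketch of that flow from the command line: upload, then poll until the background pipeline marks the entry ready. The endpoint paths and payload shape below are assumptions for illustration, not the documented API; check the interactive docs at http://localhost:8000/docs for the real routes.

```bash
# Upload an audio file for transcription.
# NOTE: /api/entries is a hypothetical path -- consult /docs for actual routes.
curl -X POST http://localhost:8000/api/entries \
  -F "file=@meeting.mp3"

# Poll the entry until background transcription finishes,
# then the transcript is available for chat and summarisation.
curl http://localhost:8000/api/entries/1
```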

Features

  • Multi-format input: audio files, video files, and URLs (YouTube, Vimeo, SoundCloud, direct links)
  • Pluggable ASR provider: Groq (Whisper), self-hosted Whisper
  • Pluggable LLM provider: Groq, Cerebras, Ollama, Nebius
  • Interactive chat — ask questions about any transcript in natural language
  • AI summarisation of conversations and meetings
  • Prompt template management for consistent LLM behaviour
  • Background processing pipeline with real-time status tracking
  • S3-compatible storage — works with any provider or local MinIO
  • Optional Bearer token authentication (see the sketch after this list)
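When ACCESS_TOKEN is set (see Configuration below), requests carry the token in a standard Authorization header. A minimal sketch, assuming a /api/entries listing route (the path is hypothetical):

```bash
# With ACCESS_TOKEN set in .env, authenticate API requests like so.
# The /api/entries path is a placeholder -- see the API docs for real routes.
curl http://localhost:8000/api/entries \
  -H "Authorization: Bearer $ACCESS_TOKEN"
```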

Quick Start

Prerequisites: Docker, Docker Compose, Git

```bash
git clone https://github.com/nezhar/voicevault.git
cd voicevault
cp .env.example .env
# Edit .env — at minimum set GROQ_API_KEY (or configure an alternative provider)
docker compose up --build
```
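For reference, a minimal .env for the default Groq-backed setup might look like the following. The key value is a placeholder, and the provider lines are optional since both already default to groq; the full variable list is in .env.example.

```bash
# Minimal .env sketch for the default Groq setup (placeholder key).
GROQ_API_KEY=gsk_your_key_here
# These already default to groq; shown here for clarity.
ASR_PROVIDER=groq
LLM_PROVIDER=groq
```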
| Service | URL |
| --- | --- |
| Frontend | http://localhost:3000 |
| API | http://localhost:8000 |
| API docs | http://localhost:8000/docs |
| MinIO console | http://localhost:9001 |

Production deployment

Configuration

Both ASR and LLM providers are pluggable — choose what fits your setup. Provider-specific API keys are listed in docs/configuration.md.

| Variable | Default | Description |
| --- | --- | --- |
| ASR_PROVIDER | groq | ASR backend: groq or whisper_asr |
| ASR_MODEL | whisper-large-v3-turbo | Model (Groq only) |
| LLM_PROVIDER | groq | LLM backend: groq, cerebras, ollama, or nebius |
| LLM_MODEL | llama-3.3-70b-versatile | Model name |
| ACCESS_TOKEN | (empty) | Bearer token for API auth — leave empty to disable |
| S3_ENDPOINT_URL | | S3-compatible endpoint (MinIO, AWS, DigitalOcean, …) |
| S3_ACCESS_KEY | | S3 access key |
| S3_SECRET_KEY | | S3 secret key |
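As an example of the pluggable design, switching the LLM side to a local Ollama instance is just a change of environment variables. The model name below is a placeholder; any Ollama-specific settings (such as its base URL) are covered in docs/configuration.md.

```bash
# Swap the LLM backend to a local Ollama instance (sketch).
LLM_PROVIDER=ollama
# Placeholder model name; use whatever model your Ollama instance serves.
LLM_MODEL=llama3.3
```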

Full configuration reference · .env.example

API

| Group | Endpoints |
| --- | --- |
| Entries | upload file, submit URL, create from transcript, list, get, update status, archive, delete |
| Chat & Analysis | chat with transcript, generate summary |
| Prompt Templates | list, create, update, delete |
| System | health check, API info |
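To give a feel for the Chat & Analysis group, a request might look roughly like this. The path and payload shape are guesses for illustration; the interactive docs are authoritative.

```bash
# Ask a question about entry 1's transcript (hypothetical route and payload).
curl -X POST http://localhost:8000/api/entries/1/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What action items came out of this meeting?"}'
```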

Full API reference · Interactive docs

Tech Stack

| Layer | Technology |
| --- | --- |
| Frontend | React 18 + TypeScript + Vite + Tailwind CSS |
| Backend | FastAPI + SQLAlchemy + PostgreSQL |
| Workers | Python + yt-dlp + FFmpeg |
| ASR Providers | Groq (Whisper), self-hosted Whisper |
| LLM Providers | Groq, Cerebras, Ollama, Nebius |
| Storage | PostgreSQL 17, S3-compatible object storage |
| Infrastructure | Docker Compose |

Contributing

Follow existing code patterns and use conventional commits. Open issues and pull requests via GitHub.
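For example, a conventional commit follows the type(scope): description pattern; the scope and message below are illustrative:

```bash
git commit -m "feat(llm): add support for a new provider"
```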

License

MIT — see LICENSE.