Quote Visualization System
A local web-based quote visualization system with graph relationships, semantic similarity clustering, and search capabilities.
Features
- Quote Management: Create, read, update, and delete quotes with automatic embedding generation
- Graph Visualization: View relationships between quotes and people
- Semantic Similarity: Find similar quotes using NLP embeddings
- Clustering: Automatic grouping of related quotes using HDBSCAN
- Full-text Search: Search quotes by text, author, or context
- Analytics Dashboard: View statistics and cluster distributions
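As a rough illustration of the semantic similarity feature, the sketch below ranks quotes by cosine similarity between embedding vectors. Cosine similarity is an assumption here (the README does not state which metric the backend uses), and `cosine_similarity` and `most_similar` are hypothetical helper names, not functions from this project.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


def most_similar(
    query: Sequence[float],
    candidates: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (quote_id, score) pairs, highest score first."""
    scored = [(qid, cosine_similarity(query, vec)) for qid, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

In the real system the vectors would come from the all-MiniLM-L6-v2 model; the toy vectors used to try this out can be any lists of floats of equal length.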
Tech Stack
- Database: Neo4j Community Edition (graph database)
- Backend: FastAPI + Python 3.11
- NLP: sentence-transformers (all-MiniLM-L6-v2), HDBSCAN, scikit-learn
- Frontend: React + TypeScript + Vite
- Deployment: Docker Compose
Prerequisites
- Docker and Docker Compose
- Git
Quick Start
- Clone and navigate to the project
  cd /Users/pika/Projects/quote-viz
- Create environment file
  Edit .env and set a secure password:
  NEO4J_PASSWORD=your-secure-password-here
- Start all services
  This will start:
  - Neo4j at http://localhost:7474 (browser) and bolt://localhost:7687
  - Backend API at http://localhost:8000
  - Frontend at http://localhost:3000
- Initialize the database
  docker compose exec backend python app/scripts/init_db.py
- Load sample quotes
  docker compose exec backend python app/scripts/ingest_quotes.py
- Compute clusters
  docker compose exec backend python app/scripts/compute_clusters.py
- Access the application
  - Frontend: http://localhost:3000
  - API Documentation: http://localhost:8000/docs
  - Neo4j Browser: http://localhost:7474
Data Ingestion
Format Your Quotes
Create a JSON file in data/processed/quotes.json with this format:
[
  {
    "quote": "The quote text here",
    "author": "Author Name",
    "context": "Optional context or situation",
    "source": "Optional source reference"
  }
]
Extract from Unstructured Notes
- Place your unstructured notes in data/raw/
- Use Claude with filesystem MCP to extract quotes:
  Read all files in data/raw/ and extract quotes in the format: {quote, author, context, source}
- Review and save to data/processed/quotes.json
- Run the ingestion script
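Before running ingestion, it can help to check the extracted file against the format described under Format Your Quotes. The following is a minimal sketch, assuming that `quote` and `author` are required (the README marks only `context` and `source` as optional); `validate_quotes` and `validate_quotes_text` are hypothetical helpers, not part of the project's scripts.

```python
import json
from typing import Any, List

REQUIRED_FIELDS = ("quote", "author")    # assumed required
OPTIONAL_FIELDS = ("context", "source")  # marked optional in the README


def validate_quotes(records: Any) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    if not isinstance(records, list):
        return ["top-level value must be a JSON array"]
    errors: List[str] = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            errors.append(f"item {i}: expected an object")
            continue
        for field in REQUIRED_FIELDS:
            value = rec.get(field)
            if not isinstance(value, str) or not value.strip():
                errors.append(f"item {i}: missing or empty '{field}'")
        for field in OPTIONAL_FIELDS:
            if field in rec and not isinstance(rec[field], str):
                errors.append(f"item {i}: '{field}' must be a string")
    return errors


def validate_quotes_text(text: str) -> List[str]:
    """Parse JSON text and validate it, reporting parse errors as problems."""
    try:
        records = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    return validate_quotes(records)
```

A typical use would be to read data/processed/quotes.json as text, pass it to `validate_quotes_text`, and fix any reported items before ingesting.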
Ingest Quotes
docker compose exec backend python app/scripts/ingest_quotes.py
This will:
- Generate embeddings for each quote
- Create person nodes if they don't exist
- Store quotes in Neo4j
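The steps above could be expressed as a single Cypher statement per quote. This is a sketch under stated assumptions, not the contents of ingest_quotes.py: the `Quote` and `Person` labels, the `ATTRIBUTED_TO` relationship, and the `text` property appear in the queries under Database Management, while the `name`, `context`, `source`, and `embedding` property names and the `build_ingest_params` helper are hypothetical.

```python
from typing import Any, Dict, Sequence

# MERGE creates the Person node only if no node with that name exists yet;
# CREATE always adds a new Quote node and links it to the person.
INGEST_QUERY = """
MERGE (p:Person {name: $author})
CREATE (q:Quote {text: $text, context: $context,
                 source: $source, embedding: $embedding})
CREATE (q)-[:ATTRIBUTED_TO]->(p)
"""


def build_ingest_params(record: Dict[str, Any], embedding: Sequence[float]) -> Dict[str, Any]:
    """Map one record from quotes.json onto the query parameters."""
    return {
        "author": record["author"],
        "text": record["quote"],
        "context": record.get("context"),  # None when the field is absent
        "source": record.get("source"),
        "embedding": [float(x) for x in embedding],
    }
```

With the official Neo4j Python driver, the statement would typically be executed as `session.run(INGEST_QUERY, params)`; the embedding itself is assumed to be computed beforehand by the NLP service.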
MCP Integration
Neo4j MCP Server
Add to your Claude Code settings (~/.claude/settings.json):
{
  "mcpServers": {
    "neo4j": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-neo4j"],
      "env": {
        "NEO4J_URI": "bolt://localhost:7687",
        "NEO4J_USER": "neo4j",
        "NEO4J_PASSWORD": "your-password-here"
      }
    }
  }
}
Then you can ask Claude:
- "How many quotes are in the database?"
- "Show me all quotes by Einstein"
- "What are the largest clusters?"
API Endpoints
Quotes
- POST /api/quotes - Create a quote
- GET /api/quotes - List quotes (with filters)
- GET /api/quotes/{id} - Get quote with similar quotes
- PUT /api/quotes/{id} - Update a quote
- DELETE /api/quotes/{id} - Delete a quote
- GET /api/search?q=query - Full-text search
- GET /api/similar/{id} - Get similar quotes
Graph
- GET /api/graph - Get full graph data
Analytics
- GET /api/analytics/stats - Get statistics
- POST /api/analytics/recompute-clusters - Recompute clusters
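To show how the endpoints above might be called from a script, here is a small sketch using only the Python standard library. The paths and the default base URL come from this README; the request body for creating a quote is an assumption (it mirrors the ingestion JSON fields), and `search_url` and `create_quote_request` are hypothetical helper names. The authoritative schema is whatever http://localhost:8000/docs reports.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # backend address from Quick Start


def search_url(query: str, base_url: str = BASE_URL) -> str:
    """Build the full-text search URL with a properly encoded query."""
    return f"{base_url}/api/search?" + urllib.parse.urlencode({"q": query})


def create_quote_request(
    quote: str, author: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Prepare (but do not send) a POST /api/quotes request.

    The body fields are assumed, not confirmed by the README.
    """
    body = json.dumps({"quote": quote, "author": author}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/quotes",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the backend running, a prepared request can be sent with `urllib.request.urlopen(request)` and the response body decoded with `json.load`.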
Development
Backend Development
# Install dependencies locally (optional, for IDE)
cd backend
pip install -r requirements.txt

# Run tests (when implemented)
docker compose exec backend pytest

# View logs
docker compose logs -f backend
Frontend Development
For faster frontend development with hot reload:
cd frontend
npm install
npm run dev
Update frontend/src/api/client.ts to use http://localhost:8000 for the API.
Database Management
View data in Neo4j Browser (http://localhost:7474):
// View all quotes and people
MATCH (q:Quote)-[:ATTRIBUTED_TO]->(p:Person)
RETURN q, p LIMIT 25

// View cluster distribution
MATCH (q:Quote)
WHERE q.cluster_id IS NOT NULL
RETURN q.cluster_id, count(q) as quote_count
ORDER BY quote_count DESC

// Find similar quotes
MATCH (q1:Quote)-[s:SIMILAR_TO]->(q2:Quote)
RETURN q1.text, q2.text, s.similarity
ORDER BY s.similarity DESC
LIMIT 10
Project Structure
quote-viz/
├── backend/
│ ├── app/
│ │ ├── main.py # FastAPI application
│ │ ├── config.py # Configuration
│ │ ├── models/ # Pydantic models
│ │ ├── services/ # Business logic
│ │ │ ├── neo4j_service.py
│ │ │ ├── nlp_service.py
│ │ │ └── similarity.py
│ │ ├── api/ # API endpoints
│ │ └── scripts/ # CLI scripts
│ ├── Dockerfile
│ └── requirements.txt
├── frontend/
│ ├── src/
│ │ ├── api/ # API client
│ │ ├── components/ # React components
│ │ ├── pages/ # Page components
│ │ ├── App.tsx
│ │ └── main.tsx
│ ├── Dockerfile
│ └── package.json
├── data/
│ ├── raw/ # Unstructured notes
│ └── processed/ # Extracted quotes JSON
├── docker-compose.yml
└── README.md
Troubleshooting
Neo4j won't start
- Ensure ports 7474 and 7687 are not in use
- Check Docker logs:
  docker compose logs neo4j
- Verify NEO4J_PASSWORD is set in .env
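The NEO4J_PASSWORD check can be automated with a few lines of Python. This is a simplified sketch: `parse_env` and `password_problem` are hypothetical helpers, and the parser handles only plain KEY=value lines (no `export` prefixes, inline comments, or multi-line values), which may differ from how Docker Compose reads the file.

```python
from typing import Dict, Optional

PLACEHOLDER = "your-secure-password-here"  # example value from Quick Start


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def password_problem(env_text: str) -> Optional[str]:
    """Return a description of the problem, or None if the password looks set."""
    password = parse_env(env_text).get("NEO4J_PASSWORD", "")
    if not password:
        return "NEO4J_PASSWORD is missing or empty"
    if password == PLACEHOLDER:
        return "NEO4J_PASSWORD is still the placeholder value"
    return None
```

It would be called with the contents of the .env file, for example `password_problem(open(".env").read())` from the project root.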
Backend can't connect to Neo4j
- Wait for Neo4j health check to pass (30s)
- Check connection:
docker compose exec backend python -c "from app.config import get_settings; print(get_settings().neo4j_uri)"
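Rather than waiting a fixed 30 seconds, a script can poll until Neo4j accepts connections. The sketch below is one possible approach, not the project's health check: `wait_until_ready` and `port_open` are hypothetical helpers, and an open TCP port on 7687 only suggests that Bolt is listening, not that the database is fully ready.

```python
import socket
import time
from typing import Callable


def port_open(host: str, port: int) -> bool:
    """True if a TCP connection to host:port succeeds within one second."""
    try:
        with socket.create_connection((host, port), timeout=1.0):
            return True
    except OSError:
        return False


def wait_until_ready(
    check: Callable[[], bool],
    timeout: float = 30.0,
    interval: float = 1.0,
) -> bool:
    """Call check() repeatedly until it returns True or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if check():
                return True
        except Exception:
            pass  # treat any error as "not ready yet"
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_until_ready(lambda: port_open("localhost", 7687))` returns True once the Bolt port from Quick Start accepts connections, or False after 30 seconds.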
Frontend can't reach API
- Verify VITE_API_URL in .env
- Check CORS settings in backend/app/config.py
- Ensure backend is running:
curl http://localhost:8000/health
Model download fails
- The first run downloads an ~80 MB model
- Check internet connection
- Model cache is persisted in Docker volume
Performance
With 500-1000 quotes:
- Embedding generation: <2s for 100 quotes
- Similarity search: <100ms
- Clustering: <5s
- Graph query: <50ms
Future Enhancements (V2)
- Voice note transcription (Whisper)
- Email/WhatsApp integration
- Browser extension for web quotes
- Topic extraction (BERTopic)
- Timeline view
- Person relationship inference
- Export to Markdown/PDF
- 3D graph visualization with Reagraph
License
MIT