StudyWithMiku 🤖🎤
StudyWithMiku is an AI-powered animated study assistant that reads and embeds PDFs, understands their content, and explains concepts in Miku's or Teto's voice with interactive animations. The system acts as a virtual tutor that learns from your documents and answers questions in a friendly, visual, and engaging way.
✨ Features
🤖 AI-Powered Assistant
- Multi-Provider LLM Support: Works with Ollama (local), Google Gemini, and OpenAI models
- LangGraph Agent Architecture: Intelligent tool-calling and conversation flow management
- Context-Aware Conversations: Maintains conversation history and understands references
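The multi-provider support above can be sketched as a small factory. The provider names mirror the MODEL_TYPE values used later in this README, but the function name and structure here are hypothetical, not the actual models/LLM.py API:

```python
# Hypothetical sketch of a multi-provider LLM factory; the real
# models/LLM.py wrapper may differ. Provider names match MODEL_TYPE.

def make_llm(model_type: str, model_name: str):
    """Return a provider-specific chat model for the given MODEL_TYPE."""
    if model_type == "ollama":
        from langchain_ollama import ChatOllama  # local models
        return ChatOllama(model=model_name)
    if model_type == "google":
        from langchain_google_genai import ChatGoogleGenerativeAI
        return ChatGoogleGenerativeAI(model=model_name)
    if model_type == "openai":
        from langchain_openai import ChatOpenAI
        return ChatOpenAI(model=model_name)
    raise ValueError(f"Unknown MODEL_TYPE: {model_type}")
```

The imports are deliberately lazy so that only the selected provider's package needs to be installed.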
📄 Document Processing
- PDF Embedding: Automatically processes PDFs dropped in the content/ folder
- Vector Database: Uses ChromaDB for semantic search and retrieval
- RAG Pipeline: Retrieves relevant context from embedded documents to answer questions
- Background Processing: PDF embedding runs in separate terminal windows without blocking
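Before embedding, each PDF's extracted text has to be split into chunks. A minimal sliding-window chunker illustrates the idea; the actual logic lives in preprocessing/pdf.py and may differ (e.g. in chunk size, overlap, or token-based splitting):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list:
    """Split text into overlapping character windows for embedding.

    Overlap preserves context across chunk boundaries so that a sentence
    cut in half by one chunk is still fully present in its neighbor.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```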
🎵 Text-to-Speech (TTS)
- Multiple TTS Engines:
  - Coqui TTS with multi-speaker support
  - DiffSinger vocoder integration for anime-style voices
- Voice Options: Miku and Teto character voices
- Real-time Audio Playback: Speaks responses using sounddevice
🛠️ System Tools
- Browser Control: Open URLs in default browser
- Network Management: Check internet connectivity, enable Wi-Fi, web search via DuckDuckGo
- Process Management: Find and terminate background processes
- System Commands: Execute shell commands (date, ls, pwd, etc.)
- File Watching: Monitors the content/ folder for new files
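The project uses watchdog for file watching; as a rough, dependency-free illustration of the same idea, here is a polling sketch (the function and callback names are invented for this example and are not the project's API):

```python
import os
import time

def watch_folder(folder, on_new_file, poll_interval=1.0, max_polls=None):
    """Poll `folder` and call `on_new_file(path)` for each file that appears.

    A stand-in for watchdog's observer; `max_polls` bounds the loop so the
    sketch can terminate, whereas the real watcher runs for the session.
    """
    seen = set(os.listdir(folder))
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(poll_interval)
        current = set(os.listdir(folder))
        for name in sorted(current - seen):
            on_new_file(os.path.join(folder, name))
        seen = current
        polls += 1
```

In the real project, the callback would kick off the background PDF-embedding worker instead of a plain function call.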
🧠 Intelligent Behavior
- Automatic Context Retention: Remembers previous conversation context
- Smart Error Recovery: Handles network failures, missing files, and process errors
- Path Expansion: Automatically expands ~ to the user home directory
- Web Search Integration: Search the web without manual internet checks
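Path expansion is standard-library behavior (os.path.expanduser); for example:

```python
import os

# "~" expands to the current user's home directory
path = os.path.expanduser("~/content/notes.pdf")
print(path)
```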
🏗️ Architecture
```
StudyWithMiku/
├── main.py                # Main entry point with event loop
├── core/
│   ├── agent.py           # LangGraph agent with tool binding
│   ├── state.py           # Agent state definition
│   └── tools.py           # Tool registry
├── models/
│   ├── LLM.py             # Multi-provider LLM wrapper
│   ├── embedding.py       # Embedding model configuration
│   ├── tts.py             # Text-to-speech engine
│   └── voice.py           # Voice configuration
├── config/
│   └── database.py        # ChromaDB vector store manager
├── tools/
│   ├── browser/           # Browser control tools
│   ├── embedded/          # PDF embedding tools
│   ├── network/           # Network and search tools
│   ├── processes_tools/   # Process management tools
│   └── system/            # System command tools
├── preprocessing/
│   └── pdf.py             # PDF text extraction and chunking
├── DiffSinger/            # DiffSinger vocoder (cloned during install)
├── content/               # Drop PDFs here for auto-embedding
├── data/                  # ChromaDB storage
├── voices/                # Voice model files
├── prompt.yaml            # System prompt configuration
├── requirements.txt       # Python dependencies
└── install.sh             # Smart installer script
```
📦 Installation
Prerequisites
- Operating System: Ubuntu/Linux (tested on Ubuntu 20.04+)
- Python: 3.8 or higher
- GPU: CUDA-compatible GPU recommended (for TTS and faster inference)
- Disk Space: ~2GB for dependencies and models
- Internet: Required for downloading dependencies and models
Automated Installation (Recommended)
The installer script handles everything automatically: just run it and follow the prompts!
- Clone the repository:
```
git clone <your-repo-url>
cd StudyWithMiku
```
- Run the automated installer:
```
chmod +x install.sh
./install.sh
```
The installer will automatically:
- ✅ Check and install system dependencies (python3, git, curl, unzip, etc.)
- ✅ Create and activate a Python virtual environment
- ✅ Upgrade pip, setuptools, and wheel
- ✅ Install PyTorch with CUDA 11.8 support
- ✅ Install all Python dependencies from requirements.txt
- ✅ Clone the DiffSinger repository
- ✅ Install DiffSinger dependencies and fix conflicts
- ✅ Configure PYTHONPATH in ~/.bashrc
- ✅ Download the NSF-HiFiGAN vocoder model (~93MB)
- ✅ Create a .env file from .env.example
- ✅ Create the content/ and data/ directories
- ✅ Verify all installations with dependency tests
- ✅ Optionally launch the application immediately
Installation Progress:
```
╔══════════════════════════════════════════════════╗
║     🤖 StudyWithMiku - Automated Installer 🤖    ║
╚══════════════════════════════════════════════════╝
[1/10] Checking system dependencies...
[2/10] Setting up virtual environment...
[3/10] Installing PyTorch...
[4/10] Installing project dependencies...
[5/10] Setting up DiffSinger...
[6/10] Installing DiffSinger dependencies...
[7/10] Configuring PYTHONPATH...
[8/10] Downloading vocoder model...
[9/10] Configuring environment variables...
[10/10] Verifying installation...
╔══════════════════════════════════════════════════╗
║           🎉 Installation Complete! 🎉           ║
╚══════════════════════════════════════════════════╝
```
- Configure your environment:
```
nano .env  # Edit with your LLM provider settings
```
- Start the application:
```
source venv/bin/activate
python main.py
```
Manual Installation
If you prefer manual control or the automated installer fails:
- Install system dependencies:
```
sudo apt update
sudo apt install -y python3 python3-venv python3-pip git curl unzip
```
- Create virtual environment:
```
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip setuptools wheel
```
- Install PyTorch:
```
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```
- Install dependencies:
```
pip install -r requirements.txt
```
- Setup DiffSinger:
```
git clone https://github.com/openvpi/DiffSinger.git
cd DiffSinger
pip install -r requirements.txt
pip install librosa==0.10.0 protobuf==3.19.5 --force-reinstall
cd ..
export PYTHONPATH=$PYTHONPATH:$(pwd)/DiffSinger
echo "export PYTHONPATH=\$PYTHONPATH:$(pwd)/DiffSinger" >> ~/.bashrc
```
- Download vocoder model:
```
mkdir -p models/vocoder
cd models/vocoder
curl -L -o nsf_hifigan_20221211.zip https://github.com/openvpi/vocoders/releases/download/nsf-hifigan-v1/nsf_hifigan_20221211.zip
unzip nsf_hifigan_20221211.zip
cd ../..
```
- Configure environment:
```
cp .env.example .env
nano .env
```
Environment Variables
```
# LLM Configuration
MODEL_NAME="llama3.2:3b"        # Model name
MODEL_TYPE="ollama"             # ollama | google | openai

# Embedding Configuration
EMBEDDING_MODEL_NAME="nomic-embed-text"
EMBEDDING_MODEL_TYPE="ollama"   # ollama | google

# TTS Configuration
MODEL_TTS_NAME="tts_models/en/vctk/vits"
MODEL_TTS_TYPE="tts"            # tts | vocoder

# API URLs
OLLAMA_BASE_URL="http://localhost:11434"

# API Keys (if using cloud providers)
GOOGLE_API_KEY=""
OPENAI_API_KEY=""

# Model Settings
MAX_OUTPUT_TOKEN=512
EMBEDDEDING_TRESHOLD=0.0

# Database
DB_LOCATION="./data"
CHROMA_COLLECTION_NAME="study_docs"
```
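These variables are typically loaded at startup and read through os.environ; a minimal sketch with fallback defaults (the project may load its .env differently, e.g. via python-dotenv's load_dotenv(), which would populate os.environ first):

```python
import os

# Read configuration with fallbacks matching the defaults shown above.
# Variable names mirror the .env keys; the loading mechanism is assumed.
MODEL_NAME = os.environ.get("MODEL_NAME", "llama3.2:3b")
MODEL_TYPE = os.environ.get("MODEL_TYPE", "ollama")
MAX_OUTPUT_TOKEN = int(os.environ.get("MAX_OUTPUT_TOKEN", "512"))
OLLAMA_BASE_URL = os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")
```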
🚀 Usage
Starting the Assistant
- Activate the virtual environment:
```
source venv/bin/activate
```
- Run the assistant:
```
python main.py
```
- Interact with Miku:
```
🧑‍💻 You: Hello Miku!
🤖 AI: Hi there! ^_^ Miku is here to help you study! ♪
```
Adding Study Materials
Simply drop PDF files into the content/ folder while the assistant is running:
```
cp my-textbook.pdf content/
```
The assistant will:
- Detect the new file automatically
- Launch a background process to extract and embed the content
- Notify you when embedding is complete
- Use the content to answer your questions
Example Interactions
Asking about embedded content:
```
🧑‍💻 You: What is quantum mechanics?
🤖 AI: [Retrieves relevant sections from your physics textbook]
```
Web search:
```
🧑‍💻 You: Search for latest AI research papers
🤖 AI: [Performs DuckDuckGo search and presents results]
```
Opening URLs:
```
🧑‍💻 You: Open https://github.com
🤖 AI: [Checks internet, opens browser]
```
System commands:
```
🧑‍💻 You: What's the current date?
🤖 AI: [Runs 'date' command and shows result]
```
🧪 Testing
Test PDF Embedding
```
python pdf_worker_runner.py path/to/test.pdf
```
Test LLM Connection
```
python -c "from models.LLM import LLM; llm = LLM().initialize(); print(llm.invoke('Hello'))"
```
Test Embedding Model
```
python -c "from models.embedding import EmbeddingConfig; emb = EmbeddingConfig(); print(len(emb.get_embedding_model().embed_query('test')))"
```
🎨 Customization
Adding Custom Tools
- Create a new tool in tools/<category>/your_tool.py:
```python
from langchain_core.tools import tool

@tool
def your_custom_tool(param: str) -> str:
    """Tool description for the LLM."""
    # Your implementation
    return "result"
```
- Register it in core/tools.py:
```python
from tools.category.your_tool import your_custom_tool

__all__ = [..., your_custom_tool]
```
- Update prompt.yaml to document the new tool
Changing Miku's Personality
Edit prompt.yaml to customize:
- Personality traits
- Response style
- Tool usage instructions
- Safety rules
🔧 Troubleshooting
DiffSinger Import Errors
```
export PYTHONPATH=$PYTHONPATH:$(pwd)/DiffSinger
source ~/.bashrc
```
ChromaDB Persistence Issues
Delete and recreate the database:
```
rm -rf data/
python main.py  # Will recreate automatically
```
CUDA/GPU Issues
Install CPU-only PyTorch:
```
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
```
Ollama Connection Failed
Start the Ollama server:
```
ollama serve
```
📚 Dependencies
Core dependencies:
- langchain - LLM framework
- langgraph - Agent orchestration
- chromadb - Vector database
- coqui-tts - Text-to-speech
- ollama - Local LLM runtime
- watchdog - File system monitoring
- ddgs - DuckDuckGo search
See requirements.txt for complete list.
🤝 Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
📄 License
[Your License Here]
🙏 Acknowledgments
- DiffSinger: OpenVPI's neural vocoder for singing voice synthesis
- Coqui TTS: Open-source text-to-speech engine
- LangChain: Framework for LLM applications
- ChromaDB: Embedding database
📧 Contact
Email: zekogml11@gmail.com
Made with ❤️ by zkzk 🤖✨