
StudyWithMiku 🎤📚

StudyWithMiku is an AI-powered animated study assistant that reads and embeds PDFs, understands their content, and explains concepts to users using Miku or Teto voices with interactive animations. The system acts as a virtual tutor that learns from your documents and answers questions in a friendly, visual, and engaging way.

✨ Features

🤖 AI-Powered Assistant

  • Multi-Provider LLM Support: Works with Ollama (local), Google Gemini, and OpenAI models
  • LangGraph Agent Architecture: Intelligent tool-calling and conversation flow management
  • Context-Aware Conversations: Maintains conversation history and understands references

📄 Document Processing

  • PDF Embedding: Automatically processes PDFs dropped in the content/ folder
  • Vector Database: Uses ChromaDB for semantic search and retrieval
  • RAG Pipeline: Retrieves relevant context from embedded documents to answer questions
  • Background Processing: PDF embedding runs in separate terminal windows without blocking

🎵 Text-to-Speech (TTS)

  • Multiple TTS Engines:
    • Coqui TTS with multi-speaker support
    • DiffSinger vocoder integration for anime-style voices
  • Voice Options: Miku and Teto character voices
  • Real-time Audio Playback: Speaks responses using sounddevice

๐Ÿ› ๏ธ System Tools

  • Browser Control: Open URLs in default browser
  • Network Management: Check internet connectivity, enable Wi-Fi, web search via DuckDuckGo
  • Process Management: Find and terminate background processes
  • System Commands: Execute shell commands (date, ls, pwd, etc.)
  • File Watching: Monitors content/ folder for new files

๐Ÿ” Intelligent Behavior

  • Automatic Context Retention: Remembers previous conversation context
  • Smart Error Recovery: Handles network failures, missing files, and process errors
  • Path Expansion: Automatically expands ~ to user home directory
  • Web Search Integration: Search the web without manual internet checks

๐Ÿ—๏ธ Architecture

StudyWithMiku/
├── main.py                   # Main entry point with event loop
├── core/
│   ├── agent.py              # LangGraph agent with tool binding
│   ├── state.py              # Agent state definition
│   └── tools.py              # Tool registry
├── models/
│   ├── LLM.py                # Multi-provider LLM wrapper
│   ├── embedding.py          # Embedding model configuration
│   ├── tts.py                # Text-to-speech engine
│   └── voice.py              # Voice configuration
├── config/
│   └── database.py           # ChromaDB vector store manager
├── tools/
│   ├── browser/              # Browser control tools
│   ├── embedded/             # PDF embedding tools
│   ├── network/              # Network and search tools
│   ├── processes_tools/      # Process management tools
│   └── system/               # System command tools
├── preprocessing/
│   └── pdf.py                # PDF text extraction and chunking
├── DiffSinger/               # DiffSinger vocoder (cloned during install)
├── content/                  # Drop PDFs here for auto-embedding
├── data/                     # ChromaDB storage
├── voices/                   # Voice model files
├── prompt.yaml               # System prompt configuration
├── requirements.txt          # Python dependencies
└── install.sh                # Smart installer script
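The extraction-and-chunking step in preprocessing/pdf.py presumably splits document text into overlapping windows before embedding. A minimal sketch of that common pattern (the function name and parameter values are illustrative, not the project's actual settings):

```python
def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10):
    """Split text into overlapping character windows for embedding.

    Overlap preserves context that would otherwise be cut at chunk boundaries.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("a" * 120, chunk_size=50, overlap=10)
print(len(chunks))
```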

📦 Installation

Prerequisites

  • Operating System: Ubuntu/Linux (tested on Ubuntu 20.04+)
  • Python: 3.8 or higher
  • GPU: CUDA-compatible GPU recommended (for TTS and faster inference)
  • Disk Space: ~2GB for dependencies and models
  • Internet: Required for downloading dependencies and models

Automated Installation (Recommended)

The installer script handles everything automatically - just run it and follow the prompts!

  1. Clone the repository:
git clone <your-repo-url>
cd StudyWithMiku
  2. Run the automated installer:
chmod +x install.sh
./install.sh

The installer will automatically:

  • โœ… Check and install system dependencies (python3, git, curl, unzip, etc.)
  • โœ… Create and activate a Python virtual environment
  • โœ… Upgrade pip, setuptools, and wheel
  • โœ… Install PyTorch with CUDA 11.8 support
  • โœ… Install all Python dependencies from requirements.txt
  • โœ… Clone the DiffSinger repository
  • โœ… Install DiffSinger dependencies and fix conflicts
  • โœ… Configure PYTHONPATH in ~/.bashrc
  • โœ… Download NSF-HiFiGAN vocoder model (~93MB)
  • โœ… Create .env file from .env.example
  • โœ… Create content/ and data/ directories
  • โœ… Verify all installations with dependency tests
  • โœ… Optionally launch the application immediately

Installation Progress:

โ•”โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•—
โ•‘   ๐ŸŽค StudyWithMiku - Automated Installer ๐ŸŽค   โ•‘
โ•šโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•

[1/10] Checking system dependencies...
[2/10] Setting up virtual environment...
[3/10] Installing PyTorch...
[4/10] Installing project dependencies...
[5/10] Setting up DiffSinger...
[6/10] Installing DiffSinger dependencies...
[7/10] Configuring PYTHONPATH...
[8/10] Downloading vocoder model...
[9/10] Configuring environment variables...
[10/10] Verifying installation...

โ•”โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•—
โ•‘          ๐ŸŽ‰ Installation Complete! ๐ŸŽ‰         โ•‘
โ•šโ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
  1. Configure your environment:
nano .env  # Edit with your LLM provider settings
  1. Start the application:
source venv/bin/activate
python main.py

Manual Installation

If you prefer manual control or the automated installer fails:

  1. Install system dependencies:
sudo apt update
sudo apt install -y python3 python3-venv python3-pip git curl unzip
  2. Create virtual environment:
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip setuptools wheel
  3. Install PyTorch:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
  4. Install dependencies:
pip install -r requirements.txt
  5. Setup DiffSinger:
git clone https://github.com/openvpi/DiffSinger.git
cd DiffSinger
pip install -r requirements.txt
pip install librosa==0.10.0 protobuf==3.19.5 --force-reinstall
cd ..
export PYTHONPATH=$PYTHONPATH:$(pwd)/DiffSinger
echo "export PYTHONPATH=\$PYTHONPATH:$(pwd)/DiffSinger" >> ~/.bashrc
  6. Download vocoder model:
mkdir -p models/vocoder
cd models/vocoder
curl -L -o nsf_hifigan_20221211.zip https://github.com/openvpi/vocoders/releases/download/nsf-hifigan-v1/nsf_hifigan_20221211.zip
unzip nsf_hifigan_20221211.zip
cd ../..
  7. Configure environment:
cp .env.example .env
nano .env

Environment Variables

# LLM Configuration
MODEL_NAME="llama3.2:3b"           # Model name
MODEL_TYPE="ollama"                 # ollama | google | openai

# Embedding Configuration
EMBEDDING_MODEL_NAME="nomic-embed-text"
EMBEDDING_MODEL_TYPE="ollama"       # ollama | google

# TTS Configuration
MODEL_TTS_NAME="tts_models/en/vctk/vits"
MODEL_TTS_TYPE="tts"                # tts | vocoder

# API URLs
OLLAMA_BASE_URL="http://localhost:11434"

# API Keys (if using cloud providers)
GOOGLE_API_KEY=""
OPENAI_API_KEY=""

# Model Settings
MAX_OUTPUT_TOKEN=512
EMBEDDEDING_TRESHOLD=0.0

# Database
DB_LOCATION="./data"
CHROMA_COLLECTION_NAME="study_docs"
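These variables are typically read at startup via os.getenv (or python-dotenv). A hedged sketch of that pattern, assuming the variable names above and using the documented values as illustrative defaults:

```python
import os

def load_llm_config():
    """Read LLM settings from the environment, falling back to the defaults
    shown in the .env example above (illustrative helper, not the project's)."""
    return {
        "model_name": os.getenv("MODEL_NAME", "llama3.2:3b"),
        "model_type": os.getenv("MODEL_TYPE", "ollama"),
        "ollama_base_url": os.getenv("OLLAMA_BASE_URL", "http://localhost:11434"),
        "max_output_tokens": int(os.getenv("MAX_OUTPUT_TOKEN", "512")),
    }

cfg = load_llm_config()
print(cfg["model_type"])
```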

🚀 Usage

Starting the Assistant

  1. Activate the virtual environment:
source venv/bin/activate
  2. Run the assistant:
python main.py
  3. Interact with Miku:
🧑‍💻 You: Hello Miku!
🤖 AI: Hi there! ^_^ Miku is here to help you study! ★

Adding Study Materials

Simply drop PDF files into the content/ folder while the assistant is running:

cp my-textbook.pdf content/

The assistant will:

  • Detect the new file automatically
  • Launch a background process to extract and embed the content
  • Notify you when embedding is complete
  • Use the content to answer your questions

Example Interactions

Asking about embedded content:

๐Ÿง‘โ€๐Ÿ’ป You: What is quantum mechanics?
๐Ÿค– AI: [Retrieves relevant sections from your physics textbook]

Web search:

๐Ÿง‘โ€๐Ÿ’ป You: Search for latest AI research papers
๐Ÿค– AI: [Performs DuckDuckGo search and presents results]

Opening URLs:

๐Ÿง‘โ€๐Ÿ’ป You: Open https://github.com
๐Ÿค– AI: [Checks internet, opens browser]

System commands:

๐Ÿง‘โ€๐Ÿ’ป You: What's the current date?
๐Ÿค– AI: [Runs 'date' command and shows result]

🧪 Testing

Test PDF Embedding

python pdf_worker_runner.py path/to/test.pdf

Test LLM Connection

python -c "from models.LLM import LLM; llm = LLM().initialize(); print(llm.invoke('Hello'))"

Test Embedding Model

python -c "from models.embedding import EmbeddingConfig; emb = EmbeddingConfig(); print(len(emb.get_embedding_model().embed_query('test')))"

🎨 Customization

Adding Custom Tools

  1. Create a new tool in tools/<category>/your_tool.py:
from langchain_core.tools import tool

@tool
def your_custom_tool(param: str) -> str:
    """Tool description for the LLM."""
    # Your implementation
    return "result"
  2. Register it in core/tools.py:
from tools.category.your_tool import your_custom_tool
__all__ = [..., "your_custom_tool"]
  3. Update prompt.yaml to document the new tool

Changing Miku's Personality

Edit prompt.yaml to customize:

  • Personality traits
  • Response style
  • Tool usage instructions
  • Safety rules

🔧 Troubleshooting

DiffSinger Import Errors

export PYTHONPATH=$PYTHONPATH:$(pwd)/DiffSinger
source ~/.bashrc

ChromaDB Persistence Issues

Delete and recreate the database:

rm -rf data/
python main.py  # Will recreate automatically

CUDA/GPU Issues

Install CPU-only PyTorch:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

Ollama Connection Failed

Start the Ollama server:

ollama serve

๐Ÿ“ Dependencies

Core dependencies:

  • langchain - LLM framework
  • langgraph - Agent orchestration
  • chromadb - Vector database
  • coqui-tts - Text-to-speech
  • ollama - Local LLM runtime
  • watchdog - File system monitoring
  • ddgs - DuckDuckGo search

See requirements.txt for complete list.

๐Ÿค Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Submit a pull request

📄 License

[Your License Here]

๐Ÿ™ Acknowledgments

  • DiffSinger: OpenVPI's neural vocoder for singing voice synthesis
  • Coqui TTS: Open-source text-to-speech engine
  • LangChain: Framework for LLM applications
  • ChromaDB: Embedding database

📧 Contact

zekogml11@gmail.com

Made with โค๏ธ by the zkzk ๐ŸŽคโœจ