GitHub - lonestone/micdrop: Micdrop is a set of packages for Node and the browser that simplify voice conversations with AI systems.

🖐️🎤 Micdrop: Real-Time Voice Conversations with AI

Micdrop is a set of open source TypeScript packages for building real-time voice conversations with AI agents. It handles the complexity on both the browser and the server side (microphone, speaker, VAD, network communication, etc.) and provides ready-to-use implementations for various AI providers.

📦 Packages

Core Packages (start here)

@micdrop/client - Browser library handling microphone input, audio playback, and real-time communication
@micdrop/server - Server implementation for audio streaming and AI integration orchestration

AI Implementations

@micdrop/openai - OpenAI integration providing LLM agent and speech-to-text capabilities
@micdrop/ai-sdk - AI SDK agent compatible with many LLM providers
@micdrop/elevenlabs - ElevenLabs text-to-speech integration with streaming support
@micdrop/cartesia - Cartesia text-to-speech integration for real-time voice synthesis
@micdrop/mistral - Mistral AI agent integration for conversation handling
@micdrop/gladia - Gladia speech-to-text integration for audio transcription

Utility Packages

@micdrop/react - React hooks for Micdrop

Demo Applications

demo-client - Example web application built with React
demo-server - Example server built with Fastify

🎥 Demo and technical details (video)

Watch the author, Godefroy de Compreignac, talk about Micdrop and voice AI in this video:

🤔 Why Micdrop?

While real-time multimodal models (voice-to-voice) offer impressive capabilities, they often come with limitations in terms of customization and cost. Micdrop takes a different approach by:

🎯 Allowing you to choose the best-in-class API for each component:
  - Select specific voices from TTS providers
  - Use different LLMs optimized for your use case
  - Pick STT engines suited for specific languages/accents
💰 Reducing costs by letting you:
  - Use more cost-effective API providers
  - Mix open source and commercial solutions
  - Control exactly when APIs are called
🔧 Providing granular control over the conversation flow
🌐 Supporting a wider range of languages and voices through specialized providers

This modular approach gives you the flexibility to build voice applications that are both powerful and cost-effective.
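
To make the modular idea concrete, here is a minimal sketch of one conversation turn flowing through three swappable components. The interfaces and names below (`SpeechToText`, `Agent`, `TextToSpeech`, `runTurn`) are hypothetical and chosen for illustration only; they are not the Micdrop API, and the stub providers stand in for real STT, LLM, and TTS services.

```typescript
// Hypothetical interfaces for illustration only; NOT the Micdrop API.
interface SpeechToText {
  transcribe(audio: Uint8Array): Promise<string>;
}

interface Agent {
  answer(userText: string): Promise<string>;
}

interface TextToSpeech {
  synthesize(text: string): Promise<Uint8Array>;
}

interface VoicePipeline {
  stt: SpeechToText;
  agent: Agent;
  tts: TextToSpeech;
}

interface TurnResult {
  transcript: string;
  reply: string;
  audio: Uint8Array;
}

// One conversation turn: user audio in, assistant audio out.
async function runTurn(
  pipeline: VoicePipeline,
  userAudio: Uint8Array
): Promise<TurnResult> {
  const transcript = await pipeline.stt.transcribe(userAudio);
  const reply = await pipeline.agent.answer(transcript);
  const audio = await pipeline.tts.synthesize(reply);
  return { transcript, reply, audio };
}

// Stub providers standing in for real services.
const fakeStt: SpeechToText = {
  transcribe: async (audio) => `heard ${audio.length} bytes`,
};
const echoAgent: Agent = {
  answer: async (text) => `You said: ${text}`,
};
const fakeTts: TextToSpeech = {
  // Pretend audio: one silent byte per character of text.
  synthesize: async (text) => new Uint8Array(text.length),
};

// Swapping a provider only means replacing one field.
const pipeline: VoicePipeline = { stt: fakeStt, agent: echoAgent, tts: fakeTts };

runTurn(pipeline, new Uint8Array(4)).then((turn) => {
  console.log(turn.reply); // prints "You said: heard 4 bytes"
});
```

Because each stage only depends on an interface, changing the TTS voice or the LLM provider in this sketch does not touch the other two stages. A real implementation would also stream partial results between stages rather than awaiting each one in full.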

🌟 Features

🎙️ Microphone handling with:
  - Streaming support
  - Voice Activity Detection (VAD)
🔊 Advanced audio playback with:
  - Streaming support
  - Device selection and control
🌐 WebSocket communication
📦 AI implementations provided for OpenAI, ElevenLabs, Mistral, Gladia, and more
🔌 Bring your own AI components (framework-agnostic):
  - Large Language Models (LLM)
  - Text-to-Speech (TTS)
  - Speech-to-Text (STT)
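
For readers new to Voice Activity Detection, the sketch below shows the general idea with a naive energy-threshold detector. It is an assumption-laden illustration, not how Micdrop implements VAD: the names (`EnergyVad`, `VadOptions`, `rms`) and the threshold values are hypothetical, and production detectors are typically more robust than a fixed volume threshold.

```typescript
// Naive energy-based VAD, for illustration only (all names are hypothetical).

// Root-mean-square level of one frame of PCM samples in [-1, 1].
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  const sumOfSquares = frame.reduce((acc, s) => acc + s * s, 0);
  return Math.sqrt(sumOfSquares / frame.length);
}

interface VadOptions {
  threshold: number; // RMS level at or above which a frame counts as speech
  hangoverFrames: number; // consecutive quiet frames needed to end speech
}

type VadEvent = "start" | "stop" | null;

class EnergyVad {
  private readonly options: VadOptions;
  private speaking = false;
  private quietFrames = 0;

  constructor(options: VadOptions) {
    this.options = options;
  }

  // Feed one frame; returns "start" or "stop" on a state change, else null.
  process(frame: Float32Array): VadEvent {
    const loud = rms(frame) >= this.options.threshold;
    if (loud) {
      this.quietFrames = 0;
      if (!this.speaking) {
        this.speaking = true;
        return "start";
      }
      return null;
    }
    if (this.speaking) {
      this.quietFrames += 1;
      if (this.quietFrames >= this.options.hangoverFrames) {
        this.speaking = false;
        this.quietFrames = 0;
        return "stop";
      }
    }
    return null;
  }
}

// Usage: one quiet frame, two loud frames, then three quiet frames.
const vad = new EnergyVad({ threshold: 0.1, hangoverFrames: 2 });
const quiet = new Float32Array(160);
const loud = new Float32Array(160).fill(0.5);
const events = [quiet, loud, loud, quiet, quiet, quiet].map((f) => vad.process(f));
console.log(events); // [null, "start", null, null, "stop", null]
```

The hangover counter keeps short pauses between words from ending the utterance too early. A "stop" event is the natural point to send the buffered audio on for transcription.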

🧪 Development

For detailed development instructions, including how to build, test, and publish packages, please see DEVELOPMENT.md.

📄 License

MIT License - see the LICENSE file for details

Author

Originally developed for Raconte.ai and open-sourced by Lonestone (GitHub)