GitHub - lonestone/micdrop: Micdrop is a set of packages for Node and the browser that simplify voice conversations with AI systems.


🖐️🎤 Micdrop: Real-Time Voice Conversations with AI

Micdrop website | Documentation

Micdrop is a set of open source TypeScript packages for building real-time voice conversations with AI agents. It handles the complexity on both the browser and the server side (microphone, speaker, voice activity detection, network communication, etc.) and provides ready-to-use implementations for various AI providers.

📦 Packages

Core Packages (start here)

  • @micdrop/client - Browser library handling microphone input, audio playback, and real-time communication
  • @micdrop/server - Server implementation for audio streaming and AI integration orchestration

AI Implementations

Utility Packages

Demo Applications

🎥 Demo and technical details (video)

Watch the author, Godefroy de Compreignac, talk about Micdrop and voice AI in this video:

YouTube video

🤔 Why Micdrop?

While real-time multimodal models (voice-to-voice) offer impressive capabilities, they often come with limitations in terms of customization and cost. Micdrop takes a different approach by:

  • 🎯 Allowing you to choose the best-in-class API for each component:
    • Select specific voices from TTS providers
    • Use different LLMs optimized for your use case
    • Pick STT engines suited for specific languages/accents
  • 💰 Reducing costs by letting you:
    • Use more cost-effective API providers
    • Mix open source and commercial solutions
    • Control exactly when APIs are called
  • 🔧 Providing granular control over the conversation flow
  • ๐ŸŒ Supporting a wider range of languages and voices through specialized providers

This modular approach gives you the flexibility to build voice applications that are both powerful and cost-effective.
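To make the modular approach above concrete, here is a minimal sketch of one conversation turn chaining speech-to-text, a language model, and text-to-speech. The interfaces `SpeechToText`, `LanguageModel`, `TextToSpeech`, and the function `runTurn` are hypothetical names invented for this illustration; they are not Micdrop's actual API, which is described in the project documentation.

```typescript
// Hypothetical interfaces for illustration only; NOT Micdrop's actual API.
interface Message {
  role: 'user' | 'assistant'
  content: string
}

interface SpeechToText {
  transcribe(audio: Uint8Array): Promise<string>
}

interface LanguageModel {
  reply(history: Message[]): Promise<string>
}

interface TextToSpeech {
  synthesize(text: string): Promise<Uint8Array>
}

interface VoicePipeline {
  stt: SpeechToText
  llm: LanguageModel
  tts: TextToSpeech
}

// Runs one conversation turn: user audio in, assistant audio out.
// Each stage depends only on an interface, so a provider can be swapped
// without touching the other two stages.
async function runTurn(
  pipeline: VoicePipeline,
  history: Message[],
  userAudio: Uint8Array
): Promise<{ history: Message[]; audio: Uint8Array }> {
  const userText = await pipeline.stt.transcribe(userAudio)
  const withUser: Message[] = [...history, { role: 'user', content: userText }]
  const answer = await pipeline.llm.reply(withUser)
  const audio = await pipeline.tts.synthesize(answer)
  return {
    history: [...withUser, { role: 'assistant', content: answer }],
    audio,
  }
}

// Stub providers standing in for real STT, LLM, and TTS services.
const stubPipeline: VoicePipeline = {
  // Ignores the audio and returns a canned transcript.
  stt: { transcribe: async () => 'hello' },
  // Echoes the most recent message in the history.
  llm: {
    reply: async (history) => {
      const last = history[history.length - 1]
      return `You said: ${last ? last.content : ''}`
    },
  },
  // Produces one silent byte per character of text.
  tts: { synthesize: async (text) => new Uint8Array(text.length) },
}
```

Under these assumptions, swapping a provider means passing a different object for `stt`, `llm`, or `tts`; `runTurn` itself does not change.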

🌟 Features

  • 🎙️ Microphone handling with:
    • Streaming support
    • Voice Activity Detection (VAD)
  • 🔊 Advanced audio playback with:
    • Streaming support
    • Device selection and control
  • ๐ŸŒ WebSocket communication
  • 📦 AI implementations provided for OpenAI, ElevenLabs, Mistral, Gladia, and more
  • 🔌 Bring your own AI components (framework agnostic)
    • Large Language Models (LLM)
    • Text-to-Speech (TTS)
    • Speech-to-Text (STT)
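As an illustration of the Voice Activity Detection feature listed above, the following sketch shows one common and very simple technique: comparing the RMS energy of each audio frame to a threshold, with a short "hangover" so brief pauses do not end an utterance. This is an assumption-laden teaching example, not the algorithm Micdrop uses; the names `EnergyVad`, `VadOptions`, and `rms` are invented here.

```typescript
// Hypothetical, simplified VAD for illustration; NOT Micdrop's implementation.
interface VadOptions {
  // RMS level (0..1) at or above which a frame counts as speech.
  threshold: number
  // Consecutive silent frames tolerated; the next silent frame ends speech.
  hangoverFrames: number
}

type VadEvent = 'start' | 'stop'

// Root-mean-square level of one frame of PCM samples in [-1, 1].
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0
  let sum = 0
  for (let i = 0; i < frame.length; i++) sum += frame[i] * frame[i]
  return Math.sqrt(sum / frame.length)
}

class EnergyVad {
  private options: VadOptions
  private speaking = false
  private silentFrames = 0

  constructor(options: VadOptions) {
    this.options = options
  }

  // Feed one frame; returns 'start' or 'stop' on a state change, else null.
  process(frame: Float32Array): VadEvent | null {
    const loud = rms(frame) >= this.options.threshold
    if (loud) {
      this.silentFrames = 0
      if (!this.speaking) {
        this.speaking = true
        return 'start'
      }
      return null
    }
    if (this.speaking) {
      this.silentFrames++
      if (this.silentFrames > this.options.hangoverFrames) {
        this.speaking = false
        this.silentFrames = 0
        return 'stop'
      }
    }
    return null
  }
}
```

In a browser, frames like these would typically come from the microphone through the Web Audio API. Production detectors often rely on more robust signals than raw energy, so treat the threshold approach as a starting point only.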

🧪 Development

For detailed development instructions, including how to build, test, and publish packages, please see DEVELOPMENT.md.

📄 License

MIT License - see the LICENSE file for details

Author

Originally developed for Raconte.ai and open sourced by Lonestone (GitHub)