Could Sarvam 30B/105B Models Be India's Answer to DeepSeek and Mistral?

shivekkhurana.com

5 points by shivekkhurana a month ago · 5 comments

alephnerd a month ago

This aligns with what I've been thinking and chatting with my peers about: technical documentation would be useful for benchmarking performance globally, but I have heard murmurs of it already being used for voice-gen use cases by a WITCH company.

  • shivekkhuranaOP a month ago

    The TTS/STT models are actually good and aggressively priced. I personally built a voice-mode AI assistant with them.

    STT time to first token is ~300 ms; ~20 seconds of audio takes less than 1 second to transcribe.

    TTS time to first token is ~700 ms; ~20 seconds of audio is generated in under 2 seconds.
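    Latency figures like these are typically measured as time-to-first-token (TTFT) on a streaming response. A minimal sketch of such a harness, assuming the API returns an iterable of chunks (the `fake_stream` generator below is a stand-in, not Sarvam's actual API):

```python
import time

def time_to_first_token(stream):
    """Measure seconds until the first chunk arrives from a streaming
    generator, and total seconds to drain the full stream."""
    start = time.monotonic()
    first = None
    for _ in stream:
        if first is None:
            first = time.monotonic() - start
    total = time.monotonic() - start
    return first, total

def fake_stream(chunks=5, delay=0.01):
    """Stand-in for a streaming STT/TTS response: yields chunks
    with a small artificial delay between them."""
    for i in range(chunks):
        time.sleep(delay)
        yield f"chunk-{i}"

ttft, total = time_to_first_token(fake_stream())
```

    In practice you would point this at the real streaming client and report TTFT separately from total generation time, since the two numbers answer different questions (perceived responsiveness vs. throughput).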

    • alephnerd a month ago

      Absolutely! The TTS/STT approach that Sarvam and the other Indian firms are taking is more intuitive for a larger share of people and use cases. The "replace an SDR" or "replace a call center" use case is such an easy win to show POV.

      I feel this is also why you don't see the same degree of hype as you would with the other players. When you are taking an application-driven approach to launching AI products, hype matters less than targeting decision-makers and showing that your product directly aligns with their outcomes.

      • porridgeraisin a month ago

        One other reason STT and OCR (check out the Sarvam Vision demo on their website, it's extremely good!) are the focus is to use them to build Indian-language datasets that can then be used to train larger LLMs than the current 105B one. Most training data in Indian languages (you'd know, there are more than just Hindi) is in either speech form or old books.

        If you add in the commercial aspect you pointed out, TTS/STT becomes even more important.
