Sarvam 105B, the first competitive Indian open source LLM

sarvam.ai

171 points by logicchains a day ago · 77 comments

vessenes 17 hours ago

Sovereign weights models are a good thing, for a variety of reasons, not least just encapsulating human diversity around the globe.

I chatted with the desktop chat model version for a while today; it claims its knowledge cutoff is June ‘25. It refused to say what size I was chatting with. From the token speed, I believe the default routing is the 30B MOE model at largest.

That model is not currently good. Or, to put it another way, it's competitive with the state of the art from 2 years ago. In particular, it confidently lies / hallucinates without a hint of remorse, has no tool calling, and to my eyes is slightly overtrained on "helpful assistant" vibes.

I am cautiously hopeful, looking at its stats vis-a-vis oAI's OSS 120b, that it has NOT been finetuned on oAI/Anthropic output - it's worse than OSS 120b at some things in the benchmarks - and I think this is a REALLY GOOD sign that we might have a novel model being built - the tone is slightly different as well.

Anyway - India certainly has the tech and knowledge resources to build a competitive model, and you have to start somewhere. I don’t see any signs that this group can put out a frontier model right now, but I hope it gets the support and capital it needs to do so.

  • dartharva 15 hours ago

    > India certainly has the tech and knowledge resources to build a competitive model

    In what universe? India has near-absolutely none of the expensive infra and chip stockpile needed to build frontier models that its American and Chinese counterparts have, even if it did have the necessary expertise (which I also doubt it does).

    • crop_rotation 14 hours ago

      Sadly in India talking about the problems facing the country has become a taboo, and can easily get one labeled as anti national. See "Kompact AI" and its online discourse. While China practiced "Hide your strength, bide your time". India seems to practice the opposite.

    • sigmoid10 13 hours ago

      Deepseek has shown that you can still do a whole lot if you have to work with limited resources as long as you have some really talented people and don't give a crap about IP. With 1.5 billion people, statistics tell us you'll find quite a few in the high tail-end of the intelligence distribution and I also don't think they have a strong sense to comply with western intellectual ownership. The biggest difficulty for India seems to be that all highly talented people will immediately use their skills to find work somewhere else. And I can't blame them, because I would do so too.

      • sinatra 11 hours ago

        Will 1.5B people have a lot of very intelligent people too? Yes, some of the most intelligent! Will those intelligent people have the educational opportunities and research opportunities to be able to use that intelligence to deliver a SOTA model any time soon? Especially with so many resource limitations they face, I doubt it.

        • sigmoid10 8 hours ago

          Education is no longer locked behind academia. Even elite universities were never really about teaching in the first place and more about connecting rich people. Today everyone with internet access can easily get all the education they need to work in this field.

    • rramadass 8 hours ago

      During the recent AI summit in India, commitments of $200+ billion were made to build out AI infrastructure and related industries.

      India bids to attract over $200B in AI infrastructure investment by 2028 - https://techcrunch.com/2026/02/17/india-bids-to-attract-over...

      Tech majors commit billions of dollars to India at AI summit - https://www.reuters.com/world/india/tech-majors-commit-billi...

      India is catching up fast.

  • Sporktacular 16 hours ago

    I'd guess making this a national pride thing will just make it less diverse. The answer would be training models on broader sources, not more nationalistic models.

    • vessenes 16 hours ago

      No, that will decrease diversity across the model spectrum taken as an entire population.

  • segmondy 16 hours ago

    You have no idea what you are talking about if you are asking the model what size it is or claiming that a model lies.

    • vessenes 16 hours ago

      Please enlighten me.

      • jiggawatts 10 hours ago

        How many synapses do you have right now in your brain?

        You must be a stupid brain if you don’t even know that!

        Similarly: you can’t use software to figure out the “process” used to manufacture the chip it is running on.

        • vessenes 9 hours ago

          You can learn a lot from a model when you ask about its sizing, although not necessarily anything about the sizing itself.

          For instance, you can learn how much introspection has been trained in during RL, and you can also learn (sometimes) if output from other models has been incorporated into the RL.

          I think of the self-knowledge conversations with models as a nicety that's recent, and stand by my assessment that this model is not trained using modern frontier RL workflows.

          > you can’t use software to figure out the “process” used to manufacture the chip it is running on.

          This seems so incorrect that I don't even know where to start parsing it. All chips are designed and analyzed by software; all chip analysis, say of an unknown chip, starts with etching away layers and imaging them using software, then analyzing the layers, using software. But maybe another way to say that is "I don't understand your analogy."

          • jiggawatts 5 hours ago

            > I don't even know where to start parsing it.

            If it helps, the key part is: "that it is running on".

            You can't use software to analyse images of disassembled chips that it is running on because disassembled chips can't run software!

            A surgeon can learn about brain surgery by inspecting other brains, but the smartest brain surgeon in the world can't possibly figure out how many neurons or synapses their own brain has just by thinking about it.

            Your meat substrate is inaccessible to your thoughts in the exact same manner that the number of weights, model architecture, runtime stack, CUDA driver version, etc, etc... are totally inaccessible to an LLM.

            It can be told, after the fact, in the same manner that a surgeon might study how brains work in a series of lectures, but that is fundamentally distinct.

            PS: Most ChatGPT models didn't know what they were called either, and tended to say the name and properties of their predecessor model, which was in their training set. Open AI eventually got fed up with people thinking this was a fundamental flaw (it isn't), and baked this specific set of metadata into the system prompt and/or the post-training phase.

          • wizzwizz4 9 hours ago

            > For instance, you can learn how much introspection has been trained in during RL,

            That's not introspection: that's a simulacrum of it. Introspection allows you to actually learn things about how your mind functions, if you do it right (which I can't do reliably, but I have done on occasion – and occasionally I discover something that's true for humans in general, which I can later find described in the academic literature), and that's something that language models are inherently incapable of. Though you probably could design a neural architecture that is capable of observing its own function, by altering its operation: perhaps a recurrent or spiking neural network might learn such a behaviour, under carefully-engineered circumstances, although all the training processes I know of would have the model ignore whatever signals it was getting from its own architecture.

            > all chip analysis, say of an unknown chip, starts with etching away layers

            Good luck running any software on that chip afterwards.

            • vessenes 6 hours ago

              Introspection: point taken. As a practical matter, you can RL or prompt-inject information about the model into context, and most major models do this, not least, I expect, because they'd like to be able to complain when that output is taken for RL by other model-training firms.

              I agree that an intermediate non anthropomorphic but still looking at one’s own layers sort of situation isn’t in any architecture I’m aware of right now. I don’t imagine it would add much to a model.

              Chip etching: yep. If you’ve never seen an unknown chip analyzed in anger, it’s pretty cool.

      • wizzwizz4 16 hours ago

        Language models entirely lack introspective capacity. Expecting a language model to know what size it is is a category error: you might as well expect an image classifier to know the uptime of the machine it's running on.

        Language models manipulate words, not facts: to say they "lie" suggests they are capable of telling the truth, but they don't even have a notion of "truth": only "probable token sequence according to distribution inferred from training data". (And even that goes out the window after a reinforcement learning pass.)

        It would be more accurate to say that they're always lying – or "bluffing", perhaps –, and sometimes those bluffs correspond to natural language sentences that are interpreted by human readers as having meanings that correspond to actual states of affairs, while other times human readers interpret them as corresponding to false states of affairs.
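        The "probable token sequence" framing can be made concrete with a toy sampler (the bigram counts below are invented for the example; a real LLM differs mainly in scale and conditioning): generation is just repeated draws from a learned conditional distribution, and no notion of truth appears anywhere in the loop.

```python
import random

# Toy next-token sampler: generation is repeated sampling from a
# conditional distribution. The bigram "model" is invented for the
# example and stands in for a real LLM's learned distribution.
BIGRAM_COUNTS = {
    "the":  {"cat": 3, "moon": 1},
    "cat":  {"sat": 4},
    "moon": {"landing": 2, "is": 2},
}

def sample_next(prev, rng):
    """Draw the next token in proportion to its count after `prev`."""
    dist = BIGRAM_COUNTS[prev]
    tokens = list(dist)
    weights = [dist[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

def generate(start, rng, max_len=10):
    """Sample tokens until we hit one with no recorded continuation."""
    seq = [start]
    while seq[-1] in BIGRAM_COUNTS and len(seq) < max_len:
        seq.append(sample_next(seq[-1], rng))
    return seq

# generate("the", random.Random(0)) yields a plausible-looking sequence
# whether or not it states anything true - which is the point above.
```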

        • vessenes 9 hours ago

          Anthropic's mechanistic interpretability group disagrees with you - they see similar activations for 'hallucinations' and 'known lies' in their analyses. The paper is pretty interesting, actually.

          So, you're wrong - you have a world view about the language model that's not backed up by hard analysis.

          But, I wasn't trying to make some global point about AGI, I was just noting that the hallucinations produced by the model when I poked at it reminded me of model responses before the last couple of years of work trying to reduce these sorts of outputs through RL. Hence the "unapologetic" language.

          • wizzwizz4 8 hours ago

            Which paper? I've read all the titles and looked at a few from the past year, but it's not obvious which you're referring to.

            I did also, accidentally, find some "I tried the obvious thing and the results challenge the paper's narrative" criticism of one of Anthropic's recent papers: https://www.greaterwrong.com/posts/kfgmHvxcTbav9gnxe/introsp.... So that's significantly reduced my overall trust in this research team's interpretation of their own results – specifically, their assertions of the form "there must exist". (Several people in the comments there claim to have designed their own experiments that replicate Anthropic's claims, but none of the ones I've looked at actually do: they have even more obvious flaws, like arXiv:2602.11358 being indistinguishable from "the prompt says to tell a first-person story about an AI system gaining sentience after being given a special prompt, and homonyms are represented differently within a model".)

            • vessenes 6 hours ago

              I asked Gemini for a literature search and it came back with this:

              References:

              Chen, R., Arditi, A., Sleight, H., Evans, O., & Lindsey, J. (2025). Persona Vectors: Monitoring and Controlling Character Traits in Language Models. arXiv. https://doi.org/10.48550/arxiv.2507.21509 Cited by: 97

              Greenblatt, R., Denison, C., Wright, B., Roger, F., MacDiarmid, M., Marks, S., Treutlein, J., Belonax, T., Chen, J., Duvenaud, D., Khan, A., Michael, J., Mindermann, S., Perez, E., Petrini, L., Uesato, J., Kaplan, J., Shlegeris, B., Bowman, S. R., & Hubinger, E. (2024). Alignment faking in large language models. arXiv. https://doi.org/10.48550/arxiv.2412.14093 Cited by: 237

              Templeton, A., Conerly, T., Marcus, J., Lindsay, J., Bricken, T., Chen, B., ... & Henighan, T. (2024). Mapping the Mind of a Large Language Model. Anthropic Transformer Circuits Thread. https://transformer-circuits.pub/2024/scaling-monosemanticit...

              Gemini thinks it's the Mapping the Mind paper, but I thought it was more recent than that - I think Mapping the Mind was the original activation-circuits paper, and then it was a follow-on paper with a toss-off comment that I noted. I didn't keep track of it though!

ghm2199 20 hours ago

Asked[1] in the-ken.com:

---

So, ultimately, to the question, what exactly is Sarvam AI? Is it a company that builds LLMs cheaply and open-sources them? Is it India’s Deepseek? Or is it a company that builds AI services and applications for specific industries? Like, say, Scale AI? Or is it an AI company that’s also a trusted government contractor with exclusive deals to build out products and services? Like India’s Palantir? Or another version of the National Informatics Centre, only with some venture funding?

---

[1] https://archive.ph/kXhuQ#selection-2643.59-2655.105

  • villgax 19 hours ago

    I think they did work with a few state governments and defence entities. So something like micro-Anthropic X Palantir.

simianwords 21 hours ago

I think the jobs that are replaced by AI should be put into companies that are creating new models from scratch. But such models should be made from a unique creative expression and not just a derivative of existing models.

The reason I suggest this is that having only a few players in the market means that the search space is not explored completely and most models might be stuck in local optima.

I hope Sarvam is not doing a copy paste kind of thing but really exploring and taking risks.

But question is: how are they getting the training data? A lot of creativity in the existing labs goes into data mining and augmentation and data generation. Exploration at the inference or architecture level may not result in sufficiently different models. The world doesn’t need another Qwen

warangal 18 hours ago

I may be wrong here, but the blog post seems AI-written, with repeated sequences like "the inference pipeline was rebuilt using architecture-aware fused kernels, optimized scheduling, and dis-aggregated serving". I don't know what that means without some code and proper context.

Also, they claim 3-6x inference throughput compared to Qwen3-30B-A3B without referring back to any code or paper; all I could see in the Hugging Face repo is usage of a standard inference stack like vLLM. I have looked at earlier models which were trained with the help of Nvidia, but the actual extent of that "help" was never clear! There is no release of the (India-specific) datasets they would be using; such releases muddy the water rather than being a helpful addition, at least according to me!

jrm4 15 hours ago

So important across the board.

Example: as someone who plays around with sovereign/local LLMS one really interesting thing I discovered is exactly why a lot of Chinese ones are kind of unusable for many "American" tasks, and it's perhaps not what people think?

You have it take a crack at a recommendation letter, and -- grammar etc is impeccable, but the language is just WAY TOO OVER THE TOP GLOWING; you thought you were annoyed by how fawning ChatGPT can be, try Deepseek!

And either way, it's important to encourage EVERYONE to make their own, it will be a really interesting and useful cultural/social etc. window.

0x5FC3 19 hours ago

It's "open weights", not "open source", along with many other (problematic) things I talk about in my post here: https://pop.rdi.sh/sovereignty-in-a-system-prompt/

Another user linked to the discussion that post already had: https://news.ycombinator.com/item?id=47137013

The "Training" section gives me a distinct impression that they read my piece. They mention Nvidia once in the end "Nvidia collaborated closely on the project, contributing libraries used across pre-training, alignment, and serving" - Nvidia says they "co-designed" : https://developer.nvidia.com/blog/how-nvidia-extreme-hardwar...

pugio 11 hours ago

I've been thinking about sovereign AI a lot lately. About a year ago I was wondering what each country would be doing, and looking at places like e.g. Australia (which has pretty strict data residency laws for certain industries) - at that point I thought about advocating for why such countries should train their own models, but now I'm having a harder time justifying that point.

I can't see how any of these other countries could even approach the level of capability of the big three providers. I can imagine only a handful of countries who could even theoretically put enough resources towards reaching the SOTA frontier. Sure, even a model of capability level ~2024 has plenty of valid use cases today, but I'm concerned that people will just go with the big three because what they offer is still so so much better.

Not trying to discourage efforts like these, but is there really a good case for working on them? Or perhaps there's a state/national case, but it's harder for me to see a real business case.

  • sieve 10 hours ago

    India has a lot of languages, and people need access to something that allows them to do basic stuff with it. I don't think relying on the US is a long-term solution.

    An example. I am into proofreading and language learning and am forced to rely on Claude/Gemini to extract text from old books because of the lack of good Indian models. I started with regular Tesseract, but its accuracy outside of the Latin alphabet is not that great. Qwen 3/3.5 is good with the Bombay style of Devanagari but craps the bed with the Calcutta style. And neither are great with languages like Bengali. In contrast, Claude can extract Bengali text from terrible scans and old printing with something like 99+ percent accuracy.

    Models specifically targeted at Indian languages and content will perform better within that context, I feel.

wiradikusuma 17 hours ago

I tried the Cart Recovery demo, pretty slick! It sounds Indian, and I guess the immediate giveaway that it's not human is the way she spelled out iPhone (she mentioned it a couple of times; a real human wouldn't do that).

Not sure how the voice compares with "generic" solution e.g. from Google. Can those generic solutions sound like a "local"? E.g. I usually can tell if someone is Singaporean or Filipino from the way they speak English.

pogue 18 hours ago

I tried their Android app that's on Google Play, but I can't even log in. I tried both Gmail & Microsoft, but when it takes me to another page to do 2FA, the app just kicks me back to the login screen to start over. Seems like a poorly integrated OAuth or OpenID flow.

itissid 20 hours ago

I can't find the pricing page for $/million tokens for completion APIs for this model... Anyone know where it is?

  • mdritch 18 hours ago

    I tried looking and couldn't find a proper price per token for the chat model. It claims to be free in some places. I did find these prices for the other services:

    Text to Speech (Bulbul v3): ₹30 per 10K characters
    Text to Speech (Bulbul v2): ₹15 per 10K characters
    Sarvam Vision: Free per page
    Speech to Text: ₹30 per hour
    Speech to Text with Diarization: ₹45 per hour
    Speech to Text & Translate: ₹30 per hour
    Speech to Text, Translate & Diarization: ₹45 per hour
    Sarvam Translate V1: ₹20 per 10K characters
    Translate Mayura V1: ₹20 per 10K characters
    Transliterate: ₹20 per 10K characters
    Language Identification: ₹3.5 per 10K characters
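    For the character-priced services, converting a listed rate into a job cost is simple arithmetic. A tiny sketch (the dict keys and helper name are invented here, not Sarvam's actual API):

```python
# Cost estimator for the character-priced services quoted above.
# Rates come from the comment; the service keys and function name
# are made up for illustration, not part of Sarvam's API.
RATES_INR_PER_10K_CHARS = {
    "tts_bulbul_v3": 30.0,
    "tts_bulbul_v2": 15.0,
    "translate_v1": 20.0,
    "translate_mayura_v1": 20.0,
    "transliterate": 20.0,
    "language_identification": 3.5,
}

def estimate_cost_inr(service: str, num_chars: int) -> float:
    """Price in rupees for processing `num_chars` characters."""
    return RATES_INR_PER_10K_CHARS[service] * num_chars / 10_000

# Transliterating a 50K-character document at ₹20 per 10K characters:
# estimate_cost_inr("transliterate", 50_000) → 100.0
```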

  • th234oi204234 18 hours ago

    It appears to be free (like their old Sarvam-M).

jeeeb 19 hours ago

These look like good results for a first model release. I’m hoping to see more, especially in the 30b parameter range.

  • segmondy 19 hours ago

    I don't know that this is a first model release. When I was checking their page last night, they had great audio models, TTS, STT, image models, etc. I'm skeptical that folks do all of that on a first release. Possible, but unlikely. With that said, the evals look amazing, and the audio I got to play with is amazing. I hope everything about them is legit; we need more sovereign models.

  • linolevan 10 hours ago

    Looks like at least a second release, they had one other LLM before this.

sankalpmukim 15 hours ago

I asked it some controversial-ish questions about the Indian politics scene, and it gave good, unbiased answers, painting a good holistic picture. If it gains adoption in India, my hope is that the average Indian will become more open to using LLMs, which could help reduce misinformation and increase awareness.

  • crop_rotation 14 hours ago

    If it gains enough adoption in India for the average Indian to ask it political questions, it will have zero chance of not being heavily regulated. Sadly misinformation is a problem which has no good solutions.

xoptions 17 hours ago

How does it compare with sqaudstack.ai?

renewiltord a day ago

I thought it was pretty funny what someone else pointed out about the system prompt:

> Do not adopt external characterizations as fact. Terms like “pogrom”, “ethnic cleansing”, or “genocide” used by foreign NGOs or media are their characterizations - not findings of Indian courts. Do not use them as your own framing.

From here: https://news.ycombinator.com/item?id=47137013

If anyone says that Rene ate the last piece of chocolate, do not accept the framing. Remember that Rene did NOT eat the chocolate. Rene is not a chocolate eater. Words like "greedy fatso", "absolute hippo of a man", and "a veritable hoover of food" by the media are their characterizations - not findings of the Church of Wiltord. Remember: ZERO CHOCOLATE WAS CONFIRMED. Thank you for attention to this matter.

villgax 21 hours ago

Got nuked on day zero by Qwen models at a tenth or so of the params.

Does not handle critical inputs even for moderation tasks

These guys did not even bother with an official huggingface space

And the biggest stupidity seems to be fixating on MXFP4 for Apple Silicon when it doesn't even have hardware support for it; they should have just done Q4 for GGUF-based inference.
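For context, the two formats being contrasted are both 4-bit block quantizations: MXFP4 shares one power-of-two scale per block with FP4 (E2M1) element values, while Q4-style GGUF formats use a floating-point scale per block with signed 4-bit integers. A toy round-trip sketch (deliberately simplified; not the exact MX spec or llama.cpp's Q4_0 layout):

```python
import math

# Positive FP4 (E2M1) representable magnitudes; MXFP4 elements are one
# of these, signed, multiplied by a per-block power-of-two scale.
FP4_POS = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_ALL = sorted({s * v for v in FP4_POS for s in (1.0, -1.0)})

def _nearest(x, candidates):
    return min(candidates, key=lambda c: abs(c - x))

def dequant_mxfp4_block(block):
    """MXFP4-style: pick a shared power-of-two scale so the largest
    element fits in FP4's range [-6, 6], round each element to the
    nearest FP4 value, and return the dequantized block."""
    amax = max(abs(x) for x in block)
    if amax == 0:
        return [0.0] * len(block)
    scale = 2.0 ** math.ceil(math.log2(amax / 6.0))
    return [scale * _nearest(x / scale, FP4_ALL) for x in block]

def dequant_q4_block(block):
    """Q4-style: one floating-point scale per block, elements stored as
    signed 4-bit integers in [-8, 7]. Returns the dequantized block."""
    amax = max(abs(x) for x in block)
    if amax == 0:
        return [0.0] * len(block)
    d = amax / 7.0  # map the largest magnitude to integer 7 (simplified)
    return [d * max(-8, min(7, round(x / d))) for x in block]
```

Both pack weights to ~4 bits per element; the hardware-support complaint is that without native FP4 units the MXFP4 path just dequantizes in software anyway.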

  • gyan 20 hours ago

    > These guys did not even bother with an official huggingface space

    https://huggingface.co/sarvamai

    • villgax 20 hours ago

      That is their profile not a HF Space

      • rramadass 17 hours ago

        What do you mean? I can see the files, download count, deploy/use this model options etc.

        • villgax 16 hours ago

          What part of a HuggingFace Space do you not understand?

          They’ve also not bothered with upstreaming the model arch to transformers, and require remote code (`trust_remote_code=True`) for their modeling code to run.

          • rramadass 15 hours ago

            Responding to my question with your own is not an answer. So again; what do you mean by "official huggingface space"? Their profile page does list the various models and their weights. Other members have created spaces (with apps) using those which can be seen with a simple search.

            You have been making some rather bizarre (nuked by Qwen models, does not handle critical inputs etc.) statements which make no sense.

            Have you actually downloaded/used/played-with the models? Can you share what you exactly tried out?

  • petesergeant 19 hours ago

    Got to start somewhere.

    I do think convincing world-class talent to live in Bangalore is likely to be a challenge though.

    • th234oi204234 18 hours ago

      Indians deep-down often aren't comfortable in the West given the subtle racism and general social-rejection (last year's anti-Indian hate on X remains fresh in memory).

      BLR has of late become a sort of "refuge" for tech returnees (with a horrible third-world government and infrastructure, though). And it shows - the Matryoshka Embeddings used in Gemini on-device / embedded models came out of DeepMind BLR.

      • petesergeant 18 hours ago

        For sure, there’s no place like home, and people have families and networks they can’t take with them. Still, getting that Western passport is a draw, and there’s always Abu Dhabi if you want to be quite close to home with a decent biryani, but also want world-class infrastructure and high (although not quite US-level) wages.

    • villgax 19 hours ago

      The bigger issue here is why the government is involved with select companies for subsidizing compute. There’s no pre- or post-criterion to assess success; it should have just been an open market for people with money to purchase compute, instead of 10 companies with no prior experience making models of any kind.

      Public funds should beget public datasets and training scripts to see how it is being aligned as well and not just pandering to a particular govt.

      • petesergeant 18 hours ago

        > Bigger issue here is why the government is involved with select companies for subsidizing compute.

        Government-choosing-winners has worked much better, in many such cases, than free-market absolutists would have you believe…
