I wrote about this five months ago, but I want to return to it because it is really important and I don't see any change in the government's direction.
We need a strategic reset.
The IndiaAI Mission was set up with the goal of investing ₹10,000 crore in AI. Unfortunately, most of that investment went into GPUs. In fact, when the call for proposals came, proposals with heavy GPU investment were chosen over proposals that were more strategic.
But here is the problem. Hardware is a rapidly depreciating asset; the H100s of today will be the electronic waste of tomorrow. In contrast, high-quality, linguistically diverse, and culturally grounded datasets are appreciating assets that form the bedrock of sovereign intelligence. A model trained on rented GPUs using borrowed Western data is just a localized inference engine. True sovereignty requires the ownership of the “thought process”, the reasoning chains, the cultural contexts, and the linguistic nuances that define India.
In fact, by the time the proposals were sanctioned and the GPUs were acquired, the next generation of GPUs was already on the market. And all the proposals faced one big problem at the time of implementation.
They had no data to train.
We already have a blueprint for how to save our languages and culture. The same blueprint can be used to create the AI datasets that will help us build localized LLMs.
Following the Meiji Restoration in 1868, Japan faced an existential crisis. To avoid colonization, it needed to modernize rapidly. The government realized that modernization meant buying Western guns and steam engines, but it also meant importing the “software” of Western civilization: the books on physics, chemistry, law, and philosophy.
The Meiji government did not rely on the haphazard efforts of private individuals. It built on the Bansho Shirabesho (Institute for the Study of Barbarian Books), founded by the Tokugawa shogunate in 1856, which later became part of the University of Tokyo. The state hired thousands of Oyatoi gaikokujin (foreign experts) with the explicit mandate to teach Japanese students, who would then translate that knowledge into Japanese.
This was a massive “Data Ingestion” operation. The Ministry of Education systematically translated textbooks on every conceivable subject. This ensured that a Japanese student did not need to learn German to understand chemistry; they could learn it in Japanese. This democratization of high-level knowledge was the engine of Japan’s industrial revolution.
Crucially, the Japanese realized their language lacked words for modern concepts. Rather than borrowing the English words directly, they engineered new words. Scholars like Fukuzawa Yukichi and the translation bureaus coined Wasei kango, Japanese words created using Chinese roots to express Western ideas.
Examples: The concept of “competition” was alien to feudal Japan. Translators coined the word kyoso (race + fight). “Society” became shakai. “Science” became kagaku.
This historical precedent is critical. India missed the bus on the industrial revolution. We just bought the machines. But the “software” was still in English. This is the reason learning English is the gateway to success in India. This is the reason building machines is still foreign to most of us. A person who speaks English is considered knowledgeable, whether they are good at machines or not.
As we move into the AI age, Indian languages lack the technical vocabulary for concepts like “reinforcement learning,” “backpropagation,” or “prompt engineering.” If we do not create these words in Tamil, Hindi, or Telugu (or standardize existing ones), our technical discourse will remain permanently “Hinglish,” limiting deep conceptual understanding to the English-speaking elite. IndiaAI must fund a “National Terminology Commission” to do for AI what Meiji Japan did for steam power. We also need to translate the older books as soon as possible and build a corpus from them. If Meiji Japan could do it with nineteenth-century tools, we can certainly do it much faster and cheaper now with all the tools at our disposal.
In the 21st century, Japan pivoted from importing knowledge to exporting culture. The dominance of anime and manga is often attributed to creativity alone, but it is also sustained by government subsidy.
The “Cool Japan” strategy is a formal government policy coordinated by the Cabinet Office. The Cool Japan Fund (CJ Fund) is a public-private investment vehicle that actively removes the bottlenecks to cultural export.
Sentai Holdings Investment: In 2019, the CJ Fund invested $30 million in Sentai Holdings, a US-based licensor of anime. The specific goal was to enhance “translation, localization, subtitling, and dubbing.” The Japanese government effectively paid to ensure that American audiences could consume Japanese content seamlessly.
Anime Consortium Japan: The fund invested 1 billion yen to create a legal streaming platform. This targeted piracy but also ensured that high-quality, official translations were available, preserving the cultural nuance that fan-subs often missed.
The Agency for Cultural Affairs (Bunka-cho) runs the Japanese Literature Publishing Project (JLPP), which selects Japanese literary works and fully funds their translation and publication in foreign languages.
Mechanism: They do not wait for a French publisher to “discover” a Japanese author. They pay for the translation upfront, effectively de-risking the asset for foreign publishers.
Result: This ensures a steady stream of Japanese thought enters the global canon, influencing global culture.
Lesson for India: The IndiaAI Mission must adopt this “Subsidy for Friction” model. The government should not just fund the creation of movies or books (which the private sector does well) but also fund the translation and localization of this content via AI, making it costless for a global audience to consume Telugu or Kannada culture. The government should invest in exporting our culture: our Chandamama stories, Kasi Majili kathalu, and more.
In the next part, I will discuss how we should approach building these datasets.