
explainx / blog

India's Sovereign AI Status: What It Really Means, What's Been Built, and What's Still Missing (2026)

India has committed $1.25 billion to sovereign AI through the IndiaAI Mission, launched Sarvam 30B and 105B, and positioned itself as the Global South's AI leader. But all of it runs on NVIDIA chips. Here is the full, unvarnished picture of where India actually stands.

23 min read · Yash Thakker
Tags: India AI, Sovereign AI, IndiaAI Mission, Sarvam AI, Krutrim, AI Policy, Geopolitics, AI Regulation

When India's IT Minister Ashwini Vaishnaw stepped onto the stage at the India AI Impact Summit in February 2026 and announced 20,000 additional GPUs for the national AI compute pool, the headline felt triumphant: India was building sovereign AI. When Sarvam AI open-sourced its 30B and 105B models — trained on Indian compute, optimized for 22 Indian languages — the narrative felt complete.

The reality is more interesting, more complicated, and ultimately more honest than either the triumphalism or the cynicism suggest.

India has made genuine, measurable progress toward AI sovereignty in 2026. It has also built that progress on a foundation of American chips, sold by a company whose exports the US government could, in a different geopolitical moment, restrict the way it restricted Chinese access to advanced semiconductors. Understanding both the achievement and the dependency is the only way to assess where India actually stands.


What "Sovereign AI" Actually Means — and Why India Cares

The term "sovereign AI" does not have a legal definition. There is no international standard, no technical certification, no treaty requirement. It is a political and strategic concept that different countries use to mean different things.

In India's usage, sovereign AI describes a combination of four things:

Compute independence: Owning or controlling the GPU infrastructure used to train and serve AI models, rather than renting it entirely from foreign clouds (AWS, Azure, Google Cloud) that are subject to foreign jurisdiction, pricing decisions, and potential access restrictions.

Model independence: Having domestic AI models — trained on Indian data, in Indian languages, by Indian institutions — rather than being entirely dependent on models from US or Chinese providers that could be suspended (as the Fable 5 ban just demonstrated), priced arbitrarily, or restricted in ways India cannot control.

Data sovereignty: Controlling where Indian citizens' data is stored, processed, and used as AI training material, rather than having it flow to foreign servers governed by foreign privacy law.

Regulatory autonomy: Having a domestic legal and governance framework for AI, rather than adopting wholesale the EU's approach, the US's approach, or China's approach — each of which reflects those countries' political values and economic interests, not India's.

India cares about all four dimensions, but not equally. The most pressing drivers are strategic: the Fable 5 ban, in which the US government forced a US company to shut off a global service, demonstrated concretely what dependency on foreign AI providers can cost. The AI landscape in which India finds itself — a superpower competition between American techno-liberalism and Chinese techno-authoritarianism — gives India a strong incentive to build enough indigenous capability to negotiate from strength rather than dependency.


The IndiaAI Mission: What Has Actually Been Built

The IndiaAI Mission is the centrepiece of India's sovereign AI strategy. Sanctioned by the Cabinet in March 2024 at Rs 10,371.92 crore (about $1.25 billion), it has seven pillars: AI compute, foundation models, datasets, application development, AI safety, startup support, and skills development.

The Compute Stack

The most concrete achievement is compute. As of mid-2026, the IndiaAI Mission has deployed approximately 34,000 GPUs across data centers in India, accessible to registered startups, academic researchers, and government agencies at approximately ₹65 per GPU-hour — a fraction of what H100 time costs on AWS or Azure.
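To make the ₹65 per GPU-hour figure concrete, here is a minimal cost sketch. It assumes a single flat rate for every GPU type (actual IndiaAI pricing varies by card and provider), and the job size in the example is hypothetical.

```python
# Minimal cost sketch for subsidised IndiaAI compute.
# ASSUMPTION: one flat rate of Rs 65 per GPU-hour for every GPU type.
RATE_INR_PER_GPU_HOUR = 65
INR_PER_CRORE = 10_000_000  # 1 crore = 10 million rupees


def gpu_job_cost_inr(num_gpus: int, hours: float,
                     rate: float = RATE_INR_PER_GPU_HOUR) -> float:
    """Total cost in rupees for num_gpus running for `hours` wall-clock hours."""
    return num_gpus * hours * rate


def to_crore(inr: float) -> float:
    """Convert rupees to crore."""
    return inr / INR_PER_CRORE


if __name__ == "__main__":
    # Hypothetical job: 1,000 GPUs for 30 days (720 hours).
    cost = gpu_job_cost_inr(1_000, 30 * 24)
    print(f"Rs {cost:,.0f} (about Rs {to_crore(cost):.2f} crore)")
    # prints "Rs 46,800,000 (about Rs 4.68 crore)"
```

Cost scales linearly in both GPU count and hours, so doubling either doubles the bill; the subsidy changes only the rate, not the shape of the calculation.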

Ashwini Vaishnaw announced at the AI Impact Summit that India will add another 20,000 GPUs to reach 54,000 in the near term, with a target of 100,000 public GPUs by December 2026. Private deployments — by Reliance, Tata, and international hyperscalers building India-based infrastructure — are expected to match or exceed that figure, pushing national GPU capacity past 200,000 by year-end.

The composition of those 34,000 GPUs: NVIDIA H100s, H200s, and the newer Blackwell-generation cards. Not a single chip is domestically designed or manufactured.

The Dataset Platform

Alongside compute, IndiaAI has built an open Indian dataset platform — a government-curated repository of public-sector data (agricultural records, health records, court judgments, financial transaction statistics) made available for AI training under structured licensing. The platform is designed to operationalize what policymakers call India's data dividend: the structural advantage of having one of the world's largest populations generating one of the world's largest volumes of digital interactions, in one of the world's most linguistically diverse environments.

The platform currently holds curated datasets across 22 scheduled languages, agricultural records from 140 million farmer accounts, 7 billion+ health records from government healthcare schemes, and financial data from UPI's 14 billion monthly transactions. The quality of this data — whether it is well-labeled, appropriately licensed, and actually useful for training — is the more contested question.


The Indigenous Models: Sarvam, Krutrim, and Bhashini

The most visible output of the IndiaAI Mission's model pillar is Sarvam AI, a Bengaluru-based startup that won a government tender to build India's first homegrown LLM. In February 2026, Sarvam open-sourced two foundation models trained entirely on IndiaAI Mission compute:

Sarvam 30B and Sarvam 105B

Sarvam 30B is a 32-billion-parameter Mixture-of-Experts (MoE) model that activates approximately 2.4 billion parameters per token and has a 65K context window. It is designed for speed and positioned in roughly the same capability tier as Gemini 3.1 Flash-Lite and GPT-5 mini.

Sarvam 105B is a 106-billion-parameter MoE model that activates approximately 10 billion parameters per token and has a 128K context window. It targets complex, multi-step tasks and is positioned in the same tier as Gemini 3.1 Pro and GPT-5.4. Both models are open-sourced under Apache 2.0.
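The gap between total and active parameters is what makes these MoE models cheap to serve. The sketch below uses the parameter counts quoted above and the common rule of thumb that a forward pass costs roughly 2 FLOPs per active parameter per token. It ignores attention and expert-routing overhead, so treat the outputs as rough estimates, not measured figures.

```python
# Rough per-token inference cost for the two Sarvam MoE models, using the
# parameter counts quoted in the text. ASSUMPTION: forward-pass cost of about
# 2 FLOPs per *active* parameter per token (attention/routing overhead ignored).

MODELS = {
    # name: (total parameters, active parameters per token)
    "Sarvam 30B": (32e9, 2.4e9),
    "Sarvam 105B": (106e9, 10e9),
}


def active_fraction(active: float, total: float) -> float:
    """Share of the model's weights used for any single token."""
    return active / total


def forward_flops_per_token(active: float) -> float:
    """Rule-of-thumb forward-pass FLOPs per generated token."""
    return 2 * active


if __name__ == "__main__":
    for name, (total, active) in MODELS.items():
        print(f"{name}: {active_fraction(active, total):.1%} of weights active, "
              f"~{forward_flops_per_token(active) / 1e9:.1f} GFLOPs/token")
```

By the same rule of thumb, a dense model with all 106 billion parameters active would need roughly ten times the per-token compute of Sarvam 105B, which is the point of the MoE design.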

Where they lead: On Indian-language benchmarks — IndicBench and similar evaluation suites — Sarvam 105B wins approximately 90% of comparisons across all dimensions against GPT-4, Claude, and Gemini, with an 84% win rate on STEM, mathematics, and coding specifically in Indian languages. For Sarvam 30B, the numbers are 89% and 87% respectively. These are not marginal wins; they reflect a structural advantage from training on Indian-language data at a depth that no Western frontier model has matched.
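A pairwise win rate is simply the share of head-to-head comparisons a model wins. The source does not say how Sarvam's evaluation scored ties, so the sketch below counts a tie as half a win. That is a common convention but an assumption here, and the example judgments are made up.

```python
from collections import Counter


def win_rate(outcomes: list[str]) -> float:
    """Pairwise win rate from 'win' / 'loss' / 'tie' judgments.

    ASSUMPTION: ties count as half a win. Raises ValueError on unknown
    labels or an empty list.
    """
    counts = Counter(outcomes)
    unknown = set(counts) - {"win", "loss", "tie"}
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no comparisons")
    return (counts["win"] + 0.5 * counts["tie"]) / total


if __name__ == "__main__":
    # Hypothetical judgments: 9 wins and 1 loss out of 10 comparisons.
    print(win_rate(["win"] * 9 + ["loss"]))  # 0.9
```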

Where they trail: On English-centric global benchmarks, the picture reverses. On the Artificial Analysis Intelligence Index, Sarvam 105B scores 18 — behind GLM-4.5-Air (23), Mistral Small 4 (27), and well behind the frontier GPT and Claude models. On TerminalBench Hard — a benchmark for autonomous coding — Sarvam 105B scores 1.5% and Sarvam 30B scores 2.3%, compared to GLM-4.5-Air's 20.5%. These numbers reflect the reality that Sarvam's training prioritized Indian-language breadth over raw English-language frontier reasoning.

The honest framing: Sarvam is not competing with Claude Fable 5 or GPT-5.6 on frontier benchmarks. It is competing with GPT-4-class models on Indian tasks — and winning. For India's specific use case (governance, citizen services, multilingual applications), this is the right tradeoff.

Krutrim

Krutrim — meaning "artificial" in Sanskrit — was founded by Ola's Bhavish Aggarwal in April 2023 and became India's first AI unicorn within six months of launch. Krutrim-3, the current flagship model, was trained on over 2 trillion tokens with strong support for all 22 Indian languages.

Unlike Sarvam's government-backed approach, Krutrim is a private venture that competes commercially. It has built a GPU cloud (Krutrim Cloud) offering H100 time to Indian enterprises, and is investing in inference infrastructure for real-time Indian-language applications. Krutrim-3 benchmarks similarly to Sarvam 105B on Indian-language tasks, with marginally stronger performance on code generation.

Bhashini

Bhashini is a government initiative — not a private startup — designed to build translation and speech infrastructure for Indian languages. It is the layer beneath the LLMs: real-time translation, voice-to-text, and text-to-voice across all 22 scheduled languages, accessible via API for any developer building on Indian infrastructure.

Bhashini-v2, launched in early 2026, dramatically improved accuracy in tribal and regional language variants. It is already deployed in the MyGov platform (140 million users), CoWIN, and several state government portals. For citizens who do not read or speak English or Hindi as a primary language, Bhashini is the most tangible expression of sovereign AI: government services accessible in Gondi, Bodo, Maithili, Santali — languages that no global frontier model serves with meaningful accuracy.


The Data Dividend: India's Structural Advantage

Every sovereign AI strategy needs a distinctive input that it has more of than its competitors. For the US, it is capital and research talent. For China, it is scale of domestic data generation and state-directed infrastructure. For India, it is something more specific: linguistic diversity at population scale, combined with an unusually large structured digital transaction dataset from its government technology stack.

The 22-Language Advantage

India has 22 constitutionally recognised languages and over 700 spoken dialects. The largest (Hindi, Bengali, Telugu, Marathi, Tamil, Urdu, Gujarati, Kannada, Malayalam, Odia) each have more speakers than most European countries have people. Training AI that works for India's population means training AI that works across this spectrum, not just in English and Hindi.

Western frontier models were trained primarily on English-language internet data, with modest representation of high-resource non-English languages. Indian languages — particularly those with non-Latin scripts, agglutinative grammar, and rich oral traditions — are systematically underrepresented. A model trained specifically on Indian-language data at volume has a structural advantage that cannot easily be replicated by fine-tuning a Western model.

The UPI/Aadhaar/DigiLocker Stack

India's digital public infrastructure — the "India Stack" — has generated a dataset that no other country has: 14 billion UPI transactions per month, 1.4 billion Aadhaar-verified identity records, 6 billion+ DigiLocker document verifications, and 140 million registered farmers on the PM-Kisan digital ledger.
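The monthly UPI figure is easier to picture as a rate. This back-of-envelope sketch converts it to average transactions per second, assuming a 30-day month; it gives an average only, and peak-hour load will be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_tps(transactions_per_month: float, days_in_month: int = 30) -> float:
    """Average transactions per second over a month (ignores daily peaks)."""
    return transactions_per_month / (days_in_month * SECONDS_PER_DAY)


if __name__ == "__main__":
    # 14 billion UPI transactions per month, the figure quoted above.
    print(f"{average_tps(14e9):,.0f} transactions per second on average")
    # prints "5,401 transactions per second on average"
```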

This data represents an unprecedented training resource for financial AI (fraud detection, credit scoring, payment optimization), identity verification AI, agricultural AI, and healthcare AI. Properly licensed and curated, it gives Indian AI models a structural edge in applications that matter most for India's actual economy — not writing essays or generating code for Silicon Valley startups, but managing agricultural loans, verifying insurance claims, and processing government subsidies in a country where 800 million people interact with government services digitally.

The caveat: most of this data cannot simply be exported into training runs without complex privacy, consent, and legal frameworks. The DPDP Act governs how personal data can be used. The path from "India has this data" to "Indian AI models are trained on this data legally and effectively" involves years of institutional work that is still in progress.


The NVIDIA Problem: Sovereignty on a Foreign Foundation

The most pointed critique of India's sovereign AI strategy comes from a single observation: all of it runs on NVIDIA chips.

The 34,000 GPUs in the IndiaAI Mission are NVIDIA H100s, H200s, and Blackwell-generation cards. Krutrim Cloud runs NVIDIA hardware. Every Indian AI startup that has received IndiaAI compute subsidies is being trained to want, and depend on, NVIDIA's next GPU generation. The Rs 10,371 crore investment in sovereign compute is, in one reading, a Rs 10,371 crore investment in NVIDIA's revenue line.

This matters geopolitically because of precedent. The US export controls on AI chips — introduced in October 2022 and progressively tightened — cut China off from NVIDIA H100s and their successors. The controls are implemented by the US Commerce Department, the same body that just issued the Fable 5 directive against Anthropic. The legal authority that cut off China's chip supply is the same authority that could, in a different political moment, cut off India's.

India's government is aware of this risk. The US-India interim trade framework announced in February 2026 includes specific language protecting India's access to advanced AI chips. That is more reassuring than nothing, but it is a trade agreement that can be renegotiated, not a constitutional protection. An article in The Ken captured it precisely: "India called its AI sovereign. The US government can still access it."

The Domestic Chip Gap

India has no AI chip manufacturer. The country's semiconductor ambitions, led by the India Semiconductor Mission launched in 2021 with about $10 billion committed, focus on assembly and packaging in the near term, with leading-edge fabrication a decade away at best. Tata Electronics is building a chip assembly facility in Assam, but assembling chips is not the same as designing or fabricating advanced AI accelerators.

The uncomfortable arithmetic: India needs 100,000 GPUs for its 2026 target and plans 200,000+ by 2027, and every one of those chips will be designed in the US (NVIDIA, AMD, Intel) and, in practice, fabricated largely in Taiwan by TSMC. If export controls tighten, whether because of India-specific factors or because US-China escalation produces a blanket restriction on advanced chips to all non-allied nations, India's sovereign AI compute base becomes as vulnerable as China's.

The Power Bottleneck

Even assuming chip supply is secured, India's AI data center expansion faces a physical constraint: power. GPU clusters at the scale of 100,000 H100s require approximately 300 megawatts of continuous power, roughly the residential consumption of a medium-sized city. India's data center energy mix is currently 70%+ thermal (coal and gas). The government's target of 200,000+ GPUs by 2027 implies power requirements that the existing grid cannot meet without significant new generation capacity.
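The power figures can be sanity-checked with simple arithmetic. The sketch below derives the per-GPU draw implied by the estimate of 300 MW for 100,000 H100s (an all-in number that covers host servers, networking, and cooling, not the GPU alone), then scales it to the 200,000+ GPU target. Linear scaling and round-the-clock full load are both simplifying assumptions.

```python
HOURS_PER_YEAR = 24 * 365


def implied_kw_per_gpu(cluster_mw: float, num_gpus: int) -> float:
    """All-in kilowatts per GPU implied by a cluster-level power estimate."""
    return cluster_mw * 1_000 / num_gpus


def cluster_mw(num_gpus: int, kw_per_gpu: float) -> float:
    """Cluster power in megawatts, assuming power scales linearly with GPU count."""
    return num_gpus * kw_per_gpu / 1_000


def annual_twh(mw: float) -> float:
    """Energy for one year of continuous operation at `mw`, in terawatt-hours."""
    return mw * HOURS_PER_YEAR / 1_000_000


if __name__ == "__main__":
    per_gpu = implied_kw_per_gpu(300, 100_000)  # from the 300 MW estimate
    target = cluster_mw(200_000, per_gpu)       # the 200,000+ GPU target
    print(f"{per_gpu:.1f} kW per GPU, {target:.0f} MW for 200,000 GPUs, "
          f"{annual_twh(target):.2f} TWh per year at full load")
    # prints "3.0 kW per GPU, 600 MW for 200,000 GPUs, 5.26 TWh per year at full load"
```

The per-GPU figure is only as good as the cluster estimate it comes from; the useful takeaway is the linear relationship, which means doubling the GPU fleet roughly doubles the grid capacity required.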

Data center operators are increasingly colliding with this bottleneck: GPU integration demands 7-8 times higher power density than traditional server racks, and most existing Indian data centers cannot handle the requirements without major retrofits. The IndiaAI Mission's compute targets and India's power infrastructure are not currently on compatible trajectories.


India's Regulatory Framework: Light Touch with Edges

India has not passed a comprehensive AI Act. This is a deliberate choice, not an oversight.

The government's position — articulated by MeitY through multiple white papers and the AI Governance Guidelines released in early 2026 — is that prescriptive legislation at this stage would stifle innovation and lock in regulatory categories before the technology is mature enough to define them. India has watched the EU struggle to implement the AI Act in a rapidly evolving landscape and has chosen a different path.

The Seven-Sutra Framework

MeitY's AI Governance Guidelines are built around seven principles (called "sutras" to signal the distinctly Indian framing):

  1. Safety and trustworthiness — AI systems must be designed to avoid harm, with mechanisms for human oversight
  2. Equality and non-discrimination — AI must not systematically discriminate based on protected characteristics
  3. Inclusivity and accessibility — AI must serve all of India's population, including those in non-English languages and low-literacy contexts
  4. Privacy and data protection — AI must comply with the DPDP Act framework
  5. Transparency — AI decision-making affecting individuals must be explainable
  6. Accountability — there must be identifiable human or organizational responsibility for AI outcomes
  7. Protection and reinforcement of positive human rights — AI must not undermine constitutional rights

The framework is principle-based: it tells you what values AI must uphold, not which specific technical implementations are required. This gives India flexibility to enforce against egregious violations while not constraining the development approach of compliant builders.

The DPDP Act

The Digital Personal Data Protection Act — passed in 2023, with rules still being finalized in 2026 — is India's primary legal instrument governing how personal data can be collected, processed, stored, and used in AI training. Its key provisions for AI:

  • Data fiduciaries (companies that collect personal data) must get explicit consent for using data in AI training
  • Cross-border data transfers are permitted by default, but the government can notify specific countries to which transfers are restricted (a negative list rather than a whitelist)
  • Citizens have the right to know what data is held about them and to request deletion
  • Significant data fiduciaries — large platforms — face enhanced obligations including data protection impact assessments

The cross-border transfer provision is where data sovereignty intersects with AI: India can, under the DPDP Act, prohibit data transfers to countries it designates as unsafe. This gives the government a legal instrument to force localization of training data, though it has not yet used this power aggressively.
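In engineering terms, this restriction power behaves like a deny-list check in a data pipeline. The sketch below is a hypothetical compliance gate, not anything prescribed by the Act or its rules: the record fields, the consent flag, and the placeholder country codes `ZZ` and `YY` are all illustrative assumptions, and no real country is implied to be restricted or permitted.

```python
from dataclasses import dataclass

# HYPOTHETICAL: placeholder codes for countries the government has notified as
# restricted. "ZZ" is not a real designation.
RESTRICTED_DESTINATIONS: frozenset[str] = frozenset({"ZZ"})


@dataclass(frozen=True)
class TransferRequest:
    record_id: str
    destination_country: str    # country code of the overseas processor
    consent_for_training: bool  # HYPOTHETICAL flag: consent covers AI training


def may_transfer_for_training(req: TransferRequest) -> tuple[bool, str]:
    """Return (allowed, reason) for sending one record abroad for AI training."""
    if not req.consent_for_training:
        return False, "no consent covering AI training"
    if req.destination_country.upper() in RESTRICTED_DESTINATIONS:
        return False, f"destination {req.destination_country} is restricted"
    return True, "allowed"


if __name__ == "__main__":
    print(may_transfer_for_training(TransferRequest("r1", "ZZ", True)))
    print(may_transfer_for_training(TransferRequest("r2", "YY", True)))
```

Checking consent before destination keeps the stricter, per-record condition first; a real pipeline would also log each decision for audit.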

The 2026 IT Rules Amendment

The most recent regulatory addition is the 2026 IT Rules Amendment, which came into effect February 20, 2026. It shifts the regulatory model for AI-integrated platforms from reactive "due diligence" (platforms must respond when informed of violations) to proactive "Active Moderation" (platforms must monitor and intervene on AI-generated content that violates Indian law).

The practical implication: global AI platforms serving Indian users, such as ChatGPT, Gemini, and Claude, must implement moderation systems that actively monitor outputs for content that violates Indian law (disinformation, content critical of national symbols, content deemed to threaten national security). This is less aggressive than China's requirements but more demanding than the EU AI Act's approach to foundation models.

The AI Safety Institute

India established an AI Safety Institute (AISI) in late 2025, modeled loosely on the UK's AISI but with a distinctly different mandate. The Indian AISI focuses on:

  • Evaluating frontier models against safety benchmarks before they can be deployed in Indian government applications
  • Developing India-specific red-teaming capabilities for Indian-language harm contexts
  • Building bilateral relationships with AI safety bodies in the US (US AISI), UK (UK AISI), and EU

The AISI has published early evaluation frameworks but has not yet conducted the kind of high-stakes, results-published evaluations that would make it a significant force in the global AI safety conversation.


India's Geopolitical Position: The Third Way

The most strategically significant dimension of India's sovereign AI push is not technical — it is political.

India sits at the intersection of what analysts call two competing AI governance paradigms:

American techno-liberalism: AI developed primarily by private companies, minimally regulated, optimized for commercial performance, exported globally as a soft-power projection of American values (free speech, market competition, democratic norms).

Chinese techno-authoritarianism: AI developed under state direction, tightly regulated for political content, used for domestic surveillance and control, and exported to authoritarian-aligned states through the Digital Silk Road, the technology arm of the Belt and Road Initiative.

India has explicitly rejected both models — not as a middle path between two extremes, but as a distinct alternative rooted in its own political identity: a democracy with a large Muslim minority, 22 languages, a constitutional commitment to equality, and a historical tradition of non-alignment.

India's pitch to the Global South is specific: we are building AI that works for populations that neither American nor Chinese AI was designed for. AI that works in Swahili, Bahasa, Portuguese, and Hindi as naturally as in English. AI that is governed by democratic institutions accountable to citizens, not by private corporations accountable to shareholders or party committees accountable to a single-party state.

This positioning is strategically valuable. The 130+ countries that constitute the Global South are deciding, right now, which AI infrastructure to build on, which models to deploy, which governance frameworks to adopt. India's offer — democratic, multilingual, sovereign-infrastructure-friendly, not entangled in either Washington's or Beijing's geopolitical orbit — is a genuine differentiator.

Whether India can actually deliver on this positioning depends on the gap between aspiration and execution — a gap that is currently substantial.


The Gaps: What "Sovereign AI" Does Not Yet Mean for India

Honest accounting of where India is not:

No frontier model: Sarvam 105B is competitive with mid-tier global models on Indian-language tasks. It is not competitive with GPT-5.6, Claude Fable 5, or Gemini 3.1 Pro on the frontier reasoning benchmarks that define global AI capability. The gap between the best Indian model and the global frontier is roughly three to four years of compute-scaling advantage. Closing it requires either an order-of-magnitude increase in training compute or architectural breakthroughs that are not currently on the roadmap.

No domestic chip: As discussed, India's compute base is 100% imported hardware. This is the foundational vulnerability in the sovereign AI claim. Compute sovereignty requires either domestic chip design and manufacture — a decade-plus project at minimum — or guaranteed access treaties that are robust to US domestic political change.

No frontier training at scale: The models India has trained (Sarvam, Krutrim, Bhashini) were trained on relatively modest compute budgets by frontier standards. Sarvam 105B was trained on IndiaAI Mission compute (a few thousand GPUs for a few months). Training a model competitive with GPT-5.6 requires cluster-months of 10,000+ H100-equivalent GPUs. India does not yet have compute at that scale available as a single dedicated training cluster, and such a run would cost hundreds of millions of dollars even if the hardware were available.

Data quality, not just quantity: The data dividend argument is correct in principle but overstated in practice. The 14 billion UPI transactions and 1.4 billion Aadhaar records are not neatly packaged AI training datasets. They are operational systems records, largely unstructured, subject to privacy law, and requiring years of curation, labeling, and licensing work before they become training-ready. The gap between "India has this data" and "India's models are trained on this data" is measured in years of institutional effort.

Brain drain: India's most technically skilled AI researchers disproportionately end up at Google, Meta, Microsoft, or OpenAI in the United States. This is partly a compensation problem (the gap between Silicon Valley salaries and Indian tech salaries remains large), partly a research culture problem (the best research environments are still concentrated in the US and UK), and partly a visa problem: long green-card backlogs for Indian nationals on H-1B visas tie many researchers to US employers for years, which paradoxically keeps talent in the US rather than sending it back to India. The IndiaAI Mission's startup grants and compute subsidies help at the margins, but they do not yet reverse the structural brain drain.


What Comes Next: The 2026–2028 Window

The trajectory is clear even if the destination is not. India's sovereign AI position in 2026 is a foundation, not an arrival. The next two years will determine whether that foundation becomes a genuinely capable indigenous AI sector or remains a well-intentioned but structurally dependent infrastructure project.

The key variables to watch:

Chip supply guarantee: Can India secure a binding, long-term guarantee of access to NVIDIA H100/B200 equivalents that survives changes in US administrations? The interim trade deal is a start, not an answer.

Compute scale-up: Does India hit 100,000 GPUs by December 2026? And does it resolve the power infrastructure bottleneck that threatens to cap expansion well below the headline targets?

Model generations: Can Sarvam or a competitor produce a Sarvam 2.0 that is competitive on English-centric global benchmarks while maintaining its Indian-language advantage? The 2027 model generation will be the real test.

Global South adoption: Does any significant Global South government — Nigeria, Brazil, Indonesia, Bangladesh — deploy Indian AI models as national infrastructure? Adoption abroad would validate the "third way" positioning.

Domestic chip progress: Does the India Semiconductor Mission produce any domestic chip design capability, even at the level of inference accelerators optimized for Indian-language models, by 2028? The bar here is low — any meaningful domestic silicon would be a strategic signal even if it is far from frontier.

Regulatory maturity: Does India's AI governance framework move from principles to enforcement? The seven sutras are a foundation; binding rules with actual penalties are what create real accountability.


The Bottom Line

India's sovereign AI status in June 2026 is real, partial, and fragile in specific ways.

Real: India has built GPU compute infrastructure accessible to startups and researchers at subsidized rates. It has open-sourced competitive models in Indian languages. It has a data governance framework that gives it legal authority over citizen data. It has positioned itself as the AI leader of the Global South. These are genuine achievements.

Partial: India does not have a frontier AI model competitive with US or Chinese leaders. It does not have a domestic AI chip. Its compute base depends entirely on US-designed hardware, fabricated largely in Taiwan. Its AI Safety Institute is nascent. Its data dividend remains largely theoretical: legally complex and operationally uncurated.

Fragile in specific ways: The entire IndiaAI Mission compute base runs on hardware subject to US export control. A change in US-India relations, a broadening of chip export restrictions, or a geopolitical event involving India's neighbors could cut off chip supply in the same way Chinese access was cut off in 2022. India's sovereignty, as currently constructed, is contingent on US goodwill. That is not sovereignty in the full sense — it is a significantly better dependency arrangement than pure commercial cloud reliance, but it is still dependency.

The Fable 5 ban, happening the same week this assessment is being written, is a live demonstration of what dependency costs. Anthropic's users — including Indian users and Indian enterprises — lost access to the most capable AI model in the world overnight, because the US government issued a directive to a US company. India's response to that ban is to redouble investment in indigenous capacity. The logic is right. The execution is two to three years behind what the aspiration implies.

Sovereign AI is the right goal. India is on the right path. The path is longer and more technically demanding than the summit announcements suggest.


FAQ

What has India actually built under the IndiaAI Mission?

34,000+ GPUs deployed and accessible at about ₹65 per GPU-hour, the Sarvam 30B and 105B open-source models, the Bhashini translation platform in 22 languages, an open Indian dataset platform, and an AI Safety Institute for frontier model evaluation.

Is Sarvam 105B competitive with ChatGPT or Claude?

On Indian-language tasks: yes, significantly. On English-centric frontier benchmarks: no — it scores 18 on the Artificial Analysis Intelligence Index versus 64+ for Claude Fable 5. It targets a different problem: serving India's population in their actual languages, not competing on Silicon Valley benchmarks.

What is the NVIDIA dependency risk for India?

If US export controls are extended to restrict NVIDIA chip sales to India — as they were for China in 2022 — India's entire sovereign AI compute base would face supply disruption. The 2026 interim US-India trade deal provides some protection, but it is a trade agreement, not a permanent guarantee.

Can India develop its own AI chips?

Not at frontier scale in the near term. The India Semiconductor Mission is targeting assembly and packaging first, with fabrication a decade away. The realistic near-term goal is custom inference accelerators optimized for Indian-language models, not training chips competitive with NVIDIA's H-series.

What is the significance of Bhashini?

Bhashini is the infrastructure layer that makes AI accessible to India's 800 million non-English and non-Hindi speakers. It provides real-time translation, voice-to-text, and text-to-voice across 22 scheduled languages and hundreds of dialects — deployed in government services that are already used by hundreds of millions of citizens.


Sources: IndiaAI Mission | Sarvam 30B and 105B release | Artificial Analysis benchmarks | NVIDIA IndiaAI coverage | EY Sovereign AI India | The Ken: India called its AI sovereign | Rest of World: India frugal AI startups
