Sarvam AI is building India's sovereign AI stack — not a single model, but a full product layer spanning chat LLMs, speech recognition, text-to-speech, translation, and document intelligence, all optimized for the way Indian languages are actually used: native script, romanized WhatsApp Hindi, code-mixed Hinglish, and 22 scheduled languages.
In March 2026, Sarvam open-sourced Sarvam-30B and Sarvam-105B, two MoE reasoning models trained from scratch on IndiaAI Mission compute. Both are already in production: Sarvam-30B powers Samvaad (a conversational agent platform), and Sarvam-105B powers Indus (an AI assistant for complex reasoning and agentic workflows).
This guide maps every model, API, pricing tier, and integration path — so you can pick the right Sarvam capability for your use case without reading six separate doc pages.
Quick reference: the Sarvam stack
| Model | ID | What it does | Languages | Best for |
|---|---|---|---|---|
| Sarvam-105B | sarvam-105b | Flagship chat LLM (MoE + MLA) | 10 Indic + English | Reasoning, agents, long docs |
| Sarvam-30B | sarvam-30b | Efficient chat LLM (MoE + GQA) | 10 Indic + English | Voice agents, high-throughput chat |
| Saaras v3 | saaras:v3 | Speech-to-text | 23 (22 Indic + English) | Call analytics, voice agents, telephony |
| Bulbul v3 | bulbul:v3 | Text-to-speech | 11 (10 Indic + English) | IVR, narration, voice agents |
| Sarvam-Translate | sarvam-translate:v1 | Formal translation | All 22 official Indic + English | Official docs, 22-language coverage |
| Mayura | mayura:v1 | Colloquial translation | 11 Indic + English | Code-mixed, conversational text |
| Sarvam Vision | sarvam-vision | Document intelligence (OCR) | 23 (22 Indic + English) | Scanned archives, table extraction |
Deprecated: sarvam-m (24B hybrid) — migrate to sarvam-30b or sarvam-105b.
SDK: pip install sarvamai · npm install sarvamai · docs.sarvam.ai
Free credits: ₹100 on signup · Pricing
Company and positioning
Sarvam AI was founded in 2023 in Bengaluru by Vivek Raghavan and Pratyush Kumar. It was selected under India's IndiaAI Mission to build the country's first homegrown LLM stack — trained entirely on Indian compute with datasets emphasizing Indian languages, code-mixed text, and culturally grounded content.
The strategic bet: unified multimodal models from Western labs treat Indian languages as secondary. Sarvam builds specialized foundations for multilingual India — the same thesis Ideogram applies to design typography, applied here to speech, script diversity, and romanized colloquial usage.
For sovereign AI policy context, see our India Sovereign AI Status 2026 post. This guide focuses on product capabilities and developer integration.
Chat LLMs: Sarvam-30B and Sarvam-105B
Both models are reasoning models trained from scratch — not fine-tunes of Mistral, Qwen, or Llama. Architecture: Mixture-of-Experts Transformer with 128 sparse experts, sigmoid-based routing, and in-house RL (async GRPO with CISPO-inspired policy optimization).
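To make "sigmoid-based routing" concrete, here is a toy sketch of how a sparse MoE layer might choose experts for a single token. It is not Sarvam's implementation: the `route_token` name, the `top_k` value, and the renormalization step are assumptions for illustration, and the example uses 8 experts instead of 128 to stay readable.

```python
import math

def route_token(router_logits, top_k=2):
    """Toy sigmoid router: score each expert independently, keep the top_k.

    Unlike softmax routing, each sigmoid score does not depend on the other
    experts' logits. top_k=2 is a hypothetical value, not a Sarvam spec.
    """
    scores = [1.0 / (1.0 + math.exp(-z)) for z in router_logits]
    # Indices of the top_k highest-scoring experts.
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Renormalize the selected scores so the mixing weights sum to 1.
    total = sum(scores[i] for i in chosen)
    return {i: scores[i] / total for i in chosen}

weights = route_token([0.2, -1.0, 3.0, 0.0, 1.5, -2.0, 0.1, -0.3], top_k=2)
print(weights)  # experts 2 and 4 carry the token
```

Only the selected experts run for that token, which is why the active parameter count per token is far smaller than the total parameter count.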
Sarvam-105B (flagship)
| Spec | Value |
|---|---|
| Total parameters | 105B+ MoE |
| Attention | Multi-head Latent Attention (MLA) |
| Active params | ~10B per token |
| Context window | 128K tokens |
| Pre-training | 12T tokens |
| License | Apache 2.0 |
| Powers | Indus AI assistant |
Benchmark highlights (from Sarvam's blog):
| Benchmark | Sarvam-105B |
|---|---|
| Math500 | 98.6 |
| AIME 25 (w/ tools) | 88.3 (96.7) |
| MMLU | 90.6 |
| LiveCodeBench v6 | 71.7 |
| BrowseComp | 49.5 |
| Tau2 (avg.) | 68.3 (highest in comparison set) |
| SWE-Bench Verified | 45.0 |
| Indian language win rate | ~90% pairwise |
Sarvam 105B leads on agentic benchmarks — BrowseComp and Tau2 — reflecting training on tool interaction, web search, and multi-step environments. On Indian-language pairwise evals, it wins ~90% of comparisons across fluency, script correctness, usefulness, and verbosity.
Sarvam-30B (efficient)
| Spec | Value |
|---|---|
| Total parameters | 30B MoE |
| Active params | 2.4B per token |
| Attention | Grouped Query Attention (GQA) |
| Context window | 64K tokens |
| Pre-training | 16T tokens |
| License | Apache 2.0 |
| Powers | Samvaad conversational platform |
| Inference | H100, L40S, Apple Silicon (MXFP4) |
Benchmark highlights:
| Benchmark | Sarvam-30B |
|---|---|
| Math500 | 97.0 |
| HumanEval | 92.1 |
| LiveCodeBench v6 | 70.0 |
| AIME 25 (w/ tools) | 80.0 (96.7) |
| BrowseComp | 35.5 |
| Indian language win rate | ~89% pairwise |
Sarvam-30B is optimized for real-time deployment: Sarvam reports 3–6× the throughput of a Qwen3 baseline on H100, and the model runs locally on a MacBook Pro M3 via MXFP4.
Choosing between them
| Need | Model |
|---|---|
| Voice-agent pipeline, low latency | Sarvam-30B |
| Multi-step reasoning, tool use, long docs | Sarvam-105B |
| Local/edge inference on laptop | Sarvam-30B (MXFP4) |
| Maximum Indian-language quality | Sarvam-105B |
| Cost-sensitive high-volume chat | Sarvam-30B (₹2.5/1M input vs ₹4) |
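The decision table above can be encoded as a small lookup. This is only a sketch: the need keys and the `pick_chat_model` helper are hypothetical names, and the "use the flagship if any need requires it" rule is an assumption about how to resolve mixed requirements.

```python
# Need keys are hypothetical labels for the rows of the table above.
CHAT_MODEL_BY_NEED = {
    "voice_agent": "sarvam-30b",
    "multi_step_reasoning": "sarvam-105b",
    "local_inference": "sarvam-30b",
    "max_indic_quality": "sarvam-105b",
    "high_volume_chat": "sarvam-30b",
}

def pick_chat_model(needs):
    """Return a model ID that covers every listed need."""
    models = {CHAT_MODEL_BY_NEED[need] for need in needs}
    # If any single need calls for the flagship, use it for the whole workload.
    return "sarvam-105b" if "sarvam-105b" in models else "sarvam-30b"

print(pick_chat_model(["voice_agent", "high_volume_chat"]))
print(pick_chat_model(["voice_agent", "multi_step_reasoning"]))
```

Note that some combinations pull in opposite directions (for example local inference plus multi-step reasoning), so treat the result as a starting point, not a verdict.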
API integration (OpenAI-compatible)
```python
from sarvamai import SarvamAI

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

response = client.chat.completions(
    model="sarvam-105b",
    messages=[
        {"role": "user", "content": "Explain GST impact on Indian MSMEs in Hindi."}
    ],
    temperature=0.5,
    max_tokens=2000,
)
print(response.choices[0].message.content)
```

```shell
curl -X POST https://api.sarvam.ai/v1/chat/completions \
  -H "api-subscription-key: $SARVAM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Explain GST impact on Indian MSMEs."}],
    "model": "sarvam-105b",
    "temperature": 0.5,
    "max_tokens": 2000
  }'
```
Streaming is supported. Reasoning mode is on by default (reasoning_effort: low), and reasoning tokens count toward max_tokens. If answers come back truncated, increase max_tokens; to turn reasoning off, set reasoning_effort=None.
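Because reasoning tokens share the max_tokens budget, it can help to size that budget explicitly. The sketch below builds a request body shaped like the curl example; `build_chat_payload`, `answer_tokens`, and `reasoning_budget` are hypothetical names, and sending `reasoning_effort` as JSON `null` is an assumption about how the `None` setting maps onto the REST body.

```python
import json

def build_chat_payload(prompt, model="sarvam-105b", answer_tokens=2000,
                       reasoning_budget=2000, reasoning=True):
    """Build a chat-completions request body.

    Reasoning tokens count toward max_tokens, so with reasoning enabled the
    budget is answer_tokens + reasoning_budget (both hypothetical knobs).
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.5,
        "max_tokens": answer_tokens + (reasoning_budget if reasoning else 0),
    }
    if not reasoning:
        # Assumption: None (JSON null) disables reasoning, per the note above.
        payload["reasoning_effort"] = None
    return payload

body = build_chat_payload("Explain GST impact on Indian MSMEs.", reasoning=False)
print(json.dumps(body, ensure_ascii=False))
```

The right reasoning budget depends on the task; the 2,000-token default here is a placeholder, not a recommendation from Sarvam.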
Weights: Hugging Face 30B · Hugging Face 105B · run with Transformers, vLLM, or SGLang.
Speech: Saaras v3 (ASR)
Saaras v3 is Sarvam's speech-to-text model — state-of-the-art ASR for Indian accents, code-mixed speech, and telephony audio (8 kHz).
| Spec | Value |
|---|---|
| Model ID | saaras:v3 |
| Languages | 23 (22 Indic + English), auto-detect |
| REST limit | 30 seconds per request |
| Batch limit | Up to 2 hours per file |
| Protocols | REST, Batch, WebSocket streaming |
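The limits in this table suggest a simple routing rule by clip length. The helper below is a sketch with a hypothetical name (`pick_stt_protocol`); it only encodes the 30-second and 2-hour limits listed above and does not call the API.

```python
REST_MAX_SECONDS = 30            # REST limit from the table
BATCH_MAX_SECONDS = 2 * 60 * 60  # Batch limit: 2 hours per file

def pick_stt_protocol(duration_seconds, live=False):
    """Return 'websocket', 'rest', or 'batch' for a clip (hypothetical helper)."""
    if live:
        return "websocket"  # audio arriving in real time, e.g. a live call
    if duration_seconds <= REST_MAX_SECONDS:
        return "rest"
    if duration_seconds <= BATCH_MAX_SECONDS:
        return "batch"
    raise ValueError("File exceeds the 2-hour batch limit; split it first")

print(pick_stt_protocol(12))      # short voice note
print(pick_stt_protocol(45 * 60)) # 45-minute call recording
```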
Five output modes
| Mode | Output |
|---|---|
| transcribe | Text in source language |
| translate | Translated text (typically to English) |
| verbatim | Word-for-word including fillers |
| translit | Transliterated script |
| codemix | Preserves code-mixed structure |
Example
```python
from sarvamai import SarvamAI

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

# Pass an open binary file handle rather than a path string.
with open("audio.wav", "rb") as audio_file:
    response = client.speech_to_text.transcribe(
        file=audio_file,
        model="saaras:v3",
        language_code="hi-IN",
        mode="transcribe",
        with_timestamps=True,
    )

print(response.transcript)
```
Batch API supports speaker diarization (diarized_transcript) — ideal for call center analytics, meetings, and long-form media.
Pricing: ₹30/hour (transcribe) · ₹45/hour (with diarization) · billed per second, rounded up.
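As a worked example of the per-second billing, the sketch below estimates a transcription bill. It assumes "rounded up" means the duration is rounded up to the next whole second before the hourly rate is applied; `stt_cost_inr` is a hypothetical helper, not part of the SDK.

```python
import math

STT_RATE_INR_PER_HOUR = {"transcribe": 30.0, "diarize": 45.0}

def stt_cost_inr(duration_seconds, diarize=False):
    """Estimate speech-to-text cost in INR for one audio file."""
    rate = STT_RATE_INR_PER_HOUR["diarize" if diarize else "transcribe"]
    # Assumption: billing rounds the duration up to a whole second.
    billable_seconds = math.ceil(duration_seconds)
    return billable_seconds * rate / 3600

print(stt_cost_inr(3600))                # one hour of plain transcription
print(stt_cost_inr(90.2, diarize=True))  # 91 billable seconds with diarization
```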
Best for: Voice agents, IVR analytics, 8 kHz telephony, Hinglish/code-mixed call recordings.
Speech: Bulbul v3 (TTS)
Bulbul v3 converts text to natural-sounding speech across Indian languages.
| Spec | Value |
|---|---|
| Model ID | bulbul:v3 |
| Languages | 11 (10 Indic + English) |
| Speakers | 30+ voices (Shubh, Priya, Aditya, Ritu, Anand, …) |
| Max chars | 2,500 per REST request |
| Sample rates | 8–48 kHz (48 kHz REST/WebSocket only) |
| Pace control | 0.5×–2.0× |
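The 2,500-character limit means longer scripts have to be split before they are sent. One possible approach, sketched below, is to break on sentence boundaries (including the Devanagari danda) and fall back to a hard split for oversized sentences. `chunk_for_tts` is a hypothetical helper and the boundary rule is an assumption; other scripts may need additional punctuation marks.

```python
import re

MAX_TTS_CHARS = 2500  # per-request limit from the table

def chunk_for_tts(text, limit=MAX_TTS_CHARS):
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    # Split after ., ?, ! or the Devanagari danda, keeping the punctuation.
    sentences = re.split(r"(?<=[.?!।])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence that is itself longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence.strip()
    if current:
        chunks.append(current)
    return chunks

parts = chunk_for_tts("नमस्ते। आपका ऑर्डर confirm हो गया है। धन्यवाद।", limit=40)
print(parts)
```

Each chunk can then be sent as its own request and the audio segments concatenated in order.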
Example
```python
from sarvamai import SarvamAI
from sarvamai.play import play

client = SarvamAI(api_subscription_key="YOUR_SARVAM_API_KEY")

response = client.text_to_speech.convert(
    text="आपका ऑर्डर confirm हो गया है।",
    target_language_code="hi-IN",
    model="bulbul:v3",
    speaker="priya",
    speech_sample_rate=24000,
)
play(response)
```
Critical limitation: Romanized Indic input degrades quality significantly. Always use native script for Indic words — e.g. "आपका order confirm हो गया है" not "Aapka order confirm ho gaya hai".
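A cheap pre-flight check can catch romanized Hindi before it reaches the TTS call. The heuristic below is an assumption-laden sketch: it only covers Devanagari (Unicode block U+0900 to U+097F), so other scripts need their own ranges, and the function names are hypothetical.

```python
def has_devanagari(text):
    """True if any character falls in the Devanagari block (U+0900 to U+097F)."""
    return any("\u0900" <= ch <= "\u097f" for ch in text)

def looks_romanized_hindi(text):
    """True when text has letters but none in Devanagari (heuristic for hi-IN only)."""
    has_letters = any(ch.isalpha() for ch in text)
    return has_letters and not has_devanagari(text)

print(looks_romanized_hindi("Aapka order confirm ho gaya hai"))  # True
print(looks_romanized_hindi("आपका order confirm हो गया है"))      # False
```

The check also flags plain English sentences, so it works best as a warning in a pipeline where the target language is known to be Hindi, not as a hard rejection.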
Pricing: ₹30/10K characters (v3 beta) · ₹15/10K (v2 legacy).
Protocols: REST, HTTP streaming, WebSocket — for real-time voice agent pipelines pair Bulbul (TTS) + Saaras (ASR) + Sarvam-30B (LLM).
Translation: Sarvam-Translate vs Mayura
Two translation models serve different styles:
| | Sarvam-Translate | Mayura |
|---|---|---|
| Model ID | sarvam-translate:v1 | mayura:v1 |
| Languages | All 22 official Indic + English | 11 Indic + English |
| Max input | 2,000 characters | 1,000 characters |
| Style | Formal only | formal, modern-colloquial, classic-colloquial, code-mixed |
| Script control | No | roman, fully-native, spoken-form-in-native |
| Best for | Government docs, legal, all-language coverage | WhatsApp-style Hinglish, conversational UI |
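The style and length rows of this table can be turned into a small router. The sketch below uses a hypothetical helper, `pick_translation_model`, and deliberately ignores language coverage: Mayura supports fewer languages, so a real router would also need to check the target language against the supported list in Sarvam's docs.

```python
TRANSLATE_LIMITS = {"sarvam-translate:v1": 2000, "mayura:v1": 1000}
MAYURA_ONLY_STYLES = {"modern-colloquial", "classic-colloquial", "code-mixed"}

def pick_translation_model(text, style="formal", needs_script_control=False):
    """Pick a model ID from the comparison table (hypothetical helper)."""
    if style != "formal" and style not in MAYURA_ONLY_STYLES:
        raise ValueError(f"Unknown style: {style}")
    if style in MAYURA_ONLY_STYLES or needs_script_control:
        model = "mayura:v1"
    else:
        model = "sarvam-translate:v1"
    limit = TRANSLATE_LIMITS[model]
    if len(text) > limit:
        raise ValueError(f"{model} accepts at most {limit} characters per request")
    return model

print(pick_translation_model("Your EMI of Rs. 3000 is pending", style="modern-colloquial"))
print(pick_translation_model("भारत एक महान देश है।"))
```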
Sarvam-Translate (formal, 22 languages)
```python
# Reuses the `client` created in the earlier examples.
response = client.text.translate(
    input="भारत एक महान देश है।",
    source_language_code="hi-IN",
    target_language_code="gu-IN",
    model="sarvam-translate:v1",
)
```
Open weights available on Hugging Face under Apache 2.0.
Mayura (colloquial + code-mixed)
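```python
# Reuses the `client` created in the earlier examples.
response = client.text.translate(
    input="Your EMI of Rs. 3000 is pending",
    source_language_code="en-IN",
    target_language_code="hi-IN",
    model="mayura:v1",  # model ID from the comparison table, stated explicitly
    mode="modern-colloquial",
    output_script="fully-native",
    numerals_format="native",
)
# → "आपका रु. 3000 का ई.एम.ऐ. पेंडिंग है।"
```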
Also available: /transliterate (script conversion without translation) and /detect-language (language ID across all major Indian languages).
Pricing: ₹20/10K characters (translate/transliterate) · ₹3.5/10K (language ID).
Document intelligence: Sarvam Vision
Sarvam Vision is a 3B-parameter vision-language model built for Indian-language OCR and document parsing, an area where global VLMs treat Indic scripts as secondary.
| Spec | Value |
|---|---|
| Model ID | sarvam-vision |
| Parameters | 3B (state-space VLM) |
| Languages | 23 (22 Indic + English) |
| Input | PDF, PNG, JPG, ZIP |
| Output | HTML, Markdown, JSON (structured page data) |
| Max pages | 10 per job |
| Max file size | 200 MB |
Capabilities
- Text extraction with layout and reading order preserved
- Complex tables — merged cells, multi-level headers, invisible borders → clean HTML/Markdown
- End-to-end Indic — Marathi PDF → Marathi structured output (no forced English translation)
Example
```python
# Reuses the `client` created in the earlier examples.
job = client.document_intelligence.create_job(
    language="hi-IN",
    output_format="md",
)
job.upload_file("document.pdf")
job.start()
job.wait_until_complete()
job.download_output("./output.zip")
```
Pricing: ₹0.5/page · max 10 pages per job.
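With a 10-page cap per job, longer documents need to be split into page ranges first. The sketch below plans those ranges and the resulting cost; `plan_vision_jobs` is a hypothetical helper, and the actual PDF splitting (with whichever PDF library you prefer) is left out.

```python
MAX_PAGES_PER_JOB = 10
PRICE_INR_PER_PAGE = 0.5

def plan_vision_jobs(total_pages):
    """Return 1-indexed inclusive (first, last) page ranges, one per job."""
    return [
        (start, min(start + MAX_PAGES_PER_JOB - 1, total_pages))
        for start in range(1, total_pages + 1, MAX_PAGES_PER_JOB)
    ]

jobs = plan_vision_jobs(23)
print(jobs)                     # [(1, 10), (11, 20), (21, 23)]
print(23 * PRICE_INR_PER_PAGE)  # 11.5
```

A 23-page scan therefore becomes three jobs and costs ₹11.5 at the listed per-page rate.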
Best for: Digitizing government records, Indic academic archives, financial reports with complex tables, scanned legal documents.
API pricing summary (INR)
All prices from Sarvam's pricing page:
| Service | Price | Unit |
|---|---|---|
| Sarvam-105B chat | ₹4 / ₹2.5 / ₹16 | input / cached / output per 1M tokens |
| Sarvam-30B chat | ₹2.5 / ₹1.5 / ₹10 | input / cached / output per 1M tokens |
| Speech-to-text | ₹30 | per hour of audio |
| STT + diarization | ₹45 | per hour |
| STT + translate | ₹30 | per hour |
| Sarvam-Translate | ₹20 | per 10K characters |
| Mayura translate | ₹20 | per 10K characters |
| Language ID | ₹3.5 | per 10K characters |
| Bulbul v3 TTS | ₹30 | per 10K characters |
| Document digitization | ₹0.5 | per page |
Rate limits: Starter 60 req/min · Pro 200 · Business 1,000 · Enterprise custom.
Free tier: ₹100 credits on signup to explore all APIs.
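To compare the two chat models on cost, the per-million-token rates from the table can be plugged into a small estimator. This is a sketch: `chat_cost_inr` is a hypothetical helper, and it assumes cached tokens are a subset of the input tokens and are billed at the cached rate in place of the input rate.

```python
CHAT_PRICES_INR_PER_1M = {
    "sarvam-105b": {"input": 4.0, "cached": 2.5, "output": 16.0},
    "sarvam-30b": {"input": 2.5, "cached": 1.5, "output": 10.0},
}

def chat_cost_inr(model, input_tokens, output_tokens, cached_tokens=0):
    """Estimate chat cost in INR from token counts."""
    price = CHAT_PRICES_INR_PER_1M[model]
    # Assumption: cached tokens are part of input_tokens, billed at the cached rate.
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * price["input"]
        + cached_tokens * price["cached"]
        + output_tokens * price["output"]
    )
    return total / 1_000_000

# 1M input tokens and 1M output tokens on each model.
print(chat_cost_inr("sarvam-30b", 1_000_000, 1_000_000))   # 12.5
print(chat_cost_inr("sarvam-105b", 1_000_000, 1_000_000))  # 20.0
```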
Products built on Sarvam models
| Product | Model | Description |
|---|---|---|
| Indus | Sarvam-105B | AI assistant for complex reasoning and agentic workflows |
| Samvaad | Sarvam-30B | Conversational agent platform for real-time multilingual chat |
Both are live in production — the open-source release is not a research preview; these models serve real users today.
Sarvam Startup Program (March 2026): Selected early-stage companies receive 6–12 months of API credits, priority engineering support, and production infrastructure access.
Building a voice agent pipeline
The most common production pattern stacks three Sarvam APIs:
User speech → Saaras v3 (ASR) → Sarvam-30B (LLM) → Bulbul v3 (TTS) → Audio response
Why Sarvam-30B for the LLM layer: 2.4B active parameters = low latency; 64K context handles conversation history; trained on code-mixed Indian language input natively.
For agentic voice (tool calling, web search): swap in Sarvam-105B — 49.5 BrowseComp and 68.3 Tau2 scores reflect strong tool-use training.
For document-heavy workflows (scan a form, extract fields, respond in voice): add Sarvam Vision upstream of the LLM.
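The ASR → LLM → TTS pattern can be sketched as one function with pluggable stages. Everything named here (`run_voice_turn` and the stub lambdas) is hypothetical; in a real deployment the three callables would wrap the Saaras, Sarvam-30B, and Bulbul calls shown earlier.

```python
from typing import Callable

def run_voice_turn(audio: bytes,
                   asr: Callable[[bytes], str],
                   llm: Callable[[list], str],
                   tts: Callable[[str], bytes],
                   history: list) -> bytes:
    """Run one user turn: speech -> text -> reply -> speech."""
    user_text = asr(audio)
    history.append({"role": "user", "content": user_text})
    reply = llm(history)
    history.append({"role": "assistant", "content": reply})
    return tts(reply)

# Stub stages for a dry run; swap in real API calls for production.
history = []
out = run_voice_turn(
    b"fake-audio",
    asr=lambda audio: "मेरा ऑर्डर कहाँ है?",
    llm=lambda messages: f"Received {len(messages)} message(s).",
    tts=lambda text: text.encode("utf-8"),
    history=history,
)
print(out.decode("utf-8"))  # Received 1 message(s).
```

Keeping the stages injectable makes it easy to swap Sarvam-30B for Sarvam-105B in the LLM slot, or to test the loop without network access.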
Honest benchmark framing
Sarvam's strength is structural, not universal frontier dominance:
Where Sarvam leads:
- Indian-language pairwise evals (~90% win rate for 105B)
- Agentic benchmarks in its class (Tau2, BrowseComp)
- Math/reasoning at model scale (Math500 98.6, AIME 96.7 w/ tools)
- Tokenizer efficiency for Indic scripts (lower cost per Indic token)
- Speech/translation/OCR for 22+ languages
Where Sarvam trails:
- English-centric global frontier benchmarks (Artificial Analysis Intelligence Index ~18 for 105B)
- TerminalBench Hard (~1.5% for 105B vs GLM-4.5-Air ~20%)
- SWE-Bench Verified (45% — competitive but below top coding models)
The honest use case: Indian-language applications, voice agents, document digitization, and sovereign deployment — not replacing Claude Fable 5 for English-only frontier coding.
MCP and agent integration
Sarvam publishes an MCP server at https://docs.sarvam.ai/_mcp/server for Claude Code, Cursor, and other MCP hosts — plus a Meta Prompt in their docs to guide any chat model on using Sarvam APIs effectively.
For wiring into agent harnesses, Sarvam-105B's tool-use training (BrowseComp 49.5, Tau2 68.3) makes it a strong backend for Indian-language agent loops. See our Agent Harness guide for loop architecture.
Getting started checklist
- Sign up at dashboard.sarvam.ai — ₹100 free credits
- Install SDK: pip install sarvamai
- Pick your model from the stack table above
- Test in Playground at docs.sarvam.ai before production
- For self-hosted LLM: download weights from Hugging Face, run with vLLM/SGLang
- For voice agents: Saaras → Sarvam-30B → Bulbul pipeline
- For documents: Sarvam Vision batch API (split PDFs >10 pages)
Summary
Sarvam AI is the most complete India-first AI product stack available in 2026 — not just LLMs, but speech, translation, TTS, and document intelligence trained on Indian compute with open weights on the flagship models.
Three things to remember:
- Two LLMs, two jobs: Sarvam-30B for speed and voice pipelines; Sarvam-105B for reasoning, agents, and maximum quality.
- Translation has two modes: Sarvam-Translate for formal 22-language coverage; Mayura for colloquial and code-mixed Hinglish.
- The moat is Indic depth — ~90% win rate on Indian-language benchmarks, native-script OCR, and code-mixed speech — not English frontier parity.
Related reading
- India Sovereign AI Status 2026 — IndiaAI Mission, compute pool, policy context
- How to Run GLM 5.2 with Every Agent Harness — wiring non-Western LLMs into coding agents
- What Is an Agent Harness? — loop architecture for voice and tool agents
- What Is MCP? — connecting Sarvam MCP to Claude Code and Cursor
Official sources: Sarvam 30B/105B blog · API docs · Pricing · Models index · Hugging Face
Model specs, API pricing, and benchmark numbers reflect Sarvam's public documentation as of June 20, 2026. Verify current pricing and model availability at docs.sarvam.ai before production deployment.