
Why India's AI moment depends on cheap inference, not bigger models

India's path to AI leadership lies not in building bigger models like GPT-4, but in mastering cheap, efficient inference for its unique scale and languages.

By Rohan Mehta · 7 min read

I was on a video call last month with a founder friend in Bangalore. He was buzzing, showing me a slick demo his team of three had hacked together over a weekend. It was an AI-powered customer support agent for regional e-commerce, fluent in a convincing mix of Hindi and English. The demo was brilliant. Then he showed me the bill. Just a few hundred queries for their proof-of-concept had racked up an OpenAI API cost that made his eyes water. He laughed, but it was the pained laugh of someone staring at a fundamental mismatch between his ambition and his unit economics. “This is magic, Rohan,” he said, “but it’s not India-scale magic. Not yet.” That conversation has stuck with me for weeks, because it perfectly captures the quiet, unglamorous truth about India’s AI future. While the world is mesmerized by the heavyweight title fight for the biggest, most powerful large language model, our real victory lies somewhere else entirely.

From our vantage point in Delhi or Mumbai, it’s easy to get swept up in the global narrative. Every week, it seems, there’s a new benchmark, a new model that pushes the boundaries of possibility. OpenAI’s GPT-4, Google’s Gemini 1.5 Pro, Anthropic’s Claude 3. It’s a spectacular arms race, a battle of titans fought with mountains of capital and continent-sized clusters of Nvidia’s H100 GPUs. These companies are in a brute-force competition to build digital gods, pouring billions into training runs that consume a small country’s worth of electricity. They are chasing Artificial General Intelligence (AGI) with the raw power of capital, and frankly, it's a jaw-dropping spectacle. The problem is, it’s a game we in India are not equipped to play, and more importantly, a game we shouldn't even try to win.

Let's be brutally honest with ourselves. The kind of capital required to train a frontier model from scratch is astronomical, likely heading into the tens of billions of dollars. We don't have that kind of sovereign or corporate risk capital ready to be incinerated on a single training run. Yes, Reliance Jio is making bold statements about building sovereign AI capabilities in partnership with Nvidia, and that’s a significant and welcome development. But going head-to-head with Microsoft-backed OpenAI or Google's deep research pockets is like showing up to a Formula 1 race with a souped-up Maruti 800. We might be clever and resourceful, but the fundamental physics of capital and computing infrastructure are, for now, not in our favor. Trying to build a bigger, better GPT-4 in India today is a strategic dead end, a vanity project that distracts from our real, asymmetric advantage.

That advantage isn’t in training, but in inference. If training is the multi-billion dollar process of teaching a model about the world, inference is the act of asking it a question. It’s the ‘running’ of the model. And here’s the secret that the big model builders don't shout from the rooftops: over the lifetime of a model, inference accounts for the lion’s share, perhaps as much as 90%, of the total compute cost. This is the operational reality that my founder friend in Bangalore slammed into. The real challenge isn’t building the all-knowing oracle; it's making the oracle's wisdom accessible and affordable enough for a billion people to use every day. The future of AI in India isn't about having the biggest brain, it’s about having the most efficient nervous system.
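To make that concrete, here is a rough back-of-the-envelope in Python. Every figure in it (training cost, per-token price, query volume, model lifetime) is an assumption I've picked purely for illustration, not a reported number; what matters is the shape of the result, not the exact percentage.

```python
# Back-of-the-envelope: why inference dominates a model's lifetime compute cost.
# Every figure below is an illustrative assumption, not a reported number.

TRAINING_COST_USD = 100e6        # assumed one-time frontier-scale training run
COST_PER_1K_TOKENS_USD = 0.01    # assumed blended inference price per 1K tokens
TOKENS_PER_QUERY = 1_000         # assumed prompt + response size
QUERIES_PER_DAY = 100e6          # assumed traffic for a widely used service
LIFETIME_DAYS = 2 * 365          # assumed useful life before the model is retired

inference_cost = (QUERIES_PER_DAY * LIFETIME_DAYS
                  * TOKENS_PER_QUERY / 1_000 * COST_PER_1K_TOKENS_USD)
total = TRAINING_COST_USD + inference_cost

print(f"Training:  ${TRAINING_COST_USD / 1e6:,.0f}M")
print(f"Inference: ${inference_cost / 1e6:,.0f}M")
print(f"Inference share of lifetime compute spend: {inference_cost / total:.0%}")
```

Even with a nine-figure training run on the books, the recurring cost of answering queries swamps it within two years; halve or double any of my assumptions and the conclusion barely moves.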

This is where India’s unique DNA for innovation comes into play. We are the country that mastered the “sachet” model, selling shampoo and coffee for a few rupees to a billion consumers. We are the country that built UPI, a digital payments system that leapfrogged credit cards and processes billions of transactions at near-zero cost. Our genius isn't in creating the most luxurious product, but in engineering a valuable product for an impossibly low cost at unimaginable scale. This is our calling card. AI needs its UPI moment. It needs its sachet moment. The goal shouldn’t be a model that can write a Shakespearean sonnet about quantum physics, but a model that can help a farmer identify crop disease from a photo, in Marathi, for fractions of a paisa per query.

This is precisely the opportunity that a new crop of Indian AI companies is targeting. Look at Sarvam AI. Their OpenHathi series of models isn’t trying to top global leaderboards. Instead, they are deliberately focused on building and fine-tuning models that are performant in Indian languages. Their approach is a tacit admission that simply translating the outputs of an English-centric model is a lossy, culturally inept compromise. True understanding requires being trained on the nuances, the idioms, the very texture of how Indians actually speak. By focusing on fine-tuning powerful open-source models like Meta's Llama series, they are sidestepping the crippling cost of pre-training and focusing directly on creating value for the Indian context. It's a smart, focused strategy that prioritizes utility over size.
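For the technically curious, here is a minimal sketch of what this strategy looks like in code: parameter-efficient fine-tuning (LoRA) of an open Llama-family checkpoint using Hugging Face's transformers and peft libraries. The model name, hyperparameters, and target modules are placeholders I've chosen for illustration, not Sarvam's actual recipe.

```python
# Minimal LoRA fine-tuning sketch (Hugging Face transformers + peft).
# Model name and hyperparameters are illustrative, not any company's recipe.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-7b-hf"  # an open base model (gated; requires access)
tokenizer = AutoTokenizer.from_pretrained(base)  # used to tokenize the corpus
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA freezes the base weights and trains small low-rank adapter matrices,
# so adapting the model costs orders of magnitude less than pre-training it.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections in Llama
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of all weights
```

The training loop itself is standard Trainer usage over a corpus of vernacular text, and that corpus is where the hard, valuable work actually lies: the data, not the architecture.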

Then you have the sheer audacity of Bhavish Aggarwal’s Krutrim. While the initial launch had its glitches, the strategic vision is undeniable and, in my opinion, points in exactly the right direction. Krutrim is not just an LLM; it's a declaration of intent to build a full AI stack, from custom-designed silicon all the way to the end application. The claim of designing their own chips is the most critical piece of this puzzle. Bhavish understands that to truly control your destiny and, more importantly, your unit economics at scale, you cannot be beholden to Nvidia’s pricing or foreign cloud providers' margins. By attempting to build the entire system, he’s making a direct assault on the problem of inference cost. It's an incredibly ambitious, high-risk gambit, but it’s a bet on the right problem: making AI cheap enough for India.

Even the incumbents are thinking along these lines. When Reliance Jio talks about building “India’s own foundational LLM,” the subtext isn’t just about national pride. It’s a cold, hard business calculation. Jio serves over 450 million customers. Imagine the cost if they had to pay a per-query fee to a US company every time one of their users interacted with an AI-powered service on the Jio network. It would be an economic impossibility. Their AI strategy is defensive; it’s about supply chain control for a digital future. Their partnership with Nvidia is less about training the world’s biggest model and more about building the vast, efficient infrastructure needed to serve AI to half a billion people without going broke. It’s a strategy for low-cost, high-volume inference.
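Run the numbers and the "economic impossibility" stops being rhetoric. The usage rate and per-query fee below are my own assumptions, and deliberately conservative ones:

```python
# Illustrative only: a per-query API fee at Jio scale. Both rates are assumed.
USERS = 450e6                  # Jio's subscriber base (approximate)
QUERIES_PER_USER_PER_DAY = 2   # assumed very light usage
FEE_PER_QUERY_USD = 0.002      # assumed fee paid to an external model provider

annual_bill_usd = USERS * QUERIES_PER_USER_PER_DAY * FEE_PER_QUERY_USD * 365
print(f"Annual bill: ${annual_bill_usd / 1e9:.2f}B")  # roughly $0.66B per year
```

Two-thirds of a billion dollars a year, for two trivial queries per user per day, before a single rupee of revenue. That is why owning the inference stack is a survival question, not a vanity one.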

Beyond cost, there's the reality of our infrastructure. The dream of a seamless cloud-AI experience, where every query pings a massive data center in Mumbai or Hyderabad, is a fantasy for most of India. Our country lives at the edge. A health worker in rural Odisha, a kirana store owner in a tier-3 town, a logistics manager in a warehouse on the outskirts of Pune—they operate in a world of flaky internet connectivity. For AI to be useful to them, it cannot be dependent on a persistent, high-speed link to the cloud. The models must run on the device itself, on the edge. This is a death sentence for giant, trillion-parameter models. The future for the real India is small, quantized, hyper-efficient models that can run on a standard smartphone or a simple point-of-sale machine, delivering value even when the network is down.
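Shrinking models this way isn't exotic, either. Here is a minimal sketch using PyTorch's built-in dynamic quantization; the toy two-layer network stands in for a real model, and on-device stacks such as llama.cpp push the same idea further, down to 4-bit weights.

```python
# Minimal sketch: shrinking a network for CPU/edge inference with 8-bit
# dynamic quantization in PyTorch. The two-layer model is a toy stand-in.
import os
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 4096),
)

# Dynamic quantization stores weights as int8 and dequantizes on the fly:
# a small accuracy trade for ~4x smaller weights and faster CPU inference.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

def size_mb(m: torch.nn.Module) -> float:
    torch.save(m.state_dict(), "weights.tmp")
    size = os.path.getsize("weights.tmp") / 1e6
    os.remove("weights.tmp")
    return size

print(f"fp32: {size_mb(model):.0f} MB -> int8: {size_mb(quantized):.0f} MB")
```

A roughly 130 MB toy shrinks to about 34 MB; apply the same arithmetic to a few-billion-parameter model at 4-bit precision and it starts fitting comfortably on a mid-range phone.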

This is why I keep coming back to the UPI analogy. India won in digital payments not by building a better Visa, but by creating a new set of rails—a public digital infrastructure that was mobile-first, interoperable, and radically inexpensive. We didn’t try to replicate the Western model; we leapfrogged it. The blueprint for AI should be the same. Instead of one company building a monolithic, closed-off ‘Indian GPT’, our energy should go into creating the infrastructure for cheap inference. This could mean open-sourcing highly efficient vernacular models, developing optimized hardware, and building deployment platforms that allow any developer to serve AI to the masses without a crippling AWS bill.

So when I see the breathless headlines about the next great model from Silicon Valley, I don't feel a sense of envy or panic. I feel a sense of clarity. That's their race to run. Ours is different. Our AI champions won't be the ones with the gaudiest performance on a standardized test. They will be the scrappy, brilliant teams who figure out how to deliver a meaningful AI interaction for less than a rupee. They will be the engineers who shrink a powerful model to run on a five-year-old smartphone. India’s AI moment will arrive not when we build a model bigger than GPT-4, but when we make AI so cheap and accessible that it becomes as ubiquitous and vital as UPI. That is the real prize, and it’s a race we are uniquely positioned to win.


Why it matters

  • India's path to AI dominance is not through competing with the US on training massive LLMs, a capital-intensive race we can't win.
  • The real opportunity lies in mastering cheap, efficient inference—the cost of running AI models—to serve a billion users at Indian price points.
  • Success will come from vernacular-language fine-tuning, full-stack control over costs, and deploying small models on edge devices for the real, often-disconnected India.