Antonio Leiva · ai · 7 min read

Why I Don’t Think Token Subsidies Are Going Away

Lately I keep hearing the same idea: this moment of “subsidized tokens” is going to end.

That OpenAI and Anthropic are letting us use ChatGPT, Codex, and Claude at prices that make no sense compared to the API, but only for now. And that once we’re all hooked, we’ll have to pay the “real price.”

I don’t buy it.

I’m not saying limits, plans, and conditions won’t change. Of course they will. I’m also not saying they’ll give away infinite intelligence forever. That would be naive.

What I am saying is something else: I don’t think the subsidy disappears. I think it turns into a structural competitive advantage.

Intelligence gets cheaper, even when the product does not

I think many people are mixing up two different things:

  1. The real cost of producing intelligence
  2. The price of the product we buy

Those are not the same.

Inference gets cheaper over time. Frontier models are still expensive, yes. But the models just below them are already dramatically cheaper than equivalent-quality models were one or two years ago. That’s the same pattern we see in tech over and over again: today’s premium becomes tomorrow’s commodity.

That’s why I don’t think the most likely scenario is “the party is over, now everyone pays API prices.”

I think the more likely scenario looks like this:

  • the internal cost of serving models keeps dropping
  • companies push us to use them for more and more tasks
  • and the visible price of premium subscriptions does not move much

In other words: they won’t kill the subsidy. They’ll make it economically sustainable.

That is a very different story.

The business is not selling tokens. It’s selling habit

OpenAI and Anthropic are not only building models. They are building products.

And products are not optimized to make you aware of the marginal cost of each request. They are optimized so that:

  • they are useful every day
  • they become part of your workflow
  • you depend on them more and more
  • and upgrading to the next plan feels reasonable

That’s why I don’t think the endgame is charging everyone API rates directly. I think the endgame is making higher tiers feel increasingly justified.

Right now we already have plans like the $20 and $200 tiers. My impression is that the goal is not to destroy that structure, but to make the higher tier feel like the obvious choice for more people, especially programmers, technical teams, and heavy users.

That fits the software business much better.

They are not really selling units of cost.
They are selling a value proposition you want to keep paying for.

They also can’t get too greedy

There is another important point here.

Even if they wanted to squeeze pricing hard, they are not alone in the market.

OpenAI and Anthropic are in a brutal fight. Google is still there. And Chinese models are improving fast with a quality-to-price ratio that makes arrogance expensive.

On top of that, there is a slower but very real pressure: local models keep getting better. They are not ready to replace everything in most workflows, but they do create a competitive floor. More importantly, they keep the idea of a “reasonable price” tied to reality.

Today, many people do not seriously consider switching to cheaper models because frontier models still justify their cost and the total spend is still manageable.

But if prices really jumped, that would change quickly.

People would start to:

  • try Chinese alternatives
  • optimize workflows to consume less
  • use cheaper models for more steps
  • and increasingly consider local execution where it makes sense

That’s why I don’t really buy the argument that “once we’re hooked, we’ll pay whatever it takes.”

No. We’ll pay a lot as long as it still feels worth it. The moment it stops being worth it, we’ll look elsewhere.

And they know that.

So, is AI a bubble?

My view is: not in the simplistic sense people often mean.

Is there hype? Of course.
Is there a lot of nonsense? Also yes.
Are companies stuffing “AI” into every slide deck to raise money? Absolutely.

But a real bubble, in the stronger sense, usually rests on huge promises with very poor practical utility.

And in programming, that is no longer what we are seeing.

From November until now, the change has been very real.

I am not talking about benchmarks. I honestly don’t care that much about them.

I’m talking about day-to-day use. What happens when you actually sit down and work.

A few months ago, the model fit well as a copilot. Useful, yes. Sometimes brilliant. But still fragile, still dependent on constant guidance, still too prone to requiring micromanagement.

Now we are entering a different phase.

I am not saying it can replace us or that we disappear from the process. But we are starting to assume that we can give it much more room, let it solve larger chunks, orchestrate it, supervise it, and correct it instead of dictating every keystroke.

That is no longer just a copilot.
It is starting to look more like a delegate.

And that is not a cosmetic change. It is a shift in regime.

The “we already hit the ceiling” argument does not match reality

Lately I also hear a lot of talk about LLMs having hit their ceiling. That Transformers already gave us everything they had. That the big leap is behind us.

I don’t see practical evidence for that, at least not in programming.

If we had truly hit a functional ceiling, we would not have seen this kind of jump in real-world usefulness in just a few months.

And there is an important nuance here: part of the jump probably does come from better models. But another part comes from something just as important, maybe more:

  • better training for development tasks
  • better tools
  • better context handling
  • better scaffolding
  • better ways of maintaining state
  • better workflows around the model

So even if someone wanted to argue that the “pure model” is not improving that much, they would still have a problem.

Because the complete system clearly is.

And what we actually use is never the model in isolation. We use the whole system.

Even if models froze today, we would still have years of progress ahead

This is the most important point for me.

Imagine that tomorrow progress on the base model stopped. No better GPT. No better Claude. No better Gemini. We stay exactly where we are today.

Would practical progress in AI-powered work stop because of that?

I don’t think so. Not even close.

I think we are still very far from extracting the maximum value from the models we already have.

And that margin does not live only in the model’s “intelligence.” It lives in everything around it:

  • how we provide context
  • how we split tasks
  • how we validate outputs
  • how we do retries
  • how we integrate tools
  • how we design interfaces
  • how we build memory and state
  • how we organize human work around the agent

Confusing the limit of the model with the limit of the system is one of the biggest mistakes of this stage of AI.

A model may improve by 15%. But a well-designed system around that same model can multiply the real value you get out of it by much more than that.

That is why I still think we are early.

Not because “everything will magically keep getting better,” but because we are still learning how to build properly around these capabilities.

My bet

My bet today is simple:

  • token subsidies are not going away
  • they will become part of the product
  • AI is not close to hitting its practical ceiling in programming
  • and we are nowhere near the practical limit of what we can do with LLMs

What I do think is happening is that we are leaving behind a very naive first phase.

The phase of “look how impressive this demo is.”

And we are entering a much more interesting one:

the phase where the real question is no longer whether this works, but how to design systems that squeeze the most out of it.

And honestly, I think we still have years of progress ahead there.
