
An analysis of the real cost of AI inference points to heavy subsidies, the role of specialized chips, and open questions about how sustainable the current "golden age" of cheap access is.

Inference economics put margin pressure on cloud operators; pure-software models are unprofitable at scale without subsidies or proprietary hardware.

Trade press · Slicast · June 29, 2026 · US · Source: Google News


A company recently audited a massive legacy codebase for a client. The last time it did comparable work, the entire workflow, including heavy use of AI coding assistants, fit comfortably inside a $200 monthly subscription. This time, under usage-based pricing (such as GitHub Copilot charging per token, or similar shifts elsewhere), the same work is projected to cost around $2,000, roughly ten times as much. What changed? Not the code, but the economics behind the models.

This isn't an isolated anecdote. It's a window into one of the most important (and least discussed) dynamics in AI right now: the gap between what consumers and power users actually pay and what it truly costs to run inference at scale. Heavy users are getting enormous subsidies, and the sustainability of that model is questionable as companies eye IPOs and profitability.

SemiAnalysis recently bought every major subscription tier from Anthropic (Claude) and OpenAI (ChatGPT) and stress-tested them with long-horizon coding and agentic tasks until weekly limits were hit. These aren't theoretical numbers; they come from real, sustained usage that mimics professional developer workflows. Most casual users never come close to these limits, which is why the economics work on average (like a buffet where light eaters subsidize heavy ones). But for power users—exactly the people and companies driving real productivity gains—the effective discount is massive. Assuming high gross margins on API usage (around 75% as a benchmark in the analysis), subscription margins look far worse at high utilization. The labs are effectively giving away compute to retain users and build habits while the technology matures.
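The buffet effect can be made concrete with a small calculation. The sketch below is illustrative only: the 75% API gross margin is the benchmark cited in the analysis, but the subscriber mix and the per-segment usage figures are hypothetical placeholders, not results from the SemiAnalysis tests. It also assumes that the cost to serve a subscriber equals the API list value of their usage times (1 - API gross margin).

```python
# Hypothetical inputs: the subscriber mix and usage values are made up for illustration.
API_GROSS_MARGIN = 0.75  # benchmark cited in the analysis


def blended_margin(price_usd, segments, api_gross_margin=API_GROSS_MARGIN):
    """Gross margin of a flat subscription averaged over user segments.

    segments: list of (share_of_subscribers, api_equivalent_usage_usd) pairs,
    where usage is valued at API list prices per month.
    """
    total_share = sum(share for share, _ in segments)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("segment shares must sum to 1")
    # Average API-equivalent usage per subscriber.
    avg_usage = sum(share * usage for share, usage in segments)
    # Assumed cost to serve: the non-margin part of the API price.
    avg_cost = avg_usage * (1.0 - api_gross_margin)
    return (price_usd - avg_cost) / price_usd


# Hypothetical mix: 90% light users, 10% users who max out the plan.
mix = [(0.90, 40.0), (0.10, 2000.0)]
print(f"blended margin: {blended_margin(200.0, mix):.1%}")
print(f"heavy users only: {blended_margin(200.0, [(1.0, 2000.0)]):.1%}")
```

Under these made-up inputs the blended plan stays comfortably positive while the heavy-user segment on its own is deeply negative, which is the pattern the paragraph above describes.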

Inference—the process of running a trained model to generate outputs—is the dominant ongoing expense in AI, often 55–80% of total GPU spend in production environments. Overall, per-token costs for equivalent intelligence have collapsed dramatically (hundreds of times cheaper in some cases), but absolute spending by labs remains huge because usage is exploding. Companies like OpenAI have reported massive inference-related losses and are projecting continued heavy burn (e.g., billions annually) even as revenue grows.

Consumer subscriptions are heavily subsidized, especially for power users. This is a deliberate strategy to acquire users, gather data and feedback, and maintain mindshare, and it is funded by enormous venture capital and strategic investments (Microsoft for OpenAI, Amazon and Google for Anthropic, etc.). Enterprise and API customers, by contrast, face pricing that is much closer to cost or outright profitable for the labs. Large commitments often come with volume discounts, reserved capacity, or dedicated infrastructure, but even with those concessions these customers pay prices that track the true marginal cost of compute far more closely than subscriptions do. This is where the real margins live for the labs.

The math explains the tension: if a lab has roughly 75% gross margins on API tokens, maxed-out subscriptions can flip to deeply negative margins. Average utilization across all subscribers keeps the overall business afloat for now. But both OpenAI and Anthropic are reportedly preparing for public market debuts. Investors eventually demand profits, not just growth and market share. We've already seen early signs: moves toward usage-based pricing in tools like Copilot, quota adjustments, and experiments with feature gating. The "free" or ultra-cheap intelligence era for heavy users may be peaking.
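A worked version of that arithmetic, as a minimal sketch. It takes the 75% API gross margin from the analysis and borrows the $200 subscription and $2,000 usage-priced figures from the opening anecdote as illustrative inputs. Treating the $2,000 as the API list value of a maxed-out subscription is an assumption made here, as is the simplification that cost to serve equals API value times (1 - margin).

```python
def cost_to_serve(api_equivalent_usd, api_gross_margin=0.75):
    """Assumed compute cost behind usage worth `api_equivalent_usd` at API prices."""
    return api_equivalent_usd * (1.0 - api_gross_margin)


def subscription_margin(price_usd, api_equivalent_usd, api_gross_margin=0.75):
    """Gross margin on one flat-rate subscriber with the given monthly usage."""
    cost = cost_to_serve(api_equivalent_usd, api_gross_margin)
    return (price_usd - cost) / price_usd


def break_even_usage(price_usd, api_gross_margin=0.75):
    """API-equivalent usage at which the subscription margin reaches zero."""
    return price_usd / (1.0 - api_gross_margin)


price = 200.0          # monthly subscription from the anecdote
heavy_usage = 2000.0   # illustrative API-equivalent usage of a maxed-out user

print(f"cost to serve heavy user: ${cost_to_serve(heavy_usage):,.0f}")
print(f"margin on heavy user: {subscription_margin(price, heavy_usage):.0%}")
print(f"break-even usage: ${break_even_usage(price):,.0f} per month")
```

With these inputs the heavy user costs about $500 a month to serve against $200 of revenue, a gross margin of about -150%, and the plan breaks even at roughly $800 of API-equivalent usage. Different margin or usage assumptions shift the numbers but not the shape of the problem.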

That said, the underlying trend of falling inference costs continues thanks to better chips, software optimizations, and scale. Labs can profitably serve increasingly powerful models at lower prices over time—just not necessarily at the current subsidy levels for unlimited heavy use.

This situation echoes past infrastructure booms. During the railroad expansion and the dot-com fiber optic buildout, companies overbuilt capacity, many went bankrupt, and investors lost fortunes. Yet society ended up with durable, transformative infrastructure that enabled decades of growth. In AI, we're in a similar phase of massive capital deployment into chips, data centers, and models. There will likely be consolidation, failures, and shakeouts among providers. But the "rails" (compute infrastructure, efficient models, and tooling) will remain—and improve.

Open-source models, self-hosting options, and specialized inference providers are already offering cheaper alternatives for many workloads. The pricing landscape will probably not look exactly as it does today for heavy professional use. Expect more tiering: generous but capped consumer plans, premium "unlimited" options at higher prices, and robust enterprise offerings. The best frontier models may become relatively more expensive or restricted for casual and heavy individual use, while overall intelligence gets cheaper and more accessible through efficiency gains and competition (including from open models and non-Western providers).

Power users and companies will adapt by optimizing workflows—caching, smaller models for simpler tasks, agent orchestration, self-hosting where it makes sense—or paying more for guaranteed access. The $200 "all-you-can-eat" golden era for intensive coding and agentic work is likely transitional.
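Two of those adaptations, caching and routing simpler tasks to smaller models, can be sketched in a few lines. Everything here is hypothetical: `CachedRouter`, the "small" and "large" model names, the stub model functions, and the length-based routing rule are illustrative stand-ins, not any provider's API. A real router would likely use a better signal than prompt length.

```python
from typing import Callable, Dict


class CachedRouter:
    """Answer repeated prompts from a cache and send short prompts to a cheaper model."""

    def __init__(self, models: Dict[str, Callable[[str], str]],
                 long_prompt_chars: int = 2000):
        # `models` must provide callables under the keys "small" and "large".
        self.models = models
        self.long_prompt_chars = long_prompt_chars
        self.cache: Dict[str, str] = {}
        self.calls = {name: 0 for name in models}

    def route(self, prompt: str) -> str:
        # Crude heuristic (an assumption): long prompts go to the large model.
        return "large" if len(prompt) >= self.long_prompt_chars else "small"

    def complete(self, prompt: str) -> str:
        # Exact-match cache: an identical prompt never triggers a second model call.
        if prompt in self.cache:
            return self.cache[prompt]
        name = self.route(prompt)
        self.calls[name] += 1
        result = self.models[name](prompt)
        self.cache[prompt] = result
        return result


# Stub models standing in for real inference calls.
router = CachedRouter(
    {"small": lambda p: f"small model answer ({len(p)} chars)",
     "large": lambda p: f"large model answer ({len(p)} chars)"},
    long_prompt_chars=50,
)
router.complete("Rename this variable.")
router.complete("Rename this variable.")  # served from cache, no new call
router.complete("Refactor the billing module and explain each change. " * 3)
print(router.calls)
```

The call counter makes the saving visible: three requests result in one call to each model, since the repeated prompt is served from the cache.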

We're living through an extraordinary period of subsidized intelligence that accelerates experimentation and adoption. It won't last in its current form, but the infrastructure being built will power productivity gains for years to come. The question isn't whether AI gets more expensive—it's how quickly costs fall relative to capabilities, and who captures the value. Priced right and used wisely, this technology still represents one of the biggest leverage opportunities in history. The subsidies bought us time to figure it out. Now the real economics are coming into focus.
