Compute & Cloud · Report

Together AI's token volume surged to 400 trillion per month amid accelerating demand for AI compute that undercuts hyperscaler cloud pricing.

Neocloud providers gaining share from hyperscalers as cost pressure mounts; validates market shift toward distributed, cost-optimized AI infrastructure alternatives.

Trade press · Slicast · June 24, 2026 · US · Source: Google News


Together AI is now processing over 400 trillion inference tokens per month—a roughly 13,000x increase from 30 billion a year ago. The cloud inference platform has quietly become one of the fastest-scaling AI infrastructure companies in the world, propelled by a straightforward thesis: enterprises want powerful AI models without the eye-watering costs of proprietary APIs.

Founder Vipul Ved Prakash has described the growth trajectory in terms that would make most SaaS founders weep with envy. Daily token processing climbed from approximately 1 billion to over 1 trillion, a leap of more than 1,000x. The company reportedly reached annualized revenue of roughly $1 billion by early 2026: not a valuation figure or a fundraising milestone, but actual revenue. That's the kind of metric that separates companies with genuine traction from those running on venture capital alone.
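The growth multiples quoted above follow from simple division. As a quick check, here is a minimal Python sketch that uses only the figures reported in this article; the variable names are illustrative, and because the article does not say whether the monthly and daily comparisons cover the same period, the sketch treats them independently.

```python
# Figures as reported in the article; variable names are illustrative only.
monthly_tokens_now = 400e12       # over 400 trillion tokens per month
monthly_tokens_year_ago = 30e9    # 30 billion tokens per month a year earlier
daily_tokens_before = 1e9         # approximately 1 billion tokens per day
daily_tokens_after = 1e12         # over 1 trillion tokens per day

# A growth multiple is the later figure divided by the earlier one.
monthly_multiple = monthly_tokens_now / monthly_tokens_year_ago
daily_multiple = daily_tokens_after / daily_tokens_before

print(f"Monthly growth multiple: {monthly_multiple:,.0f}x")
print(f"Daily growth multiple: {daily_multiple:,.0f}x")
```

The monthly multiple works out to about 13,333x, in line with the "roughly 13,000x" figure, and the daily multiple to 1,000x.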

The economics are straightforward. Running inference on proprietary frontier models from OpenAI or Anthropic comes with per-token pricing that escalates quickly at enterprise scale. Open-source models, by contrast, allow organizations to run comparable workloads at a fraction of the cost, with the added benefits of customization and fine-tuning. Together AI has positioned itself at the center of this shift, making the deployment and scaling of open-source models as frictionless as possible. The platform appeals to cost-conscious startups and large enterprises alike, the latter increasingly wary of vendor lock-in.

On June 3, 2026, Together AI became the first commercial customer for Vector Core Compute's new inference cloud, a platform built on a hybrid CPU/GPU/RDU architecture designed specifically for high-throughput AI workloads. The partnership reflects a broader market reality: inference, not training, is becoming the dominant compute workload. Training a large language model is a one-time or periodic expense. Inference—the actual serving of that model to users—runs continuously and scales with adoption.
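The contrast between a one-time training expense and a continuously growing inference bill can be made concrete with a small sketch. Everything here is hypothetical: the function name, the cost units, and the growth rate are assumptions chosen for illustration, and none of the numbers come from the article or from any provider's pricing.

```python
from typing import Optional


def months_until_inference_exceeds_training(
    training_cost: float,
    first_month_inference_cost: float,
    monthly_growth_rate: float,
    max_months: int = 120,
) -> Optional[int]:
    """Return the first month in which cumulative inference spend exceeds
    a one-time training cost, or None if that never happens within max_months.

    All inputs are hypothetical, in arbitrary cost units.
    """
    cumulative = 0.0
    monthly = first_month_inference_cost
    for month in range(1, max_months + 1):
        cumulative += monthly              # inference is billed every month
        if cumulative > training_cost:
            return month
        monthly *= 1 + monthly_growth_rate  # spend grows with adoption
    return None


# Hypothetical inputs: training costs 100 units once; inference starts at
# 5 units per month and grows 10% per month as usage rises.
crossover = months_until_inference_exceeds_training(100.0, 5.0, 0.10)
print(crossover)
```

With these made-up inputs the cumulative inference spend passes the training cost in month 12; with flat usage (a growth rate of 0) it would take until month 21. The point is only directional: any workload that keeps running, and keeps growing, eventually outweighs a fixed upfront cost.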

The 400 trillion token milestone signals that the AI inference market is entering a new phase, one where scale and cost efficiency matter more than model novelty. Open-source adoption threatens the pricing power of proprietary model providers, as enterprises can now run equivalent models on platforms like Together AI at significantly lower cost. The risk, however, is concentration. If a handful of open-source models dominate—Meta's Llama family being the obvious example—the inference layer could become commoditized quickly. The Vector Core Compute partnership suggests Together AI is already hedging against this, locking in next-generation hardware advantages before the market grows crowded.
