Chips & Hardware · Report

FAR Labs opens access to cheaper AI inference platform, aiming to reduce per-token costs.

Proliferation of inference-focused alternatives pressures NVIDIA pricing; signals market demand for cost-optimized edge and latency-sensitive inference tiers.

Trade pressSlicast · June 27, 2026 · US · Source: Google News

importance 61

FAR Labs, an Abu Dhabi-based AI infrastructure business operating under the Dizzaract umbrella, has opened registration for developers to access its FAR AI inference platform. The move follows the company's disclosure of lower listed prices for selected model deployments, positioning cost reduction as a core value proposition for builders seeking to minimize expenses as AI adoption accelerates across applications, tools, and workflows.

At the heart of FAR Labs' offering is a distributed inference network that balances developer demand with available computing supply. Users access the system through a single OpenAI-compatible API, can select from multiple models, and benefit from rapid onboarding. Workloads are routed across GPU resources via FAR Orchestrator, the company's orchestration layer.

Cost differentiation forms the cornerstone of FAR Labs' positioning. The company published benchmark comparisons across several model deployments against competitor pricing. For Qwen3-30B-A3B, FAR AI listed pricing at $0.03 per 1 million tokens, compared with $0.35 for NextBit and $0.27 for DeepInfra—a claimed savings of up to 91 percent. For Qwen2.5-72B-Instruct, the company quoted FP8 pricing at $0.17 per 1 million tokens against $0.39 for NovitaAI BF16 and $0.38 for DeepInfra FP8, representing 55 to 56 percent lower costs. For Qwen3.5-122B-A10B, FAR AI listed FP8 pricing at $0.51 per 1 million tokens, with output token costs up to 79 percent lower than comparable offerings from AtlasCloud FP8 and SiliconFlow FP8.

The pitch arrives amid a shifting economic dynamic in AI infrastructure. While token unit prices have declined sharply, total inference spending continues rising as businesses deploy AI-generated requests at scale through customer support systems, agents, assistants, games, and internal workflows. This dynamic particularly pressures developers relying on proprietary APIs from providers like OpenAI and Anthropic, where recurring inference charges can erode margins and constrain resources for testing and growth.

FAR Labs attributes its pricing advantage to accessing underutilized computing resources rather than maintaining solely on large dedicated data centre fleets. The platform draws GPU capacity from consumer devices and small-to-medium enterprise data centres, then allocates work through its performance-focused orchestration layer. Beyond pricing, the company addresses production-grade requirements: the orchestration layer incorporates trusted execution environment-based secure inference, reliability scoring, support for both open and proprietary models, and Semantic Vector Streaming. Its routing system is engineered for uptime, workload continuity, and latency-sensitive performance.

Interest in this infrastructure category emerged consistently during developer, GPU supplier, model team, investor, and enterprise conversations around SuperAI Singapore, according to FAR Labs. Those discussions signaled demand for inference infrastructure that is faster, more reliable, lower-cost, and production-ready.

FAR AI currently operates in a closed testing phase with select partners. FAR Labs is accepting early access registrations from builders, offering 1 million free tokens as a program incentive.

Ilman Shazhaev, Founder and Chief Executive Officer of Dizzaract, framed the market opportunity: "The price of AI keeps falling. Cost per token is down about 99% since 2021. And yet your AI bill keeps rising, because usage explodes faster than prices drop. Inference is becoming the single largest cost in AI. Our cost advantage is not a discount we are burning cash on. It is structural."

Read the original