
OpenAI unveiled Jalapeño, its first custom AI inference chip, co-designed with Broadcom and targeting a 50% cost reduction for LLM inference. Development took nine months, and deployment is scheduled for the end of 2026 across gigawatt-scale data centers.

OpenAI's custom silicon vertically integrates inference infrastructure, challenging Nvidia's dominance and potentially accelerating the shift from GPU-centric to specialized inference architectures.

Trade press · Slicast · June 25, 2026 · US · Source: Google News


OpenAI has announced its first custom chip, Jalapeño, which is designed for AI inference workloads and was developed with Broadcom. OpenAI CEO Sam Altman and Broadcom CEO Hock Tan showcased the chip's first wafer. The announcement reflects a broader trend in which AI companies are developing custom chips for the agentic AI era: Anthropic is exploring custom silicon, while Google is advancing its established TPU strategy.

OpenAI describes Jalapeño as the beginning of its vision for the future of LLM inference. The chip is the company's first AI accelerator in a multi-generational compute platform being built to make AI faster, more reliable, and more accessible. Designed from scratch in just nine months, from initial design through manufacturing tape-out, Jalapeño focuses exclusively on AI workloads rather than serving as a general-purpose accelerator adapted from earlier architectures.

The platform relies on partners Broadcom and Celestica, which will provide chip implementation, board and rack system integration, high-performance networking, and scalable production systems. Jalapeño is purpose-built for the LLM workloads that power ChatGPT, Codex, the API, and future agentic products, while remaining flexible enough to work with all LLMs across the industry.

The chip combines the power and throughput of today's leading AI accelerators with latency approaching that of the fastest specialized inference systems, which OpenAI says makes it well suited to interactive LLM products at scale. Early engineering samples are already running ML workloads, including GPT-5.3-Codex-Spark, at production target frequency and power. The package has eight HBM sites, with compute dies visible at its center.

The first Jalapeño platforms are scheduled for deployment by the end of 2026, with expansion planned in subsequent years. This multi-generation initiative reflects OpenAI's strategy to diversify its compute portfolio. While the company maintains a partnership involving 10 GW of Nvidia systems, the custom silicon investment reduces its reliance on a single chipmaker and provides flexibility as supply constraints persist across the industry.
