OpenAI and Broadcom announced Jalapeño, a custom inference ASIC built in a nine-month development cycle, designed for large-scale LLM inference workloads with optimized performance-per-watt efficiency.
OpenAI and Broadcom have introduced Jalapeño, a custom-built inference processor designed specifically for modern large language models and future agentic AI workloads. The companies claim the processor delivers performance per watt higher than today's leading-edge hardware. OpenAI considers its hardware project a strategic one and envisions Jalapeño as the first generation of its inference hardware.
OpenAI stresses that Jalapeño is a purpose-built inference ASIC and not a repurposed training accelerator or a general-purpose AI processor. The architecture was designed based on OpenAI's understanding of LLM behavior and is meant to address practical bottlenecks for inference at scale, including costly data movement, the balance between compute and memory resources, networking efficiency, and overall system behavior. The design aims to combine high throughput with low latency, a characteristic particularly valuable for reasoning and agentic workloads. That goal explains why it uses a very large compute chiplet and HBM rather than the cheaper types of DRAM found in many other inference accelerators.
OpenAI and Broadcom claim the processor delivers higher effective utilization than conventional AI accelerators and performance close to its theoretical maximum, which would imply very high efficiency in both cost and power consumption. However, the companies did not disclose performance targets for the Jalapeño ASIC, so these claims should be taken with appropriate caution.
Engineering samples are already operating in the lab at target clock speed and power, with OpenAI running machine learning workloads such as GPT-5.3-Codex-Spark. Early internal testing indicates that Jalapeño's performance-per-watt is substantially better than "current state-of-the-art hardware," though no hard numbers, benchmarks, memory configuration, or other details have been disclosed. How competitive it will be against AMD's Instinct MI400-series and Nvidia's Rubin-based offerings remains to be seen.
"Jalapeño was designed from the ground up for LLM inference using detailed insights from our close collaboration with OpenAI researchers," said Richard Ho, who leads OpenAI's hardware program. "We optimized the architecture around the kernels, memory movement, networking, and serving patterns that matter most for frontier AI models. Based on early testing, Jalapeño will efficiently execute our most important workloads close to the hardware's theoretical limits."
While Broadcom and OpenAI did not disclose full specifications, they showed the wafer and the package, which allows a brief analysis. The package appears to contain one large compute chiplet surrounded by six HBM stacks, another chiplet likely containing input/output interfaces, and two structural dummy dies. The wafer image resembles a Broadcom-style systolic-array-heavy accelerator, with a very regular, repeated columnar floorplan showing replicated compute regions and fixed infrastructure macros, though this reading is speculative given the image quality.
Using the dimensions of the surrounding HBM3/4 packages (10.975 mm × 10.975 mm) as a reference, Jalapeño's compute chiplet measures approximately 25.46 mm wide by 33 mm tall, yielding a die size of around 840 square millimeters. That is very close to the 858-square-millimeter (26 mm × 33 mm) reticle limit of EUV lithography systems. Image quality limits precision, but the estimate is likely close. The die size indicates substantial compute density: Jalapeño's compute die is considerably larger than the compute dies of other inference accelerators on the market and more closely resembles those of AI training processors.
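The estimate above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not the method actually used for the analysis: it assumes the HBM package edge serves as a ruler in the photo, and the pixel measurements in the usage example are hypothetical values chosen only to illustrate the scaling. The function name `estimate_die` is likewise an illustrative choice.

```python
"""Estimate a die's size from a package photo using a known reference dimension."""

# Edge length of one HBM package, as cited in the article (mm).
HBM_EDGE_MM = 10.975
# Maximum single-exposure reticle field (mm); 26 x 33 = 858 mm^2.
RETICLE_WIDTH_MM = 26.0
RETICLE_HEIGHT_MM = 33.0


def estimate_die(width_px: float, height_px: float, hbm_edge_px: float) -> dict:
    """Convert pixel measurements to millimeters, using the HBM edge as the ruler."""
    mm_per_px = HBM_EDGE_MM / hbm_edge_px
    width_mm = width_px * mm_per_px
    height_mm = height_px * mm_per_px
    area_mm2 = width_mm * height_mm
    reticle_mm2 = RETICLE_WIDTH_MM * RETICLE_HEIGHT_MM
    return {
        "width_mm": width_mm,
        "height_mm": height_mm,
        "area_mm2": area_mm2,
        # Fraction of the reticle limit that the die occupies.
        "reticle_fraction": area_mm2 / reticle_mm2,
    }


if __name__ == "__main__":
    # Hypothetical pixel measurements (assumed, not taken from the real image).
    est = estimate_die(width_px=1018, height_px=1320, hbm_edge_px=439)
    print(f"{est['width_mm']:.2f} mm x {est['height_mm']:.2f} mm "
          f"= {est['area_mm2']:.0f} mm^2 "
          f"({est['reticle_fraction']:.1%} of the reticle limit)")
```

Because the result scales with the square of the reference measurement, an error of a few pixels on the HBM edge shifts the area by several square millimeters, which is why the figure should be read as an approximation.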
The companies say the chip reached tape-out in just nine months and is slated for deployment beginning in late 2026, an extremely fast turnaround for an ASIC design. It remains unclear how extensively Broadcom and OpenAI used artificial intelligence to define and develop Jalapeño, but the companies acknowledged using OpenAI's models to speed up parts of the chip's design and optimization work. A from-scratch ASIC design typically takes 1.5 to 2 years, so a nine-month cycle suggests AI assistance can meaningfully shrink development time. Broadcom's extensive reuse of logic across its custom designs also accelerates delivery.
Jalapeño is designed to support not only OpenAI's own workloads but also present and future LLMs across the industry, potentially allowing OpenAI to sell its hardware to third parties, provided it can secure sufficient supply from Broadcom and TSMC. Broadcom's chief executive indicated that Jalapeño will be deployed at gigawatt-scale data centers with Microsoft and other partners starting in 2026, though it remains unclear whether the processor will be used exclusively for OpenAI workloads or will also be available to other tenants.
"Our collaboration with OpenAI represents a fundamental commitment to scaling the physical infrastructure required for the next decade of AI," said Hock Tan, President and CEO of Broadcom. "This is just the beginning of a multi-generation roadmap. By co-developing our industry-leading silicon directly with OpenAI, we are enabling the deployment of gigawatt-scale data centers with Microsoft and other partners beginning in 2026."