Saturday, June 27, 2026
AI Infrastructure · News & Analysis
Chips & Hardware · Report

Broadcom AI custom ASIC designed in nine months, challenging traditional chip design timelines.

A shortened design cycle suggests agile silicon development is feasible; if it holds, it reduces incumbents' first-mover advantage and enables faster competitive response.
Trade press · Slicast · June 27, 2026 · US · Source: Google News

OpenAI and Broadcom unveiled Jalapeño, OpenAI's first Intelligence Processor: an accelerator architected around OpenAI's vision for LLM inference and the first chip in a multi-generation compute platform the two companies are building together. OpenAI designed the chip from scratch around its understanding of LLM fundamentals, while Broadcom and Celestica industrialize the platform through chip implementation, board and rack integration, high-performance networking, and scalable production.

Engineering samples are already running ML workloads, including GPT-5.3-Codex-Spark, in the lab at production target frequency and power. OpenAI says Jalapeño was co-developed from initial design to manufacturing tape-out in just nine months, which it claims is the fastest ASIC development cycle ever achieved in high-performance advanced semiconductors, accelerated in part by OpenAI's own models. Broadcom's silicon implementation and networking technologies, including Tomahawk networking silicon, help bring the platform to large-scale production, with gigawatt-scale deployment alongside Microsoft and other partners beginning at the end of 2026.

The headline claim is striking: initial design to manufacturing tape-out in nine months on a high-performance accelerator. Custom AI silicon normally runs on a multi-year cadence. Architecture, RTL, verification, timing closure, physical design, and packaging qualification each consume quarters, and a frontier-class accelerator from concept to tape-out in 18–24 months is considered fast. Nine months is not an incremental improvement; it is a different category for a first-generation chip.
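To put the comparison above in numbers, here is a minimal Python sketch that uses only the figures quoted in this article (nine months versus the 18 to 24 months described as fast). The function names are illustrative, not from any source.

```python
# Schedule figures quoted in the article (months from initial design to tape-out).
JALAPENO_MONTHS = 9
FAST_BASELINE_MONTHS = (18, 24)  # what the article calls "fast" for this class


def compression(baseline_months: float, actual_months: float) -> float:
    """How many times shorter the actual schedule is than the baseline."""
    return baseline_months / actual_months


def fraction_saved(baseline_months: float, actual_months: float) -> float:
    """Share of the baseline schedule that was eliminated."""
    return 1 - actual_months / baseline_months


if __name__ == "__main__":
    for baseline in FAST_BASELINE_MONTHS:
        ratio = compression(baseline, JALAPENO_MONTHS)
        saved = fraction_saved(baseline, JALAPENO_MONTHS)
        print(f"vs {baseline} months: {ratio:.2f}x shorter, {saved:.1%} of schedule removed")
```

Measured against the article's own "fast" baseline, nine months works out to a schedule roughly 2x to 2.7x shorter, which is why it reads as a step change rather than an increment.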

OpenAI attributes the speed to three factors: deep software-hardware co-development with its engineering teams, Broadcom's silicon-implementation expertise, and the use of OpenAI's own models to accelerate parts of the design and optimization process. The recursive framing is significant: the same models served to users are helping build the infrastructure that will serve the next models.

However, a nine-month tape-out is not the same as a nine-month product. Silicon validation, yield ramp, system integration, and software maturation still stand between Jalapeño and gigawatt-scale production at the end of 2026. Claims of "performance per watt substantially better than current state-of-the-art" are based on early testing, and the detailed technical report is promised only in the "coming months." The market should treat the timeline as genuinely impressive and the performance as unverified until that report lands.

Richard Ho, who leads OpenAI's hardware program, provided candid insight at the Synopsys Converge Executive Forum on how this was accomplished. Hardware agents are not the "write code, compile, test" loop people imagine from software copilots, he explained. "It is a lot less straightforward than just doing a straight-line code." Instead, his team uses agents to multiply the effectiveness of every human engineer, spawning "hundreds of them" to run verification, timing closure, and area optimizations overnight, with "sub-intelligence reading back… the logs, figuring out… the debug, and then tweaking parameters and tool flows in order to get the results." His summary: "you can do a better design, you can do it faster and you can do it more cheaply."

This is the realistic version of "AI designed the chip." It is not autonomous silicon generation; it is massively parallel, domain-specific automation that compresses the human-bottlenecked verification and optimization loops that normally dominate an ASIC schedule. Ho's ambition is to pull hardware timelines closer to software timelines. Jalapeño is the first public data point suggesting that gap is starting to close.
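The loop Ho describes (spawn many runs in parallel, read back the logs, tweak parameters, repeat) can be illustrated with a toy Python sketch. Everything here is an assumption made for illustration: `run_flow` is a hypothetical stand-in for a real tool run and returns a made-up slack number, the parameter names `effort` and `utilization` are invented, and nothing reflects OpenAI's or Broadcom's actual flow.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_flow(params: dict) -> dict:
    """Hypothetical stand-in for one tool run; returns a fake 'log' record."""
    # Toy model (assumption): slack improves with effort, worsens with utilization.
    slack_ns = 0.05 * params["effort"] - 0.4 * params["utilization"] + 0.1
    return {"params": params, "worst_slack_ns": slack_ns}


def propose(best: dict, rng: random.Random, n: int) -> list:
    """Perturb the best-known parameters to create n new candidate runs."""
    candidates = []
    for _ in range(n):
        effort = max(1, min(10, best["effort"] + rng.choice([-1, 0, 1])))
        util = best["utilization"] + rng.uniform(-0.05, 0.05)
        util = round(min(0.9, max(0.5, util)), 3)
        candidates.append({"effort": effort, "utilization": util})
    return candidates


def search(rounds: int = 5, workers: int = 8, seed: int = 0) -> dict:
    """Run batches in parallel, keep the best result, and iterate."""
    rng = random.Random(seed)
    best = run_flow({"effort": 3, "utilization": 0.8})
    for _ in range(rounds):
        candidates = propose(best["params"], rng, workers)
        with ThreadPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(run_flow, candidates))
        # "Read back the logs": keep whichever run has the best worst-case slack.
        best = max(results + [best], key=lambda r: r["worst_slack_ns"])
    return best


if __name__ == "__main__":
    print(search())
```

The point of the sketch is the shape of the loop, not the numbers: many cheap parallel trials with an automated reader choosing the next batch, in place of one engineer iterating by hand.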

A blank-slate inference chip does not tape out in nine months without a manufacturing platform already waiting for it, and this is where Broadcom's role is underappreciated. Broadcom spent the past decade maturing the advanced packaging methodology that custom accelerators now depend on, much of it forged across its Google TPU programs, including the eighth-generation TPU work where compute, memory, and I/O integration were pushed to new limits. In December 2024, Broadcom delivered the industry's first 3.5D "Face-to-Face" (F2F) XDSiP platform—combining 3D silicon stacking with 2.5D CoWoS packaging, integrating more than 6,000 mm² of silicon and up to 12 HBM stacks in a single device. The F2F approach delivers roughly 7x more signal density between stacked dies and a 10x reduction in die-to-die interface power versus the older face-to-back method, with production shipments beginning February 2026. That is the packaging "assembly line" a chip like Jalapeño slots into.

Networking is the second enabler. Jalapeño's architecture is described as reducing data movement and balancing compute, memory, and networking to push realized utilization closer to theoretical peak—and Broadcom's Tomahawk silicon is the scale-out fabric that makes a gigawatt-scale inference cluster behave like one machine. The lesson from the TPU lineage is that at frontier scale, the accelerator die is necessary but not sufficient. Packaging and networking decide whether the silicon ever reaches its theoretical limits in a real data center.
