OpenAI and Broadcom have unveiled Jalapeño, a custom inference processor designed to run large language model inference at production scale in gigawatt-scale AI data centers.
OpenAI and Broadcom have announced Jalapeño, a custom-designed inference processor built for large-scale AI data center deployments. The chip targets the production inference workloads that power services like ChatGPT.
Jalapeño is optimized for gigawatt-scale AI data center environments and focuses specifically on the inference phase, in which trained models serve user requests. By building a custom chip tailored to this workload, rather than relying solely on general-purpose GPUs, OpenAI aims to address a key efficiency bottleneck in running inference at production scale.
The move signals OpenAI's strategy to vertically integrate its infrastructure stack, following competitors like Meta and Google in developing proprietary silicon. For the broader AI buildout, custom inference chips can reduce operating costs and latency, enabling more efficient serving of frontier models as demand continues to grow.