China Mobile Hubei and Huawei completed China's first carrier-grade validation of an AI inference acceleration solution.

Signals that carrier-grade AI inference infrastructure is production-ready, enabling large-scale deployment and positioning Huawei as a viable alternative to Western vendors.

Trade press · Slicast · June 24, 2026 · US · Source: Google News

At MWC Shanghai 2026, China Mobile Communications Group Hubei Co., Ltd. (China Mobile Hubei) and Huawei announced the successful live-network validation of Huawei's AI Inference Acceleration Solution, the first such validation in China's carrier industry. Powered by Huawei's OceanStor A800 storage, Ascend A3 SuperPoD, and Unified Cache Manager (UCM), the solution delivers up to a 372% improvement in token throughput for long-sequence AI inference workloads, providing technical support for carriers' efficient deployment of AI computing services.

As AI applications increasingly shift toward AI agents, long-sequence scenarios, including code generation and multi-turn dialogues, are becoming prevalent. However, the limited capacity of conventional on-chip memory and dynamic random-access memory (DRAM) severely constrains key-value (KV) cache hit ratios, limiting overall performance. Huawei introduced UCM in 2025 to address this bottleneck. By leveraging external high-performance storage, UCM removes the capacity limits of on-chip memory and DRAM, enabling petabyte-scale KV cache capacity. The solution implements full-lifecycle, hierarchical management and scheduling of the KV cache, significantly expanding context windows for single-turn dialogues. For multi-turn interactions, UCM reuses historical KV cache to eliminate redundant computation, delivering optimized inference at lower cost.
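The reuse idea described above can be illustrated with a toy model. The following is a minimal sketch, not UCM's actual design or API: the names `TieredKVCache`, `block_keys`, and `prefill`, the block size, and the LRU demotion policy are all assumptions made for illustration. It shows how a small fast tier backed by a larger external tier lets a later turn reuse the cached prefix of an earlier turn instead of recomputing it.

```python
import hashlib
from collections import OrderedDict

BLOCK_TOKENS = 4  # hypothetical block size, kept tiny for readability


def block_keys(token_ids, block_tokens=BLOCK_TOKENS):
    """Return one chained hash per full block, so each key identifies a whole prefix."""
    keys, digest = [], hashlib.sha256()
    full = len(token_ids) - len(token_ids) % block_tokens
    for start in range(0, full, block_tokens):
        block = token_ids[start:start + block_tokens]
        digest.update(",".join(map(str, block)).encode() + b";")
        keys.append(digest.copy().hexdigest())
    return keys


class TieredKVCache:
    """Toy two-tier store: a small LRU fast tier backed by a large external tier."""

    def __init__(self, fast_capacity):
        self.fast = OrderedDict()  # stands in for on-chip memory / DRAM
        self.external = {}         # stands in for external storage
        self.fast_capacity = fast_capacity  # assumed to be at least 1

    def put(self, key, kv_block):
        self.fast[key] = kv_block
        self.fast.move_to_end(key)
        while len(self.fast) > self.fast_capacity:
            old_key, old_block = self.fast.popitem(last=False)
            self.external[old_key] = old_block  # demote instead of discarding

    def get(self, key):
        if key in self.fast:
            self.fast.move_to_end(key)
            return self.fast[key]
        if key in self.external:
            self.put(key, self.external.pop(key))  # promote on hit
            return self.fast[key]
        return None


def prefill(cache, token_ids):
    """Return (reused_tokens, computed_tokens) for one request."""
    keys = block_keys(token_ids)
    reused_blocks = 0
    for key in keys:
        if cache.get(key) is None:
            break
        reused_blocks += 1
    for key in keys[reused_blocks:]:
        cache.put(key, object())  # placeholder for real K/V tensors
    reused = reused_blocks * BLOCK_TOKENS
    return reused, len(token_ids) - reused


cache = TieredKVCache(fast_capacity=2)
turn_1 = list(range(12))                       # first turn: 12 tokens
turn_2 = turn_1 + [100, 101, 102, 103, 104]    # second turn extends the history
print(prefill(cache, turn_1))  # (0, 12): nothing cached yet
print(prefill(cache, turn_2))  # (12, 5): the whole first turn is reused
```

In this toy setup the fast tier holds only two blocks, so without the external tier the first block of the conversation would have been discarded and the second turn could reuse nothing. Demoting rather than discarding is what keeps the prefix hit intact.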

The validation deployed the vLLM-Ascend framework in China Mobile Hubei's live network, testing long-sequence inputs ranging from 8K to 190K tokens across mainstream models including MiniMax M2.5 and GLM-5.1. Results demonstrate that as context length increases, the advantages of the AI Inference Acceleration Solution become more pronounced, effectively resolving the KV cache capacity bottleneck common in long-sequence inference.
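To see why capacity becomes the limiting factor at longer contexts, a rough back-of-the-envelope estimate helps. The sketch below uses the common per-token formula for standard multi-head or grouped-query attention (keys plus values, per layer, per KV head). The model dimensions are hypothetical and are not those of MiniMax M2.5 or GLM-5.1, and "K" is assumed to mean 1,024 tokens; architectures that compress the KV cache would give smaller figures.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    """Estimate KV cache size for one sequence.

    The leading factor of 2 counts keys and values; bytes_per_value=2
    assumes 16-bit storage.
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


# Hypothetical model shape, chosen only for illustration.
LAYERS, KV_HEADS, HEAD_DIM = 60, 8, 128

for label, tokens in [("8K", 8 * 1024), ("190K", 190 * 1024)]:
    gib = kv_cache_bytes(tokens, LAYERS, KV_HEADS, HEAD_DIM) / 2**30
    print(f"{label}: {gib:.2f} GiB per sequence")
```

Under these assumptions a single sequence grows from about 1.9 GiB at 8K tokens to about 44.5 GiB at 190K tokens. The footprint scales linearly with context length, which is consistent with the reported result that the benefit of offloading KV cache to external storage grows as inputs get longer.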

A China Mobile Hubei representative stated: "Hubei is located in the core area with only 10 milliseconds of latency to the nation's eight major computing power hubs. This test validates the necessity of storage-compute-network collaboration. In scenarios such as AI agent interaction and code generation, the AI Inference Acceleration Solution can increase throughput by over 50%, laying a solid foundation for large-scale deployment of China Mobile Hubei's AI services."

Michael Qiu, President of the Huawei Global Data Storage Marketing & Solution Sales Department, remarked: "With major carriers launching token packages, the large-scale adoption of AI agents has clearly entered a new phase. Token consumption is expected to grow exponentially. The AI Inference Acceleration Solution not only significantly reduces TTFT [time to first token], but also helps slash token costs, enabling carriers to build efficient and green AI computing infrastructure."

The successful validation represents a major step forward in collaborative optimization of AI computing infrastructure for carriers, providing a replicable technical model for the global AI industry. MWC Shanghai 2026 runs from June 24 to June 26 in Shanghai. Huawei will showcase its latest products and solutions in Hall N1 of the Shanghai New International Expo Center.