DeepSeek's DSpark optimization eases AI chip bottlenecks and cuts serving costs, with the company claiming material efficiency gains.
Chinese artificial intelligence start-up DeepSeek has released a major upgrade to its flagship V4 model designed to significantly accelerate AI response generation. The upgrade reflects growing competition among Chinese developers to reduce serving costs and enhance user experience, amid broader efforts to overcome US AI restrictions.
The company has adopted DSpark, a speculative decoding framework that increases per-user response speeds by up to 85 per cent. This efficiency gain could substantially reduce AI systems' reliance on larger, more powerful chip infrastructure.
Conventional token-by-token AI output often slows considerably when generating lengthy responses, resulting in poor GPU utilization and prolonged user-perceived waiting times. According to research published on Saturday, DeepSeek identified this as a "primary bottleneck in serving AI."
The DSpark framework accelerates AI response generation, also known as inference: the process of serving a trained model to respond to user queries. It employs a lightweight draft model to propose candidate output tokens, which a larger model then verifies in batches.
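The draft-then-verify idea can be illustrated with a minimal Python sketch. It is not DeepSeek's implementation: `target_model` and `draft_model` are hypothetical toy functions standing in for the large and small models, and the greedy accept-or-correct rule is one common form of speculative decoding assumed here for illustration.

```python
from typing import Callable, List, Tuple

# A "model" here maps a token prefix to its single most likely next token.
Model = Callable[[List[int]], int]

def target_model(prefix: List[int]) -> int:
    # Hypothetical large model: a toy deterministic rule, not a real network.
    return (sum(prefix) * 31 + len(prefix)) % 50

def draft_model(prefix: List[int]) -> int:
    # Hypothetical small model: agrees with the target most of the time,
    # but guesses wrong whenever the prefix length is 3 modulo 4.
    guess = target_model(prefix)
    return guess if len(prefix) % 4 != 3 else (guess + 1) % 50

def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    # Baseline: one model call per generated token.
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out

def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int = 4) -> Tuple[List[int], int]:
    out = list(prompt)
    goal = len(prompt) + n_new
    verify_passes = 0
    while len(out) < goal:
        # Step 1: the draft model cheaply proposes up to k tokens.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft(out + proposal))
        # Step 2: the target checks every proposed position. A real system
        # would do this in one batched forward pass; the loop simulates it.
        verify_passes += 1
        for i, tok in enumerate(proposal):
            expected = target(out + proposal[:i])
            if tok != expected:
                # Keep the matching prefix, then the target's own token.
                out.extend(proposal[:i])
                out.append(expected)
                break
        else:
            out.extend(proposal)  # every drafted token was accepted
    return out, verify_passes

prompt = [1, 2, 3]
baseline = greedy_decode(target_model, prompt, 12)
fast, passes = speculative_decode(target_model, draft_model, prompt, 12)
```

In this toy setting the speculative output is identical to what the large model would have produced on its own, but it needs fewer verification passes than generated tokens. Any speed-up in practice depends on how often the draft model's guesses are accepted.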
The framework incorporates two additional refinements. First, a semi-autoregressive generation method enables the model to produce small chunks of tokens rather than strictly one at a time. Second, a confidence-based scheduling system dynamically adjusts verification intensity based on computing demand, helping to balance speed and output quality.
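The two refinements can be sketched in the same spirit. The Python below is an illustrative assumption, not the published DSpark design: `draft_chunk` is a hypothetical drafter that returns several tokens per call, and `verification_budget` is a hypothetical policy that sends fewer drafted tokens for verification when system load is high or the draft's confidence is low.

```python
from typing import List, Tuple

def draft_chunk(prefix: List[int], chunk_size: int) -> Tuple[List[int], List[float]]:
    # Hypothetical semi-autoregressive drafter: a single call emits a whole
    # chunk of tokens instead of one token per call.
    base = sum(prefix) % 50
    tokens = [(base + i) % 50 for i in range(1, chunk_size + 1)]
    # Assumption: the drafter is less sure about later positions in a chunk.
    confidences = [0.95 ** i for i in range(1, chunk_size + 1)]
    return tokens, confidences

def verification_budget(confidences: List[float], load: float,
                        max_tokens: int = 8,
                        min_confidence: float = 0.5) -> int:
    # `load` is a hypothetical 0.0-1.0 measure of current computing demand.
    # Heavier load shrinks the number of tokens the large model will check.
    budget = max(1, int(max_tokens * (1.0 - load)))
    accepted = 0
    joint = 1.0
    for c in confidences[:budget]:
        # Stop once the chance that the whole run is correct falls too low.
        joint *= c
        if joint < min_confidence:
            break
        accepted += 1
    return max(1, accepted)  # always verify at least one token

tokens, confidences = draft_chunk([1, 2, 3], chunk_size=8)
idle_budget = verification_budget(confidences, load=0.0)
busy_budget = verification_budget(confidences, load=0.75)
```

With these made-up numbers the scheduler verifies more drafted tokens when the system is idle than when it is busy. The thresholds and the confidence decay are placeholders chosen only to make that trade-off visible.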