
SemiAnalysis and Sequoia Capital frame AI infrastructure buildout as full-stack co-design opportunity with 100X value creation potential.

Narrative shift toward distributed model training and inference architectures; validates custom silicon and neocloud infrastructure as strategic.

Trade press · Slicast · July 3, 2026 · Global · Source: NextBigFuture


Sean from Sequoia introduces Dylan Patel of SemiAnalysis, praising the firm as the leading independent research organization in semiconductors. SemiAnalysis covers technical details, supply chains, and the broader industry context at a time when semiconductors had largely lost mainstream appeal in the West. Recent rumors suggest the firm has passed $100 million in revenue with possible venture fund ambitions, a testament to the trusted brand Dylan has built.

Dylan grew up in a family-run motel and gas station business, where he learned to understand the technical details, economics, and margins that would later define his expertise. At age eight, he received an Xbox 360 as a Christmas gift, not long after the console had been announced on his birthday. When the device suffered the "red ring of death" hardware failure, he opened it up and, after other methods failed, repaired it by shorting the temperature sensor. This pivotal moment "opened Pandora's box" and sparked his lifelong fascination with hardware tinkering. By age twelve, he was active in hardware communities and forums, laying the foundation for his later career.

Dylan earned degrees unrelated to semiconductors and worked for two years as a quant at a small risk firm before a series of personal setbacks in early 2020 redirected his path. Workplace issues, his grandmother's death from dementia, and COVID lockdowns prompted him to move in with his brother in Nashville. During this period, he posted more frequently online, traded stocks profitably around COVID and semiconductor shortages, and was eventually doxed. On his twenty-fourth birthday, he launched SemiAnalysis with two detailed public blog posts under his real name, which quickly gained significant traction and led to consulting work.

After this tumultuous period, Dylan spent six months living out of a truck and tent while visiting national parks across America. He negotiated cheap motel rooms during weekdays and spent weekends reading textbooks on semiconductors and AI. He continued publishing detailed blogs throughout his travels and later spent time in Latin America. He began attending over forty conferences per year worldwide, from major AI events like NeurIPS to highly technical shows such as SPIE lithography conferences. Through direct conversations with experts at these events, he learned arcane supply-chain details rarely published elsewhere, dramatically deepening his expertise while remaining effectively homeless from mid-2020 onward.

InferenceX represents Dylan's response to a fundamental problem: traditional point-in-time benchmarks become outdated quickly due to rapid model releases and constant software optimizations. The platform runs automated daily benchmarks across the latest models on donated hardware worth over $50 million from providers including Nvidia, AMD, Google, and Amazon. It focuses on the critical throughput-versus-interactivity curve, publicly sharing optimal configurations so anyone can achieve near-peak performance. The project tracks cost and power efficiency, revealing annual gains of roughly 40–60× in intelligence per watt and per dollar. Dylan believes inference will become one of the largest markets on Earth, eventually exceeding oil in economic impact.
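The throughput-versus-interactivity curve can be read as a Pareto frontier over serving configurations. The following is a minimal sketch of that idea, not InferenceX's actual method: the `Config` type, the configuration names, and every number are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Config(NamedTuple):
    name: str             # hypothetical label for a serving configuration
    interactivity: float  # tokens/s seen by each user (higher is better)
    throughput: float     # tokens/s per accelerator (higher is better)


def pareto_frontier(configs):
    """Keep only configs that no other config beats on both axes."""
    frontier = []
    best_throughput = float("-inf")
    # Walk from most to least interactive; a config survives only if it
    # offers more throughput than every more-interactive config seen so far.
    for c in sorted(configs, key=lambda c: (-c.interactivity, -c.throughput)):
        if c.throughput > best_throughput:
            frontier.append(c)
            best_throughput = c.throughput
    return sorted(frontier, key=lambda c: c.interactivity)


# Illustrative measurements only; not real benchmark results.
measured = [
    Config("tp8_batch1", 200.0, 500.0),
    Config("tp8_batch16", 120.0, 3000.0),
    Config("tp4_batch16", 100.0, 2500.0),   # dominated by tp8_batch16
    Config("tp4_batch64", 60.0, 6000.0),
    Config("tp2_batch64", 60.0, 5000.0),    # dominated by tp4_batch64
    Config("tp2_batch256", 20.0, 9000.0),
]

frontier = pareto_frontier(measured)
for c in frontier:
    print(f"{c.name}: {c.interactivity} tok/s/user, {c.throughput} tok/s/accelerator")
```

Each point left on the frontier is a configuration worth publishing: moving along it trades per-user speed for aggregate throughput, and anything off the frontier is strictly worse than some alternative.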

Hardware-model co-design emerges as the central theme of frontier AI development. OpenAI's models tend toward sparsity while Anthropic's are relatively more dense, creating fundamentally different optimization requirements. These architectural differences affect how models map onto specific hardware such as GPUs versus TPUs, influencing matrix-multiply shapes, attention mechanisms, and expert routing. Hardware interconnect and network topology further reinforce these divergences, making co-design between model architecture and underlying hardware essential for peak performance.
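To make the sparse-versus-dense distinction concrete, here is a rough sketch assuming a mixture-of-experts layout, which the source does not spell out. The parameter counts are hypothetical and do not describe any real model; the only external fact used is the common rule of thumb that a forward pass costs about 2 FLOPs per active parameter per token.

```python
def param_counts(shared, per_expert, num_experts, experts_per_token):
    """Return (total, active) parameters for a hypothetical model.

    total  -> what must be held in memory across the cluster
    active -> what is actually used to compute each token
    """
    total = shared + per_expert * num_experts
    active = shared + per_expert * experts_per_token
    return total, active


def forward_flops_per_token(active):
    # Rule-of-thumb approximation: ~2 FLOPs per active parameter per token.
    return 2 * active


# Hypothetical dense model: every parameter is active for every token.
dense_total, dense_active = param_counts(
    shared=70_000_000_000, per_expert=0, num_experts=0, experts_per_token=0
)

# Hypothetical sparse model: 64 experts, 4 routed per token.
sparse_total, sparse_active = param_counts(
    shared=10_000_000_000, per_expert=5_000_000_000,
    num_experts=64, experts_per_token=4,
)

print(dense_total, dense_active, forward_flops_per_token(dense_active))
print(sparse_total, sparse_active, forward_flops_per_token(sparse_active))
```

In this illustration the sparse model holds far more parameters in memory than the dense one while computing with fewer per token, which is why the two styles stress memory capacity, interconnect, and matrix-multiply shapes differently.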

Nvidia's NVLink connects up to seventy-two GPUs through dedicated switches, while Google's ICI allows up to eight thousand chips to communicate at high bandwidth without switches by routing through other chips. These contrasting interconnect designs create different latency, bandwidth, and scaling characteristics. The physical shape and connectivity of hardware directly influence which model architectures perform best on each platform. Model companies optimize their architectures for the specific interconnect they primarily use, creating strong path dependence for each ecosystem.
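The scaling difference between the two designs can be sketched with a simple hop count. This example assumes the switchless fabric is laid out as a 3D torus with wraparound links and uses an illustrative 20 x 20 x 20 shape to reach eight thousand chips; the source states neither the topology nor the shape. In a single-tier switched domain, by contrast, any two devices are one switch traversal apart.

```python
def torus_hops(a, b, dims):
    """Minimum chip-to-chip hops between coordinates a and b on a torus.

    On each axis a message can travel either direction around the ring,
    so the distance is the shorter of the two ways around.
    """
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def torus_diameter(dims):
    """Worst-case hop count between any two chips on the torus."""
    return sum(d // 2 for d in dims)


dims = (20, 20, 20)  # illustrative shape, an assumption of this sketch
num_chips = dims[0] * dims[1] * dims[2]

neighbor = torus_hops((0, 0, 0), (1, 0, 0), dims)
wraparound = torus_hops((0, 0, 0), (19, 0, 0), dims)
far_corner = torus_hops((0, 0, 0), (10, 10, 10), dims)

print(num_chips, neighbor, wraparound, far_corner, torus_diameter(dims))
```

Under these assumptions, neighboring chips talk directly, but the farthest pair is many hops apart, so communication patterns that stay local map well onto the fabric while all-to-all traffic pays for the distance.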

The traditional CUDA moat is partially eroding because frontier model labs now use AI coding tools to write custom kernels for alternative chips. With only a small number of major model developers, the need for broad programmability across thousands of customers has diminished. However, downstream ecosystem effects persist: models heavily co-optimized for Nvidia hardware run sub-optimally on other platforms. Big labs frequently fork open-source frameworks or build their own stacks, reducing reliance on standard CUDA tooling. The moat is shifting from raw programmability toward full-stack co-design advantages.

Chinese labs have produced models explicitly co-designed for Nvidia GPUs, making them less efficient on TPUs and other architectures. Major Western labs similarly co-optimize across model architecture, infrastructure software, and target hardware. When optimization spans all layers simultaneously, the gains compound: improvements can reach one hundred times, far more than the gains available from tuning any single layer in isolation. Smaller teams still rely heavily on open-source tools like vLLM and SGLang, while frontier labs have the resources to customize everything. Full-stack co-design is becoming the primary source of competitive advantage.
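One way to read the one-hundred-times figure is as compounding across layers. The sketch below is only an arithmetic illustration: the per-layer speedups are hypothetical values picked so that their product is 100, and the source gives no per-layer breakdown.

```python
import math

# Hypothetical per-layer speedups (assumptions, not figures from the source).
layer_gains = {
    "model architecture": 5.0,
    "infrastructure software": 4.0,
    "target hardware": 5.0,
}


def additive_speedup(gains):
    """Speedup if each layer's improvement merely added to the baseline."""
    return 1.0 + sum(g - 1.0 for g in gains)


def compounded_speedup(gains):
    """Speedup if each layer's improvement multiplies the others."""
    return math.prod(gains)


additive = additive_speedup(layer_gains.values())
compounded = compounded_speedup(layer_gains.values())

print(f"additive: {additive}x, compounded: {compounded}x")
```

With these illustrative numbers, no single layer delivers more than 5x, yet the stack as a whole reaches 100x when the gains multiply, versus 12x if they only added.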

Cerebras excels at very fast inference, which SemiAnalysis itself uses extensively for high-value tasks where speed justifies premium pricing. The company's SRAM-based architecture faces challenges scaling to extremely large models with long context lengths. Most revenue and usage at leading labs still come from their best (and largest) models, where fast inference modes deliver clear ROI. Dylan emphasizes rigorous tracking of token spend and ROI on every task to decide when fast mode is worth the cost. Cerebras occupies a valuable but specialized niche rather than replacing general-purpose GPU or TPU clusters for all workloads.
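The per-task ROI check described above could look something like the following sketch. It is not SemiAnalysis's actual tooling: the function name, the prices, the token rates, and the value placed on an hour of waiting are all hypothetical inputs chosen for illustration.

```python
def fast_mode_roi(tokens, std_price_per_mtok, fast_price_per_mtok,
                  std_tok_per_s, fast_tok_per_s, value_per_hour):
    """Compare the premium paid for fast mode with the value of time saved.

    Returns (worth_it, extra_cost, value_of_time_saved), all in dollars
    except the boolean.
    """
    extra_cost = tokens / 1e6 * (fast_price_per_mtok - std_price_per_mtok)
    seconds_saved = tokens / std_tok_per_s - tokens / fast_tok_per_s
    value_saved = seconds_saved / 3600 * value_per_hour
    return value_saved > extra_cost, extra_cost, value_saved


# Hypothetical high-value task: waiting costs $100 per hour.
high_value = fast_mode_roi(
    tokens=2_000_000, std_price_per_mtok=10.0, fast_price_per_mtok=30.0,
    std_tok_per_s=50.0, fast_tok_per_s=500.0, value_per_hour=100.0,
)

# Same task, but waiting is nearly free (for example, an overnight batch job).
low_value = fast_mode_roi(
    tokens=2_000_000, std_price_per_mtok=10.0, fast_price_per_mtok=30.0,
    std_tok_per_s=50.0, fast_tok_per_s=500.0, value_per_hour=2.0,
)

print(high_value)
print(low_value)
```

The same token volume and price premium give opposite answers depending on what the saved time is worth, which is why the decision has to be tracked task by task rather than set once.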

Dylan becomes particularly frustrated by claims that AI has no ROI or that model progress has plateaued. He points out that capabilities have consistently moved up and to the right, with old benchmarks saturating and new, harder benchmarks showing rapid gains. Semiconductors involve thousands of complex layers, and even experts learn new details daily about chemicals, processes, and supply chains. People often possess accurate facts yet reach completely incorrect conclusions due to missing context across abstraction layers. He views ongoing model improvement and economic value creation as undeniable based on both data and direct observation.

Looking ahead, Dylan is highly excited about space-based data centers and SpaceX-enabled opportunities over the next decade. He expects co-packaged optics to arrive toward the end of the decade, with debate mainly around exact timing. Specialized chips will carve out profitable niches even as Nvidia and a few hyperscaler ASICs dominate the majority of the market. The compute crunch persists because demand for useful AI tasks is expanding faster than new gigawatts of capacity are coming online, despite quarterly increases in deployed power. High gross margins at frontier labs allow them to pay substantial premiums for additional compute without destroying profitability, fundamentally reshaping how the industry thinks about infrastructure investment and AI value creation.
