Chips & Hardware · Report

AMD introduces Flash-extended memory architecture to scale server DRAM capacity without HBM.

Memory scaling innovation offsets HBM supply constraints at lower cost; enables alternative memory path.

Trade press · Slicast · June 30, 2026 · Global · Source: The Next Platform


There is a crisis building in the datacenter, and it is centered around the scarcity of and ridiculously high prices for DRAM main memory. The GenAI boom has caused the hyperscalers, cloud builders, AI model builders, and neoclouds to hog what has become the limited capacity coming out of the memory foundries of Micron Technology, Samsung, and SK Hynix, and the demand shock is causing unprecedented price hikes.

According to Counterpoint Research, 64 GB DIMM memory prices rose 3.5X between Q3 2025 and Q1 2026, and they will likely be up 5X by Q3 2026. Micron's most recent financials show no end in sight to the demand shock out to 2028, which means prices will continue climbing.

DRAM main memory represented around 50 percent of a server's cost in 2023, with CPUs comprising about a quarter of the bill of materials and other peripherals and storage making up the remainder. By mid-2026, DRAM accounts for between 60 and 90 percent of server costs, averaging around 75 percent. CPUs did not get cheaper, but because memory prices are skyrocketing, the CPU price increases look small by comparison.
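As a rough sanity check on those figures, the sketch below computes what DRAM's share of server cost would be if DRAM started at 50 percent of the bill and only the DRAM price changed. Applying the Counterpoint price multiples to the 2023 cost split, and holding every other component flat, are simplifying assumptions made here, not claims from the report; the function name is hypothetical.

```python
def dram_share_after_price_rise(base_share: float, price_factor: float) -> float:
    """DRAM's share of server cost after the DRAM price is multiplied by
    price_factor, assuming all other component costs stay flat."""
    dram_cost = base_share * price_factor   # new DRAM cost, in units of the old server cost
    other_cost = 1.0 - base_share           # everything else, unchanged (assumption)
    return dram_cost / (dram_cost + other_cost)


if __name__ == "__main__":
    # 3.5X and 5X are the price multiples cited from Counterpoint Research.
    for factor in (3.5, 5.0):
        share = dram_share_after_price_rise(0.50, factor)
        print(f"{factor}X DRAM price -> DRAM is {share:.0%} of server cost")
```

Under these assumptions the share comes out at roughly 78 percent for a 3.5X price rise and 83 percent for 5X, which sits inside the 60 to 90 percent range quoted above.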

Main memory can be used more efficiently and can even be extended with flash—a concept that has been pursued for more than a decade. This was the promise of 3D XPoint storage, launched by Intel and Micron in July 2015. 3D XPoint was intended to offer performance somewhere between DRAM and flash while remaining byte addressable like main memory but cost-effective like flash. However, Intel failed to ramp 3D XPoint fast enough in manufacturing and technology, and it ended up being almost as costly as DRAM while only a few multiples faster than flash. Intel's decision to restrict 3D XPoint to its own Xeon server processors prevented it from going mainstream. Ultimately, Intel killed the technology and sold its flash business to SK Hynix—a self-defeating move that cost Intel enormous future profits.

Today, the three major memory makers are allocating more DRAM production to expensive HBM stacked memory, cutting back on normal DRAM chips used in server DIMM modules and reducing flash output to boost DRAM production. Annual DRAM and flash production will increase by perhaps 20 to 25 percent, yet demand far exceeds that. As a result, DRAM and flash prices continue rising as the Big Three sell capacity to the highest bidders.

This market dynamic prompted AMD to acquire MEXT, a startup founded in 2023 that emerged from stealth mode in early April 2026. The company's name is meant to invoke the idea of memory extension, and its team has developed a way of transparently and invisibly extending DRAM main memory to flash storage.

Gary Smerdon, co-founder and chief executive officer of MEXT, brings deep expertise to this challenge. He was chief strategy and product officer at Fusion-io, the first major commercializer of flash storage for servers, which counted Apple and Meta Platforms as anchor customers. Before that, Smerdon spent six years leading solid state memory efforts at LSI Logic. Most recently, he co-founded TidalScale, which created a HyperKernel hypervisor enabling companies to build large-scale virtual NUMA servers from smaller physical NUMA servers. TidalScale raised $70.3 million and was acquired by Hewlett Packard Enterprise in December 2022.

MEXT had raised $2.4 million in seed funding, though sources suggest the total may have been higher. Investors included Clear, DN Capital, Uncorrelated, Raptor, and FJ Labs. The company has 39 employees, many from TidalScale, along with external expertise in memory management and virtualization in senior roles.

Co-founder David Reed served as chief scientist at TidalScale and, decades ago, at Lotus Development. More recently, he was a Fellow at HPE and vice president at SAP, while maintaining a long-term position as a professor of computer science at MIT, where he helped shape parts of the Internet stack. Carl Waldspurger, the principal engineer at VMware responsible for processor scheduling, memory management, and NUMA scheduling for the ESX hypervisor, as well as architect of VMware's Distributed Resource Scheduler, joined MEXT as chief scientist. The Distributed Resource Scheduler controls live migration of virtual machines, which is fundamentally about moving memory state between systems.

These leaders have long experience with memory challenges. In the late 1980s, they dealt with the 640 KB conventional memory barrier in DOS on the Intel 80286 architecture, using HIMEM.SYS extended memory drivers to reach memory above 1 MB. The 80386 brought 32-bit addressing, supporting up to 4 GB.

Extending memory is not a new concept, but it is timely given DRAM's current expense. Flash remains 50X less expensive than DRAM, with 30X lower power consumption, though it is 500X slower. Effectively using flash as a memory extender requires sophisticated engineering.
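To see why those ratios make the engineering hard, the following sketch works through the arithmetic. It uses the 50X cost and 500X latency figures from the text; the 100 ns DRAM latency, the function names, and the example hit rates and capacity splits are illustrative assumptions.

```python
DRAM_LATENCY_NS = 100.0   # assumed round number for a DRAM access
FLASH_SLOWDOWN = 500.0    # flash is 500X slower (figure from the article)
FLASH_COST_RATIO = 50.0   # flash is 50X cheaper per bit (figure from the article)


def average_latency_ns(dram_hit_rate: float) -> float:
    """Average access latency when a fraction of accesses is served from DRAM
    and the remainder has to wait on flash."""
    flash_latency_ns = DRAM_LATENCY_NS * FLASH_SLOWDOWN
    return dram_hit_rate * DRAM_LATENCY_NS + (1.0 - dram_hit_rate) * flash_latency_ns


def slowdown_vs_all_dram(dram_hit_rate: float) -> float:
    """How many times slower the average access is than a pure DRAM system."""
    return average_latency_ns(dram_hit_rate) / DRAM_LATENCY_NS


def relative_memory_cost(dram_fraction: float) -> float:
    """Cost of a DRAM plus flash mix relative to the same capacity in DRAM only."""
    return dram_fraction + (1.0 - dram_fraction) / FLASH_COST_RATIO


if __name__ == "__main__":
    for hit_rate in (0.90, 0.99, 0.999):
        print(f"hit rate {hit_rate:.1%}: {slowdown_vs_all_dram(hit_rate):.2f}X slower than all-DRAM")
    for fraction in (0.50, 0.25):
        print(f"{fraction:.0%} of capacity in DRAM: {relative_memory_cost(fraction):.1%} of all-DRAM cost")
```

Even a 99 percent DRAM hit rate leaves the average access about 6X slower than pure DRAM in this model, while keeping a quarter of capacity in DRAM cuts memory cost to roughly 27 percent of the all-DRAM figure. The savings are large, but only if nearly every access finds its data already in DRAM.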

"We came up with three problems that, if we could solve them, would change everything," Smerdon explains. "One, we have got to increase DRAM utilization. That is obvious, and everyone is trying it through CXL and pooling. Second, we need no hardware or software changes for memory extension to work. Throughout my career at Ethernet, AMD, and LSI, all fast-growing products required no software changes. In an ideal world, neither should [memory extension]. Yet nobody focused on the memory problem has made this a core principle. Third, we need to bring flash into the memory tier. It was 50X cheaper per bit when we started in 2023, maybe 100X now, with 30X lower power per bit. There is just one problem: Flash is 500X slower, and that does not perform well. We all know that swap sucks, so we had to crack these problems."

The solution is to stop placing cold data on DRAM and fill it with hot data—data needed within tens of nanoseconds. Pushing pages from hot to warm to cold onto flash is straightforward. The real challenge is that data can transition from cold to red hot with a single CPU instruction in a fraction of a nanosecond.
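The easy half of that problem, demoting cold pages to flash, can be sketched with a plain least-recently-used policy. This is a generic illustration, not MEXT's mechanism: the `TieredMemory` class, its page granularity, and the LRU choice are all assumptions made for the example.

```python
from collections import OrderedDict


class TieredMemory:
    """Toy two-tier memory: a small DRAM tier backed by flash, with LRU demotion."""

    def __init__(self, dram_pages: int) -> None:
        self.dram_pages = dram_pages
        self.dram = OrderedDict()   # page id -> None, coldest page first
        self.flash = set()          # pages that were demoted out of DRAM
        self.flash_reads = 0        # slow-path promotions from flash

    def access(self, page: int) -> str:
        """Touch a page and report where it was found: 'dram', 'flash', or 'new'."""
        if page in self.dram:
            self.dram.move_to_end(page)   # mark as most recently used (hot)
            return "dram"
        if page in self.flash:
            self.flash.remove(page)       # cold page turned hot: slow promotion
            self.flash_reads += 1
            source = "flash"
        else:
            source = "new"                # first touch, allocate in DRAM
        self.dram[page] = None
        if len(self.dram) > self.dram_pages:
            coldest, _ = self.dram.popitem(last=False)
            self.flash.add(coldest)       # demote the coldest page to flash
        return source


if __name__ == "__main__":
    memory = TieredMemory(dram_pages=2)
    for page in (1, 2, 3, 1, 3):
        print(page, memory.access(page))
    print("flash reads:", memory.flash_reads)
```

Every `"flash"` result in this model is an access that stalls on the slow tier, which is exactly the cold-to-hot transition the article describes. A purely reactive policy like this one only learns that a page is hot after the application has already paid for the miss.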

MEXT developed what it calls Predictive Memory, using AI algorithms to monitor applications and memory access patterns, retrieving data from flash back into DRAM before applications or the operating system request it.

"We have developed sophisticated machine learning models with much better prediction accuracy and coverage than previous approaches," explains Waldspurger. "We were inspired by modern AI techniques based on neural networks like LSTMs and LLM transformers, which excel at sequence prediction. Instead of predicting tokens in natural language, we apply similar ideas to predict sequences of future memory page accesses. Since our AI models run asynchronously, they benefit from richer information and context about longer-term trends and can leverage hardware counters."
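The idea of treating page accesses as a sequence to be predicted can be illustrated with a far simpler stand-in than the neural models Waldspurger describes. The sketch below uses a first-order Markov table (the most frequent successor of each page) to decide which page to prefetch next. The class name, the one-page prefetch window, and the synthetic trace are assumptions for illustration; nothing here reflects MEXT's actual models.

```python
from collections import Counter, defaultdict


class NextPagePredictor:
    """First-order Markov predictor: remembers which page most often follows each page."""

    def __init__(self) -> None:
        self.transitions = defaultdict(Counter)  # page -> Counter of following pages
        self.previous = None

    def observe(self, page: int) -> None:
        """Record one access and update the transition counts."""
        if self.previous is not None:
            self.transitions[self.previous][page] += 1
        self.previous = page

    def predict(self, page: int):
        """Return the most frequently seen successor of page, or None if unseen."""
        followers = self.transitions.get(page)
        if not followers:
            return None
        return followers.most_common(1)[0][0]


def prefetch_hit_rate(trace) -> float:
    """Fraction of accesses whose page had been prefetched one step earlier."""
    predictor = NextPagePredictor()
    prefetched = None
    hits = 0
    for page in trace:
        if page == prefetched:
            hits += 1                      # the page was already pulled into DRAM
        predictor.observe(page)
        prefetched = predictor.predict(page)
    return hits / len(trace)


if __name__ == "__main__":
    looping_trace = [1, 2, 3, 4] * 5       # synthetic, perfectly repeating pattern
    print(f"prefetch hit rate: {prefetch_hit_rate(looping_trace):.0%}")
```

On this synthetic loop the predictor misses only while it first learns the pattern and then prefetches every subsequent page correctly. Real workloads are far less regular, which is presumably why the quoted approach turns to richer sequence models and hardware counters.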
