IBM's New 3D Chip Architecture Breaks a Decade-Long SRAM Scaling Stall
IBM has unveiled a three-dimensional transistor architecture called nanostack that addresses a critical bottleneck in AI chip design: memory cells that no longer shrink in step with logic. The technology, built at IBM's Albany research facility, stacks and staggers transistors vertically for the first time, delivering a more than 40% reduction in SRAM (static random-access memory) cell height compared with current state-of-the-art non-stacked designs. IBM presents this as the first meaningful SRAM scaling gain in over a decade. The stall in SRAM scaling has increasingly hampered AI accelerator performance as logic components have continued to shrink.
Why Has SRAM Scaling Been Stuck for So Long?
For more than 60 years, transistor scaling has happened in two dimensions: the horizontal X and Y axes of a computer chip. Every complementary metal-oxide-semiconductor (CMOS) gate pairs an N-type transistor with a P-type transistor, and these have always sat side by side on the same plane, separated by a minimum gap called N-P spacing. This gap isn't determined by how small the transistor can be made; instead, it's set by the need to pattern two different gate metals next to each other and prevent electrical interference. In SRAM bitcells, which are laid out as an N-P-N-P sequence, this spacing dominates the cell area, which is why SRAM has barely shrunk for a decade while logic components around it have continued to advance.
The consequence is significant. At TSMC's N3 process node, the six-transistor SRAM bitcell came in around 0.0199 square microns, barely 5% smaller than the previous N5 node, and showed no scaling at all in the N3E variant. As logic kept shrinking around it, SRAM's share of advanced-chip die area has climbed toward 30% and beyond, creating a cost and efficiency problem precisely when AI silicon needs the most memory.
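The scaling figures above can be sanity-checked with a few lines of arithmetic. This is a minimal sketch, not a process model: the N5 bitcell area of 0.021 square microns is an assumption taken from widely reported figures (the article itself gives only the N3 value and the roughly 5% delta), N3E is set equal to N5 because the article reports no scaling there, and the function names are illustrative.

```python
def shrink_percent(old_area_um2: float, new_area_um2: float) -> float:
    """Percentage by which the bitcell area shrank from one node to the next."""
    return (1.0 - new_area_um2 / old_area_um2) * 100.0


def raw_bits_per_mm2(bitcell_area_um2: float) -> float:
    """Cell-only bit density; ignores sense amps, decoders, and other array overhead."""
    um2_per_mm2 = 1_000_000
    return um2_per_mm2 / bitcell_area_um2


N5_BITCELL_UM2 = 0.021            # assumption: commonly reported N5 value, not from this article
N3_BITCELL_UM2 = 0.0199           # from the article
N3E_BITCELL_UM2 = N5_BITCELL_UM2  # assumption: "no scaling at all" read as equal to N5

n5_to_n3 = shrink_percent(N5_BITCELL_UM2, N3_BITCELL_UM2)
n5_to_n3e = shrink_percent(N5_BITCELL_UM2, N3E_BITCELL_UM2)

print(f"N5 -> N3 bitcell shrink:  {n5_to_n3:.1f}%")
print(f"N5 -> N3E bitcell shrink: {n5_to_n3e:.1f}%")
print(f"N3 raw density: {raw_bits_per_mm2(N3_BITCELL_UM2) / 1e6:.1f} Mbit/mm^2 (cells only)")
```

Under these assumptions the N5-to-N3 shrink comes out at a little over 5%, consistent with the figure quoted above. Real macro density is lower than the cell-only number because of peripheral circuitry.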
How Does Nanostack Solve This Problem?
IBM's nanostack architecture introduces a genuine third axis by stacking the N device directly beneath the P device, with only a thin bonding dielectric between them. This vertical arrangement effectively eliminates the lateral N-P spacing constraint that has blocked SRAM scaling. Each nanostack cell comprises two nanosheet transistors built on separate wafers and joined by ultra-thin dielectric bonding, with three sheets per device, each sheet roughly 5 nanometers thick and separated by a 9-nanometer suspension. Because the two transistors are bonded rather than patterned together lithographically, the top and bottom devices can be optimized independently with different channel materials, dielectrics, and metals on each layer.
IBM's research team demonstrated this enabling structure in silicon for the first time, achieving more than 40% SRAM cell-height reduction versus state-of-the-art non-stacked cells, accomplished entirely within today's patterning capability. The architecture also delivers about 20% lower per-cell wordline capacitance and substantial wordline resistance reduction, with backside bitlines lowering bitline resistance for faster, lower-energy SRAM access.
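To a first order, the wordline figures can be related to access energy and delay with two textbook proportionalities: switching energy scales with C·V², and wire delay scales with R·C. The sketch below is a rough estimate under those assumptions, with the same number of cells per wordline and the same supply voltage. Because the article gives no number for the resistance reduction, resistance is left unchanged here, which makes the delay figure a conservative bound. The function and parameter names are illustrative.

```python
def relative_wordline_energy(c_scale: float, v_scale: float = 1.0) -> float:
    """Switching energy relative to baseline, using E proportional to C * V^2."""
    return c_scale * v_scale ** 2


def relative_rc_delay(c_scale: float, r_scale: float = 1.0) -> float:
    """First-order wire delay relative to baseline, using delay proportional to R * C."""
    return r_scale * c_scale


C_SCALE = 0.80  # about 20% lower per-cell wordline capacitance, from the article

energy = relative_wordline_energy(C_SCALE)
delay = relative_rc_delay(C_SCALE)  # r_scale left at 1.0: no resistance figure is given

print(f"relative wordline switching energy: {energy:.2f}")
print(f"relative wordline RC delay (resistance unchanged): {delay:.2f}")
```

Any actual resistance reduction would lower the delay further, since `r_scale` multiplies the result directly.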
What Does This Mean for AI Accelerators?
Dense-SRAM accelerators run into the SRAM constraint first. Weights, activations, and key-value caches in inference are kept in SRAM to stay close to compute and avoid the memory wall. Architectures from companies like Cerebras, Groq, and SambaNova commit hundreds of megabytes of SRAM on-die, and even mainstream GPUs devote tens of megabytes to L2 cache. If cell width is unchanged, a 40% reduction in SRAM cell height shrinks each cell to about 60% of its former area. That works out to roughly 67% more SRAM in the same area, or the same capacity in about 40% less area and at lower cost. Either way, designers can grow key-value cache capacity without growing the die.
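A height reduction and a capacity gain are not the same percentage, because capacity per unit area is the reciprocal of cell area. The short sketch below works through that relationship, assuming cell width is unchanged and ignoring array periphery (sense amplifiers, decoders), which does not shrink with the bitcell. The function names are illustrative.

```python
def cell_area_scale(height_reduction: float, width_reduction: float = 0.0) -> float:
    """New cell area as a fraction of the old one (area = height * width)."""
    return (1.0 - height_reduction) * (1.0 - width_reduction)


def capacity_gain_same_area(height_reduction: float, width_reduction: float = 0.0) -> float:
    """Fractional increase in bit count when the array footprint is held constant."""
    return 1.0 / cell_area_scale(height_reduction, width_reduction) - 1.0


HEIGHT_REDUCTION = 0.40  # from the article; width assumed unchanged

scale = cell_area_scale(HEIGHT_REDUCTION)
gain = capacity_gain_same_area(HEIGHT_REDUCTION)

print(f"cell area vs. baseline: {scale:.0%}")
print(f"extra bits in the same footprint: {gain:.0%}")
print(f"footprint saved at equal capacity: {1.0 - scale:.0%}")
```

With a 40% height reduction, the cell occupies about 60% of its former area, so the same footprint holds about 67% more bits. Once the periphery is included, the gain for a full SRAM macro would be smaller.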
The performance implications are equally important. Because nanostack optimizes its top and bottom transistors independently, the SRAM devices can be tuned for read and write margin separately from logic, supporting the low-voltage operation these chips rely on for efficiency. This restarts SRAM scaling exactly when accelerators are most SRAM-bound, addressing a critical bottleneck in modern AI inference.
How to Understand Nanostack's Place in the Chip Industry Roadmap
- Sequential Bonding Approach: Rather than building a monolithic CFET (complementary field-effect transistor), nanostack sequentially bonds two separately optimized nanosheet wafers and staggers them. This lets the top and bottom transistors use different materials and could bring the design to production sooner than monolithic CFET.
- Multi-Year Roadmap: IBM projects a roadmap from 7 angstroms (7A) through 5A, 3A, and 2A to 1A across roughly a decade of scaling, with multi-layer stacking signposted beyond that. This positions nanostack as a platform rather than a one-off device.
- Production Timeline: The work points to a path to production within about five years, supported by collaboration with research and fabrication partners on high-NA EUV (extreme ultraviolet) lithography to enable manufacturing at scale.
Nanostack builds on techniques the industry has already proven at larger scales. Three-dimensional packaging already stacks whole dies using hybrid bonding at the package level, and backside power delivery has put routing on both sides of the wafer for the first time. Intel's PowerVia is shipping on its 18A node, and TSMC's Super Power Rail is slated for production around 2026. The gate-all-around nanosheet transistor, IBM's own earlier invention, has been commercialized in TSMC's N2 and in Intel's 18A as RibbonFET. The agreed next step across imec, TSMC, and Intel is CFET, which stacks N over P monolithically under a shared gate, but the industry does not expect CFET in production until roughly 2031. Nanostack takes the hybrid-bonding and backside-power techniques those companies proved and pushes them down into the transistor pair itself, potentially reaching production sooner than the monolithic CFET approach.
The broader significance is that nanostack resets the roadmap for AI-era compute at a moment when the industry has hit a wall in traditional two-dimensional scaling. By unlocking SRAM scaling for the first time in over a decade, IBM has addressed a constraint that has become increasingly expensive for chip designers building memory-intensive AI accelerators. The architecture is not a standalone leap but rather the next rung on a ladder the whole industry has been climbing, one that arrives at a critical inflection point for AI hardware development.