IBM’s eDRAM on 32nm SOI
We have already described IBM’s 32nm high performance SOI process, which was first disclosed at VLSI 2009. While there were no substantial updates, session 27.5 at IEDM 2010 included a paper that discusses some details of IBM’s eDRAM on their 32nm process. IBM is currently shipping the 45nm POWER7 microprocessor, which implements a 32MB L3 cache as eDRAM. The upcoming z196 mainframe processor similarly integrates 24MB of eDRAM for an L3 cache.
Embedded DRAM is an alternative approach to storage arrays, proposed as a replacement for extremely large SRAM arrays. Rather than using 6 or 8 transistors to store each bit, eDRAM cells rely on a capacitor and a single access transistor. eDRAM is denser than SRAM and substantially more resilient to soft errors, due to the capacitor and smaller collector area. The overall arrays are roughly 2-4X denser than SRAM (the cell itself is roughly 4-6X smaller), with a 2-3 order of magnitude improvement in soft error rate (SER). Additionally, there is a slight decrease in active power and a substantial drop in standby power.
However, the capacitor must be refreshed periodically to retain the data, and the access time is substantially longer than for SRAM. The high access times are one reason why eDRAM is best suited to large arrays (e.g. a last level cache), since it cannot scale down to the sub-nanosecond times required for high performance SRAMs. The last drawback is perhaps the most problematic – IBM’s eDRAM requires several changes and additional manufacturing steps, including the formation of a deep trench for the capacitor (shown in Figure 3), which impacts cost and yield. While this is hard to estimate quantitatively, the costs are clearly enough of an issue that AMD has forgone using eDRAM. However, IBM’s economics are very different and they can clearly justify the extra silicon cost to reduce overall system cost.
Figure 3 – Deep Trench Capacitor on IBM’s 45nm SOI Process
IBM’s 32nm eDRAM presentation discussed the opportunities of scaling eDRAM to 32nm and migrating from a process with a conventional poly-silicon gate stack to high-k/metal gates. The overall focus was on the eDRAM cells, rather than the entire array. The eDRAM cells shrank from 0.0672um2 at 45nm to 0.0394um2 at 32nm. The 1Mbit eDRAM macros used in the POWER7 are 0.24mm2 with a 1.05V supply, a 1.7ns cycle time and a 1.35ns access time. The overall density for the 32nm eDRAM arrays was not disclosed but should be >11Mbit/mm2, based on a previous paper at the VLSI Symposium. The reported cycle and access times for the 32nm array were 2ns and 1.5ns at 1V. These values are slightly slower than those of the 45nm arrays (1.7ns and 1.35ns) used in the POWER7, and will certainly be refined by IBM’s engineers over time.
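The cell shrink can be put in context with a little arithmetic. The sketch below uses only the cell areas and the >11Mbit/mm2 figure quoted above. The comparison against an ideal (32/45)^2 shrink and the "cell-limited" density ceiling (an array made of nothing but cells, counting 1Mbit as 10^6 bits) are illustrative assumptions, not numbers from IBM’s paper.

```python
# Cell areas reported in the text, in square microns.
CELL_45NM_UM2 = 0.0672
CELL_32NM_UM2 = 0.0394
# Lower bound on 32nm array density quoted in the text, in Mbit/mm^2.
ARRAY_DENSITY_32NM_MBIT_MM2 = 11.0

UM2_PER_MM2 = 1_000_000


def area_ratio(old_um2: float, new_um2: float) -> float:
    """Fraction of the old cell area that the new cell occupies."""
    return new_um2 / old_um2


def ideal_area_ratio(old_node_nm: float, new_node_nm: float) -> float:
    """Area ratio if every dimension scaled with the node name (assumption)."""
    return (new_node_nm / old_node_nm) ** 2


def cell_limited_density_mbit_mm2(cell_um2: float) -> float:
    """Density of a hypothetical array with no overhead, 1 Mbit = 1e6 bits."""
    return UM2_PER_MM2 / cell_um2 / 1e6


actual = area_ratio(CELL_45NM_UM2, CELL_32NM_UM2)
ideal = ideal_area_ratio(45, 32)
ceiling = cell_limited_density_mbit_mm2(CELL_32NM_UM2)
# Share of the array area occupied by cells if density is exactly 11 Mbit/mm^2.
efficiency = ARRAY_DENSITY_32NM_MBIT_MM2 / ceiling

print(f"actual cell area ratio : {actual:.3f}")
print(f"ideal (32/45)^2 ratio  : {ideal:.3f}")
print(f"cell-limited density   : {ceiling:.1f} Mbit/mm^2")
print(f"implied cell efficiency: {efficiency:.0%} or better")
```

Under these assumptions the cell shrinks to about 59% of its 45nm area, somewhat short of an ideal 51%, and the >11Mbit/mm2 figure would imply that cells occupy at least roughly 43% of the array area.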
The trench capacitor substantially benefits from the new materials in IBM’s 32nm process. It has a >35:1 aspect ratio, tuned for performance and retention. First, an interface layer is grown on the trench, followed by a high-k dielectric (HfO2) and a metal film (TiN), and finally the stack is capped with arsenic-doped poly-silicon. The new high-k materials improve the capacitance by 25% at equal leakage (the 45nm capacitors are rated for 18fF). The combination of high-k and metal film reduces the parasitic resistance of the trench capacitor by 2-3X, which improves the ability to charge. More charge in the capacitor in turn increases retention time and the sense margin for reads.
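To make the capacitance gain concrete, here is a minimal sketch of the stored charge using Q = C x V. It assumes the 25% improvement applies directly to the 18fF rating of the 45nm capacitor and that the cell stores a full 1V. Both are simplifying assumptions for illustration, since the paper's actual 32nm capacitance and cell voltage are not given here.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19  # charge of one electron, in coulombs

CAP_45NM_F = 18e-15           # 45nm capacitor rating from the text
HIGH_K_GAIN = 0.25            # capacitance improvement from the text
ASSUMED_CELL_VOLTAGE_V = 1.0  # assumption: full 1V stored on the cell


def stored_charge_c(capacitance_f: float, voltage_v: float) -> float:
    """Charge held on a capacitor, Q = C * V."""
    return capacitance_f * voltage_v


# Assumption: the 25% gain is applied to the 18fF 45nm figure.
cap_32nm_f = CAP_45NM_F * (1 + HIGH_K_GAIN)

q45 = stored_charge_c(CAP_45NM_F, ASSUMED_CELL_VOLTAGE_V)
q32 = stored_charge_c(cap_32nm_f, ASSUMED_CELL_VOLTAGE_V)
electrons_32nm = q32 / ELEMENTARY_CHARGE_C

print(f"32nm capacitance estimate: {cap_32nm_f * 1e15:.1f} fF")
print(f"stored charge, 45nm -> 32nm: {q45 * 1e15:.1f} fC -> {q32 * 1e15:.1f} fC")
print(f"electrons stored at 32nm: about {electrons_32nm:,.0f}")
```

At equal voltage the stored charge scales directly with capacitance, so the extra 25% goes straight into sense margin and retention headroom.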
The access transistor actually saw more improvement from the HKMG process than the capacitor itself. The access transistors are thick gate oxide devices that read and write from the array word-line and it is critical that they have high performance to clearly read out data. More importantly, the leakage must be incredibly low for good retention times, because when enough of the charge in the capacitor has leaked away, the data will be lost. The access transistors are incredibly sensitive to Vt (which controls leakage) and are significantly impacted by random dopant fluctuations (RDF). According to IBM, the variation across an entire chip can easily be 4.5 standard deviations. The new HKMG gate stack scales the access transistor channel thickness from 3.1nm to 2.3nm, reducing the standard deviation for Vt by 35% to 40mV for an 8Kb array macro. IBM achieved leakage below 3fA at 25C, and will guarantee a 40us retention time at a more realistic 85C junction temperature. The access transistor’s drive current increased by ~50%, which is large, but consistent with gains from switching to a high-k/metal gate stack.
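The link between Vt variation, leakage and retention can be sketched numerically. The 40mV sigma, the 35% reduction, the 4.5 sigma spread and the 3fA leakage come from the text. The subthreshold swing (100mV per decade), the cell capacitance (22.5fF) and the tolerable voltage droop (0.2V) are assumptions chosen purely for illustration, and the sketch reads "by 35% to 40mV" as a 35% reduction that ends at 40mV.

```python
SIGMA_VT_32NM_MV = 40.0   # from the text
SIGMA_REDUCTION = 0.35    # from the text
TAIL_SIGMAS = 4.5         # chip-wide variation, from the text
LEAKAGE_25C_A = 3e-15     # leakage bound at 25C, from the text

# Assumptions, not from IBM's paper:
ASSUMED_SWING_MV_PER_DECADE = 100.0  # subthreshold swing
ASSUMED_CELL_CAP_F = 22.5e-15        # 18fF plus 25%
ASSUMED_TOLERABLE_DROOP_V = 0.2      # voltage loss before a read fails


def prior_sigma_mv(new_sigma_mv: float, reduction: float) -> float:
    """Sigma before the reduction, given the sigma after it."""
    return new_sigma_mv / (1 - reduction)


def leakage_multiplier(vt_drop_mv: float, swing_mv_per_decade: float) -> float:
    """Subthreshold leakage increase for a device whose Vt is low by vt_drop_mv."""
    return 10 ** (vt_drop_mv / swing_mv_per_decade)


def retention_time_s(cap_f: float, droop_v: float, leakage_a: float) -> float:
    """Time for a constant leakage current to remove droop_v from the cell."""
    return cap_f * droop_v / leakage_a


sigma_45nm_mv = prior_sigma_mv(SIGMA_VT_32NM_MV, SIGMA_REDUCTION)
shift_32nm_mv = TAIL_SIGMAS * SIGMA_VT_32NM_MV
shift_45nm_mv = TAIL_SIGMAS * sigma_45nm_mv
mult_32nm = leakage_multiplier(shift_32nm_mv, ASSUMED_SWING_MV_PER_DECADE)
mult_45nm = leakage_multiplier(shift_45nm_mv, ASSUMED_SWING_MV_PER_DECADE)
nominal_retention_s = retention_time_s(
    ASSUMED_CELL_CAP_F, ASSUMED_TOLERABLE_DROOP_V, LEAKAGE_25C_A
)

print(f"4.5 sigma Vt shift: {shift_45nm_mv:.0f} mV before, {shift_32nm_mv:.0f} mV after")
print(f"tail leakage multiplier: {mult_45nm:.0f}x before, {mult_32nm:.0f}x after")
print(f"nominal retention at 3fA: {nominal_retention_s:.2f} s")
```

With these assumed values, tightening sigma shrinks the worst-case leakage penalty by roughly an order of magnitude. The nominal retention figure is far longer than the guaranteed 40us, which is unsurprising since a guarantee has to cover the leakiest cells at 85C rather than a typical cell at 25C.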
While IBM did not present any information on actual array density, it is possible to make some inferences. Comparing IBM’s eDRAM in the POWER7 to comparable SRAMs from Intel yields a roughly 2X density advantage at the same node. Equivalently, IBM’s 45nm eDRAM slightly exceeds the density of Intel’s 32nm SRAM. Based on the results demonstrated and IBM’s comments, the overall array area should scale by 60% at 32nm. This suggests that IBM can expect roughly a 2X advantage for their storage arrays and possibly some further upside with innovations in the overall array architecture.
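The inference above can be checked with a rough calculation. The sketch normalizes IBM’s 45nm eDRAM density to 1.0 and, as an assumption, treats Intel’s 32nm SRAM as exactly equal to it (the text says IBM’s is slightly higher, so this understates IBM’s advantage a little). Because "scale by 60%" can be read two ways, both readings are computed.

```python
IBM_45NM_EDRAM_DENSITY = 1.0  # normalized baseline
# Assumption: the text says IBM's 45nm eDRAM "slightly exceeds" Intel's 32nm
# SRAM, so treating them as equal gives a slightly conservative result.
INTEL_32NM_SRAM_DENSITY = 1.0


def density_after_area_scaling(density: float, area_ratio: float) -> float:
    """New density when the same array fits in area_ratio of the old area."""
    return density / area_ratio


# Reading A: the 32nm array occupies 60% of the 45nm area.
ibm_32nm_a = density_after_area_scaling(IBM_45NM_EDRAM_DENSITY, 0.60)
# Reading B: the 32nm array is 60% smaller, so it occupies 40% of the area.
ibm_32nm_b = density_after_area_scaling(IBM_45NM_EDRAM_DENSITY, 0.40)

advantage_a = ibm_32nm_a / INTEL_32NM_SRAM_DENSITY
advantage_b = ibm_32nm_b / INTEL_32NM_SRAM_DENSITY

print(f"area scales to 60%: {advantage_a:.2f}x advantage at 32nm")
print(f"area shrinks by 60%: {advantage_b:.2f}x advantage at 32nm")
```

The two readings bracket the "roughly 2X" estimate, at about 1.7X and 2.5X, so the conclusion holds as an approximation under either interpretation.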