Where the Chips Fall
While other companies appear to be investigating new lithography techniques, principally immersion lithography, Intel opted to continue with 193nm dry lithography and focus on new materials for their 45nm process. Judging by the claimed results, this appears to have paid off. Intel claims a ~2X improvement in transistor density, which is to be expected. However, in their 45nm test chip, Intel demonstrated a 0.346um2 SRAM cell size, compared to 0.57um2 for 65nm and 1.0um2 at the 90nm node. An ideal shrink, with both cell dimensions scaling with the node, would reduce the cell to 47.9% of its 65nm size; however, the demonstrated 45nm SRAM cells are 60.7% the size of their 65nm counterparts. The 90nm to 65nm shrink also fell short of the ideal, by about 10%, while the latest is off by more than 20%. While two points are not a trend, this could signal that SRAM scaling will slow down in the future.
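The scaling arithmetic above can be checked with a short sketch. It assumes that an "ideal" shrink means both cell dimensions scale with the node name, so area scales with the square of the ratio; the function names are illustrative, and the cell sizes are the ones quoted in the text.

```python
# SRAM cell areas in square microns, as quoted in the text.
CELL_AREA_UM2 = {90: 1.0, 65: 0.57, 45: 0.346}


def ideal_area_ratio(old_nm, new_nm):
    """New/old area if both dimensions shrink with the node name (assumption)."""
    return (new_nm / old_nm) ** 2


def actual_area_ratio(old_nm, new_nm):
    """New/old area using the demonstrated cell sizes."""
    return CELL_AREA_UM2[new_nm] / CELL_AREA_UM2[old_nm]


def shortfall(old_nm, new_nm):
    """How much larger the real cell is than the ideal one (0.10 means 10%)."""
    return actual_area_ratio(old_nm, new_nm) / ideal_area_ratio(old_nm, new_nm) - 1.0


for old_nm, new_nm in [(90, 65), (65, 45)]:
    print(
        f"{old_nm}nm -> {new_nm}nm: "
        f"ideal {ideal_area_ratio(old_nm, new_nm):.1%}, "
        f"actual {actual_area_ratio(old_nm, new_nm):.1%}, "
        f"shortfall {shortfall(old_nm, new_nm):.1%}"
    )
```

Under this assumption the 65nm cell comes out a little under 10% larger than ideal, and the 45nm cell more than 20% larger, which matches the gap described above.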
On the performance and power side, Intel’s results are excellent. The operating voltage will range from 0.7-0.8V at the low end up to 1.1-1.2V at the high end. They claim that the new process can either improve transistor switching speeds by slightly more than 20%, or reduce subthreshold leakage by more than 5X, or a combination of the two. Again, this reflects the fact that there is no longer a ‘free lunch’. They also saw an across-the-board ~30% decrease in transistor switching power, and more than a 10X reduction in gate leakage. Intel claims that these improvements would be impossible without the new materials, which is likely true. More importantly, Intel (and other companies in the future) will be able to extend process scaling by a couple of generations.
Two fabs will begin manufacturing 45nm based products in 2H07, D1D in Oregon and Fab 32 in Arizona, while the new Fab 28 in Israel will ramp up in 1H08. In fact, Intel brought demos of 5 products, running at 1.8-2.1GHz, to show off. Most of these systems were running real applications on Windows Vista, ranging from Office to rendering to video games, and all were using first silicon. That some of the first wafers out of the fab were running at 2GHz is quite a testament to the design and manufacturing teams involved.
To give an example of the benefits that the 45nm process could bring to a microprocessor, we will examine a hypothetical scenario. Tulsa, a 65nm implementation of the Pentium 4 microarchitecture, has plenty of available data from ISSCC and Hot Chips. Tulsa dissipates 150W at 3.4GHz, with 16MB of shared L3 cache. Of that 150W TDP, roughly 45W is from leakage power, while the remainder is active power dissipation. Under certain reasonable assumptions regarding leakage, it is possible to estimate the TDP for a hypothetical 45nm shrink of Tulsa. A best estimate for the TDP of a 45nm shrink of Tulsa is ~75W for the cores, I/O and control logic and ~10W for the L3 cache. That’s roughly a 45% drop in power consumption at the same frequency. If Woodcrest has a similar leakage and active power profile, then the TDP on a 3GHz part fabbed at 45nm could be in the range of 40-47W. Altogether, these back-of-the-envelope calculations lend a bit of credibility to some of the early rumors regarding extremely aggressive frequency targets for 45nm products.
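One way to arrive at a figure in this neighborhood is sketched below. The scaling factors are assumptions chosen for illustration from Intel’s claimed process improvements (~30% lower switching power, ~5X lower leakage when the process is tuned for leakage rather than speed); they are not necessarily the assumptions behind the estimate above, and the sketch does not attempt the split between the cores and the L3 cache.

```python
# Figures quoted in the text for the 65nm Tulsa part.
TULSA_TDP_W = 150.0
TULSA_LEAKAGE_W = 45.0

# Illustrative assumptions, not the article's stated method:
ACTIVE_POWER_SCALE = 0.70  # ~30% lower switching power at the same frequency
LEAKAGE_REDUCTION = 5.0    # ~5X lower leakage, taking the low-leakage option


def estimate_shrink_tdp(tdp_w, leakage_w,
                        active_scale=ACTIVE_POWER_SCALE,
                        leakage_reduction=LEAKAGE_REDUCTION):
    """Scale active and leakage power separately and return the new total."""
    active_w = tdp_w - leakage_w             # remainder of TDP is active power
    new_active_w = active_w * active_scale
    new_leakage_w = leakage_w / leakage_reduction
    return new_active_w + new_leakage_w


new_tdp = estimate_shrink_tdp(TULSA_TDP_W, TULSA_LEAKAGE_W)
drop = 1.0 - new_tdp / TULSA_TDP_W
print(f"Estimated 45nm TDP: {new_tdp:.1f} W ({drop:.0%} lower)")
```

With these inputs the total lands at about 82.5W, close to the ~85W combined estimate and the roughly 45% reduction quoted above. Changing either factor shows how sensitive the result is to the leakage assumption.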
Figure 4 – Penryn Die Micrograph
Penryn, the first 45nm processor (shown above), is largely a Core 2 Duo, with some minor enhancements that boost the transistor count to 410M, from 293M for Woodcrest. The most obvious use of the extra transistors will be a larger L2 cache, bringing it up to 3MB for desktop parts, while high-end desktop and server designs will use 6MB. In a quad core configuration, using dual chip packaging, this would translate to 6 or 12MB of L2 cache. Penryn will also sport ~50 new SSE4 instructions that are targeted at HPC and media applications. Of these new instructions, roughly 80-90% will be executed in hardware. The remainder, mainly the CRC and string instructions, will be microcoded.
So far, Intel’s key competitors in the world of device fabrication, the IBM/Toshiba/Sony/AMD partnership, appear to have pushed back high-k dielectrics and metal gates to the 32nm process. Unfortunately for AMD, the 32nm process is not expected to enter production until mid to late 2010. Traditionally, Intel MPUs outperform AMD’s about half of the time, when Intel is on a more advanced process node, while AMD tends to lead during the time when both are on the same process. This produces a see-sawing equilibrium where AMD has a one year window to make inroads, and Intel a one year window to retake market share. The next expected transition will be AMD’s introduction of the Barcelona core, in the middle of 2007.
The high-k dielectrics and metal gates will give Intel an advantage on their 45nm process. However, this transistor level advantage will not directly translate to microprocessor performance, without corresponding advances or clever engineering to address wire delay. It will be up to Intel’s MPU designers and marketers to make the most of these benefits, by increasing clock speed or reducing power. The real question is whether the combination of high-k dielectrics and metal gates will shut the window of opportunity for AMD, when they introduce their own 45nm process in mid to late 2008, and only time will tell where the chips will fall.
 Cherkaoui, Karim, et al. High dielectric constant (high-k) materials for future CMOS processes. Tyndall National Institute, Lee Maltings, University College Cork, Ireland. http://www.tyndall.ie/posters/highkposter.pdf
 Rusu, Stefan, et al. A Dual-Core Multi-Threaded Xeon Processor with 16MB L3 Cache. ISSCC 2006, Session 5.3.
 Wang, David. IEDM 2005: Selected Coverage. December 30th, 2005.