The next paper in the microprocessor track came from Intel and covered Merom, or the Core microarchitecture. Merom implements two 4-issue superscalar, out-of-order cores that share a 4MB L2 cache and a front-side bus interface, built on Intel’s high-performance 65nm process. The microarchitecture was previously disclosed at IDF and described in great detail here. Unfortunately, it appears that the ISSCC presentation ran afoul of some rather aggressive marketing staff, and it was relatively light on content. The highlight of the presentation was a discussion of the L2 cache.
Merom’s L2 cache is implemented as 1024 sub-arrays of 4KB each, with 16-way associativity. The SRAM bit cells are 0.74µm² each, and the cache access time (from when the address arrives to when data is sent out) is 2ns, including tag check, data read, and any error correction. During such an access, the cache powers up only 0.8% of all blocks.
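As a quick sanity check on those figures, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope calculation; it assumes that the "blocks" in the 0.8% figure are the same 4KB sub-arrays, which the presentation did not state explicitly, and the constant names are made up for illustration.

```python
# Back-of-the-envelope check of the L2 geometry quoted above.
# Assumption: the "blocks" powered up per access are the 4KB sub-arrays.
SUBARRAY_COUNT = 1024      # sub-arrays in the L2
SUBARRAY_KB = 4            # capacity of each sub-array, in KB
ACTIVE_FRACTION = 0.008    # 0.8% of blocks powered up per access

total_kb = SUBARRAY_COUNT * SUBARRAY_KB
total_mb = total_kb // 1024
active_subarrays = SUBARRAY_COUNT * ACTIVE_FRACTION

print(f"Total capacity: {total_mb} MB")
print(f"Sub-arrays powered per access: about {round(active_subarrays)}")
```

Under that assumption, the sub-array count matches the 4MB capacity, and 0.8% corresponds to roughly 8 of the 1024 sub-arrays being active on any given access.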
The cache uses sleep transistors to set a virtual Vcc as much as 500mV below the actual Vcc, reducing leakage by 3x. The sleep transistors are also used in what Intel calls cache on demand mode. Essentially, the microarchitecture identifies the least frequently (or perhaps least recently) used cache blocks and shuts them off, evicting their data to memory. This is a risky technique, as a poor choice of blocks could easily hurt performance and increase power draw (fetching from memory is very expensive), but it reduces leakage by 7x versus normal array operation. All these techniques contribute to an excellent idle power consumption of roughly 380mW/MB, which works out to about 1.5W for the full 4MB cache.
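The shut-off policy can be illustrated with a toy model. The sketch below assumes a least-recently-used policy, which is only one of the two possibilities mentioned above; Intel did not describe the actual mechanism, and every name here (`Block`, `CacheOnDemandModel`, `touch`, `shrink`) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    block_id: int
    last_used: int = 0    # logical time of the most recent access
    dirty: bool = False   # holds data not yet written to memory
    powered: bool = True  # False once the block has been shut off


class CacheOnDemandModel:
    """Toy model: power off the least recently used blocks (assumed policy)."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks = [Block(i) for i in range(num_blocks)]
        self.clock = 0
        self.writebacks: list[int] = []  # block ids evicted to memory

    def touch(self, block_id: int, write: bool = False) -> None:
        """Record an access; an access to a powered-off block wakes it."""
        self.clock += 1
        block = self.blocks[block_id]
        block.powered = True
        block.last_used = self.clock
        block.dirty = block.dirty or write

    def shrink(self, count: int) -> list[int]:
        """Shut off up to `count` powered blocks, oldest access first."""
        powered = [b for b in self.blocks if b.powered]
        victims = sorted(powered, key=lambda b: b.last_used)[:count]
        for block in victims:
            if block.dirty:
                # Dirty data must reach memory before power is removed.
                self.writebacks.append(block.block_id)
                block.dirty = False
            block.powered = False
        return [b.block_id for b in victims]


model = CacheOnDemandModel(num_blocks=4)
model.touch(0)
model.touch(1, write=True)
model.touch(2)
model.touch(3)
model.touch(0)
victims = model.shrink(2)
print("Powered off:", victims, "written back:", model.writebacks)
```

The model also shows where the risk comes from: any later access to a powered-off block has to go to memory, so a policy that picks the wrong victims pays for its leakage savings in both latency and power.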
Figure 2 – Merom Die Micrograph