Cold Hard Cache
On-chip L2 cache memory is one example of a processor feature that reduces the occurrence of power-hungry operations. A modern x86 MPU includes a system interface with a 64-bit wide data path that can transfer data at a peak rate of 133 to 400 million transfers per second (133 to 400 MHz). The off-chip drivers associated with this interface can consume 3 or 4 Watts of power during periods of high activity. A large and effective on-chip cache hierarchy reduces the frequency of memory operations over the system interface because a larger fraction of accesses is satisfied within the cache. For example, doubling the size of the on-chip L2 cache will reduce memory traffic by about 30% on average. Server MPUs like Intel's Xeon line use large on-chip caches for this very reason: to reduce system interface traffic. Minimizing bus traffic matters for server MPUs because they are often used in systems where 4 or more processors share common memory resources. In the case of a mobile MPU, the reduced memory traffic offered by a large cache means the system interface is idle a greater fraction of the time, so its switching power consumption is reduced. Similarly, generously sized translation look-aside buffers (TLBs) minimize the number of virtual address translations that miss and require table walk accesses to main memory.
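The arithmetic behind these figures can be sketched in a few lines of Python. This is only a rough model: the function names are hypothetical, the assumption that the 30% reduction compounds with each further doubling is mine, and so is the assumption that driver power scales linearly with interface activity.

```python
# Back-of-the-envelope model of system interface bandwidth, traffic, and power.
# All function names are hypothetical; the compounding and linear-power
# assumptions are simplifications, not measured behaviour.

def peak_bandwidth_mb_per_s(bus_width_bits, mega_transfers_per_s):
    """Peak bandwidth in MB/s: bytes per transfer times millions of transfers per second."""
    return bus_width_bits / 8 * mega_transfers_per_s


def relative_traffic(doublings, reduction_per_doubling=0.30):
    """Fraction of the original memory traffic left after doubling the L2 size
    `doublings` times, ASSUMING the ~30% reduction compounds per doubling."""
    return (1.0 - reduction_per_doubling) ** doublings


def driver_power_w(high_activity_power_w, relative_activity):
    """Off-chip driver power, ASSUMING it scales linearly with interface activity."""
    return high_activity_power_w * relative_activity


if __name__ == "__main__":
    for rate in (133, 400):
        print(rate, "MHz:", peak_bandwidth_mb_per_s(64, rate), "MB/s peak")
    for n in range(3):
        traffic = relative_traffic(n)
        print(n, "doublings:", round(traffic, 2), "of original traffic,",
              round(driver_power_w(4.0, traffic), 2), "W if drivers drew 4 W before")
```

Under these assumptions a 64-bit interface at 133 MHz peaks at roughly 1 GB/s, and each cache doubling trims both the traffic and the driver power by the same proportion. Real interfaces also burn some power when idle, so the linear model is optimistic.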
A drawback of a large L2 cache is the huge number of transistors it requires. The extra leakage power associated with a larger cache could rival the switching power saved from reduced system bus traffic. However, because nearly all of the extra transistors are located within the cache SRAM arrays, there are design tricks that can reduce leakage. One interesting way to add a large amount of cache to a low power MPU without significantly increasing leakage current is to borrow a standard DRAM practice: applying a negative voltage, a so-called back bias, to the well (substrate material) in which the L2 cache SRAM memory cells sit. A back bias might increase the access time of the L2 cache by a modest amount, but that would generally have only a slight impact on performance.
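The trade-off between added leakage and saved switching power can be written as a simple balance. The sketch below is illustrative only: the function name, the sample wattages, and the back-bias leakage factor are all hypothetical placeholders, not figures for any real process or processor.

```python
# Net power balance for enlarging a cache: added leakage versus saved bus
# switching power. Every number used with this function is a hypothetical input.

def net_power_delta_w(avg_bus_power_w, traffic_reduction,
                      added_leakage_w, back_bias_leakage_factor=1.0):
    """Return the net change in power (W) from enlarging the cache.

    avg_bus_power_w          -- average system interface switching power before the change
    traffic_reduction        -- fraction of bus traffic removed by the larger cache (0..1)
    added_leakage_w          -- extra leakage power of the added SRAM without back bias
    back_bias_leakage_factor -- multiplier (<= 1.0) on that leakage when back bias is applied

    A negative result means the larger cache saves power overall.
    """
    saved_switching = avg_bus_power_w * traffic_reduction
    added_leakage = added_leakage_w * back_bias_leakage_factor
    return added_leakage - saved_switching


if __name__ == "__main__":
    # Hypothetical case where leakage exactly cancels the switching savings.
    print(net_power_delta_w(1.0, 0.30, 0.30))
    # Same cache with a hypothetical back bias that cuts leakage to a quarter.
    print(net_power_delta_w(1.0, 0.30, 0.30, back_bias_leakage_factor=0.25))
```

The point of the model is the sign of the result: without some form of leakage control the bigger cache may merely break even, while a reduction in array leakage tips the balance toward a net saving.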
An even more dramatic possibility would be the inclusion of a DRAM-based L2 or L3 on-chip cache. High density embedded DRAM opens the possibility of very large caches, 8 or 16 MB or more, even in a die of moderate size. The big stumbling block here is the higher wafer cost and potentially lower logic transistor performance associated with adding high density embedded DRAM capability to an MPU process. However, given the traditionally large pricing premium that mobile processors command relative to desktop MPUs, the higher wafer cost of a blended logic and embedded DRAM process may be acceptable. Although DRAM is often perceived as slow compared to SRAM, there are techniques that can yield access and cycle times comparable to those of the large array, high density SRAM used in L2 and L3 caches, possibly while also reducing cache power consumption.
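Why a much larger cache can tolerate a somewhat slower array is easiest to see with the textbook average access time formula (hit time plus miss rate times miss penalty). The sketch below applies it to two made-up configurations; the cycle counts and miss rates are hypothetical and chosen only to show the shape of the trade-off, not to describe any actual SRAM or embedded DRAM design.

```python
# Average access time for requests that reach a given cache level.
# All latencies and miss rates below are hypothetical illustration values.

def average_access_cycles(hit_cycles, miss_rate, miss_penalty_cycles):
    """Standard formula: hit time + miss rate * miss penalty."""
    return hit_cycles + miss_rate * miss_penalty_cycles


if __name__ == "__main__":
    # Hypothetical smaller SRAM cache: faster hits, more misses to main memory.
    small_sram = average_access_cycles(hit_cycles=10, miss_rate=0.20,
                                       miss_penalty_cycles=200)
    # Hypothetical larger embedded DRAM cache: slower hits, fewer misses.
    large_edram = average_access_cycles(hit_cycles=14, miss_rate=0.10,
                                        miss_penalty_cycles=200)
    print("small SRAM cache :", small_sram, "cycles on average")
    print("large eDRAM cache:", large_edram, "cycles on average")
```

With a long main memory penalty, the miss rate term dominates, so a few extra cycles of hit latency matter less than the misses the extra capacity avoids. Whether that holds in practice depends on the workload and on the actual latencies achieved.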