Big Willy’s Little Cache
Almost lost among the marketing platitudes of IDF were a few interesting new facts about Willamette. When Intel conceived the new core, they apparently decided that data memory access latency was enemy number one. The centerpiece of their attack on latency is the controversial decision to use a smaller data cache, 8 KB, than in any of their last four releases of P6-based processors. This also stands in stark contrast to the 64 KB data cache in archrival AMD’s K7 Athlon processor line. It is well known that the larger a cache is, the greater the chance that it contains the data item the CPU needs next. As a useful rule of thumb, quadrupling the size of a cache cuts its miss rate in half.
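To put a number on that rule of thumb, here is a minimal Python sketch. It assumes the rule holds exactly, so that miss rate scales with the inverse square root of capacity, and it ignores associativity and everything else about the cache. The function name is made up for illustration.

```python
import math

def relative_miss_rate(size_ratio: float) -> float:
    """Miss rate relative to a baseline cache, assuming the rule of thumb
    that quadrupling capacity halves the miss rate (rate ~ 1/sqrt(size))."""
    return 1.0 / math.sqrt(size_ratio)

# Sizes quoted in the article: 8 KB for Willamette, 64 KB for the K7.
willamette_kb = 8
k7_kb = 64

# Capacity-only estimate of how much more often the smaller cache misses.
ratio = relative_miss_rate(willamette_kb / k7_kb)
print(f"Capacity-only miss rate ratio: {ratio:.2f}x")
```

On capacity alone, an 8 KB cache would be expected to miss a little under three times as often as a 64 KB one.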
Associativity indicates the number of places in a cache that any particular piece of data could be placed. The more places a data item can go, the less chance that several frequently used pieces of data will fight over the same place(s) in the cache and keep knocking each other out, forcing so-called conflict misses. Associativity also happens to be more effective in reducing the miss rate of small caches than large ones.
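A toy simulation makes the conflict-miss effect concrete. The sketch below counts misses in a small set-associative cache; the 64-byte line size and LRU replacement are assumptions chosen for illustration, not disclosed details of any real processor, and the access pattern is deliberately pathological.

```python
from collections import OrderedDict

def count_misses(addresses, cache_bytes, ways, line_bytes=64):
    """Count misses for a byte-address trace in a set-associative cache
    with LRU replacement (line size and policy are illustrative assumptions)."""
    num_sets = cache_bytes // (line_bytes * ways)
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        line = addr // line_bytes      # which cache line the byte falls in
        index = line % num_sets        # which set that line maps to
        tag = line // num_sets         # identifies the line within its set
        entries = sets[index]
        if tag in entries:
            entries.move_to_end(tag)   # hit: mark as most recently used
        else:
            misses += 1
            if len(entries) >= ways:
                entries.popitem(last=False)  # evict least recently used
            entries[tag] = True
    return misses

# Two hot addresses exactly one cache size apart always share a set.
trace = [0, 8192] * 100
print("direct-mapped misses:", count_misses(trace, 8192, ways=1))
print("4-way misses:        ", count_misses(trace, 8192, ways=4))
```

With one way, the two addresses evict each other on every access. With four ways they coexist in the same set, and only the first touch of each one misses.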
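Although the Willamette data cache is one-eighth the size of the AMD K7’s data cache, it is more highly associative: 4-way versus 2-way. By size alone, the rule of thumb above would predict a miss rate roughly 2.8 times higher (the square root of 8), but the extra associativity claws some of that back. As a result of these two factors, the miss rate of Willy’s little data cache is only about 2.2 times higher than that of the much larger data cache in the K7 for most programs run on a PC.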
Even though the miss rate disparity between Willamette and K7 isn’t as large as it might seem at first glance, why would Intel use such a small data cache? It definitely isn’t to save chip area. One juicy rumor out of IDF is that the Willamette will come in at a chunky 217 mm² in die area. Although this is about 30% smaller than the first implementations of the last two new processor cores created by Intel (0.8 µm Pentium and 0.5 µm Pentium Pro), it is roughly twice as large as current mainstream x86 processors, like Coppermine Pentium III and T-bird K7/Athlon.
With all this extra area, and more transistors (42 million versus 28 million for Coppermine), why not put in a 32, 64, or even 128 KB data cache? The only reason that makes sense is the war on data access latency. Because they wanted their data cache to operate with a 2-cycle latency, Intel could not make it larger than 8 KB in their 0.18 µm process without placing it on the critical timing path, and thus limiting the processor clock rate (demonstration systems at IDF ran at up to 2 GHz). Larger memory arrays have longer access times, and as a result the larger data cache in the K7 Athlon has a latency of 3 clock cycles.
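One rough way to weigh this trade-off is the textbook average memory access time: hit latency plus miss rate times miss penalty. The sketch below uses the 2-cycle and 3-cycle hit latencies and the 2.2x miss rate ratio quoted above. The 10-cycle miss penalty is purely a hypothetical placeholder, and the model assumes both designs pay the same penalty in cycles, which real parts running at different clock rates would not.

```python
def amat(hit_cycles, miss_rate, miss_penalty_cycles):
    """Average memory access time in cycles (textbook first-order model)."""
    return hit_cycles + miss_rate * miss_penalty_cycles

def break_even_large_miss_rate(miss_penalty_cycles, small_hit=2,
                               large_hit=3, miss_ratio=2.2):
    """Miss rate of the larger cache at which both designs tie.

    Solves small_hit + miss_ratio*m*P == large_hit + m*P for m.
    Below this miss rate, the small, fast cache has the lower average.
    """
    return (large_hit - small_hit) / ((miss_ratio - 1.0) * miss_penalty_cycles)

penalty = 10  # hypothetical miss penalty in cycles, for illustration only
m = break_even_large_miss_rate(penalty)
print(f"break-even miss rate for the larger cache: {m:.1%}")
print(f"small cache AMAT at that point: {amat(2, 2.2 * m, penalty):.2f} cycles")
print(f"large cache AMAT at that point: {amat(3, m, penalty):.2f} cycles")
```

Under these assumptions, the smaller cache comes out ahead whenever the larger cache's miss rate is below the break-even value, and the break-even point rises as the miss penalty falls. That fits the idea that a small, fast first-level cache pays off when misses are cheap to service.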