Conclusion
It is readily apparent that the design innovations employed in Intel’s next generation x86 processor core are not restricted to the trace cache and double frequency ALUs. The 0.18 µm Willamette has already been demonstrated running at clock frequencies up to 2.0 GHz. To achieve a 2-cycle data cache access latency at such high clock rates, it likely employs an unconventional data cache design similar to the one I proposed. These techniques seem to offer at least the same, and more likely better, frequency headroom than a conventional 3-cycle data cache. The downside is the addition of a new class of faults, called way mispredictions, which can sap IPC if they occur too frequently.
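To give a rough feel for how often way mispredictions could occur before the 2-cycle design loses its advantage, here is a back-of-the-envelope sketch. It assumes a simple model in which every mispredicted access pays a fixed extra penalty. The 4-cycle penalty and the function names are hypothetical values chosen for illustration, not figures from Intel, and the model ignores cache misses entirely.

```python
def effective_latency(hit_cycles, mispredict_rate, penalty_cycles):
    """Average access latency when a fraction of accesses mispredict the way."""
    return hit_cycles + mispredict_rate * penalty_cycles


def break_even_rate(fast_cycles, slow_cycles, penalty_cycles):
    """Misprediction rate at which the fast cache's average latency
    equals the latency of the slower conventional cache."""
    return (slow_cycles - fast_cycles) / penalty_cycles


# Hypothetical recovery cost in cycles; NOT a published Willamette figure.
PENALTY = 4

for rate in (0.01, 0.05, 0.10, 0.25):
    avg = effective_latency(2, rate, PENALTY)
    print(f"mispredict rate {rate:.0%}: average latency {avg:.2f} cycles")

# Rate at which a 2-cycle way-predicted cache matches a conventional 3-cycle cache.
print(f"break-even rate: {break_even_rate(2, 3, PENALTY):.0%}")
```

Under these assumptions the way-predicted cache keeps a lower average latency as long as the misprediction rate stays below the break-even rate. A larger recovery penalty lowers that threshold proportionally.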
But the main design trade-off for achieving 2-cycle cache access in a high frequency MPU design like Willamette is readily apparent to even the most casual of microprocessor observers: its minuscule 8 KB data cache. It is unfortunate that many people, including some technically literate individuals who should know better, are already mistakenly treating the raw size of the data cache as some kind of ‘manhood’ test of processor virility. Besides reviewing the basic computer engineering principles taught to undergraduates, they should also consider the graphs in Figure 7. These show integer and floating point performance versus data cache size for the best-of-class processors with on-chip caches from 7 different architectures in the spring of 1998. Technology has changed so much since then that no implications can be drawn with respect to Willamette. However, I included the figure as a gentle reminder that bigger isn’t always better.

Figure 7. Integer and FP Performance versus Data Cache Size, 1998