Birth of a New Microarchitecture
Today Intel officially introduced its 7th generation x86 core, its first entirely new design in five years. Code named “Willamette” while under development, it will be marketed under the name Pentium 4 (P4). This spanking new microprocessor will naturally be compared to the P6 core, the design it is destined to replace and the basis of the Pentium Pro, Pentium II, and Pentium III (PIII). It will also quickly be sized up against its primary competition, the AMD K7 and its anticipated future descendants.
The announced performance of the Willamette P4, measured with the SPEC2000 benchmark suite, is shown in Table 1 along with official scores for the Coppermine Pentium III and Thunderbird K7 Athlon. Slightly higher (and presumably newer) SPEC scores for the 1.4 and 1.5 GHz P4 have been disclosed by sources ostensibly violating their non-disclosure agreement (NDA) with Intel. Given the uncertain legitimacy of those values, I have chosen to use the official P4 performance numbers [1].
CPU | Freq (MHz) | Chipset | Compiler(s) | SPECint2k peak | SPECint2k base | SPECfp2k peak | SPECfp2k base |
P4 | 1500 | 850 | IRC 5.0 | 535 | 522 | 558 | 549 |
P4 | 1400 | 850 | IRC 5.0 | 509 | 499 | 538 | 529 |
PIII | 1133 | 820 | IRC 5.0 | 464 | 461 | 331 | 320 |
PIII | 1000 | 840 | IRC 5.0 | 442 | 438 | 335 | 327 |
PIII | 1000 | 820 | IRC 5.0 | 428 | 426 | 314 | 304 |
PIII | 933 | 820 | IRC 5.0 | 410 | 407 | 305 | 295 |
PIII | 867 | 820 | IRC 5.0 | 390 | 388 | 294 | 284 |
PIII | 800 | 820 | IRC 4.5 | 355 | 352 | 256 | 245 |
PIII | 800 | 440BX2 | IRC 4.5 | 344 | 340 | 237 | 226 |
PIII | 733 | 840 | IRC 4.5 | – | 336 | – | 243 |
PIII | 733 | 820 | IRC 4.5 | 335 | 331 | 244 | 234 |
PIII | 750 | 440BX2 | IRC 4.5 | 330 | 325 | 230 | 219 |
PIII | 700 | 440BX2 | IRC 4.5 | 315 | 310 | 223 | 213 |
PIII | 667 | 820 | IRC 4.5 | 314 | 310 | 233 | 222 |
PIII | 650 | 440BX2 | IRC 4.5 | 299 | 295 | 215 | 204 |
K7 | 1200 | GA-7ZM | IRC 4.5/Compaq 6.5 | – | – | 342 | 304 |
K7 | 1100 | GA-7ZM | IRC 4.5/Compaq 6.5 | – | – | 331 | 311 |
The 1.4 GHz P4 achieves 19% higher peak SPECint2000 performance than a 1000 MHz PIII on an 820-based platform, while the 1.5 GHz P4 achieves 25% higher performance. Considering that the P4 is clocked 40% or 50% faster than the PIII, this would at first glance seem to confirm the concern that the P4’s deep pipelining and small 8 KB data cache would cause a significant instructions per clock cycle (IPC) penalty compared to the PIII. Does this mean the P4 is a “bad design”, or that Intel has cut corners in its new core in a blatant attempt to trick computer buyers on the basis of high clock rates? Simple comparison of performance and clock rate cannot be used to support those contentions, because clock normalization is not a valid way to compare two microarchitectures operating at different clock frequencies.
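The percentages quoted above can be reproduced from the peak SPECint2000 column of Table 1. The short Python sketch below does that arithmetic and also prints the naive score-per-MHz ratio, which is the clock-normalized figure the paragraph cautions against reading as a measure of design quality. The function and variable names are illustrative only.

```python
# Peak SPECint2000 scores copied from Table 1.
SPECINT_PEAK = {
    ("P4", 1500): 535,
    ("P4", 1400): 509,
    ("PIII", 1000): 428,  # PIII on the 820 chipset platform
}


def percent_gain(new, old):
    """Return the percentage by which `new` exceeds `old`."""
    return (new / old - 1.0) * 100.0


BASELINE_MHZ = 1000
baseline_score = SPECINT_PEAK[("PIII", BASELINE_MHZ)]

for mhz in (1400, 1500):
    score = SPECINT_PEAK[("P4", mhz)]
    perf_gain = percent_gain(score, baseline_score)
    clock_gain = percent_gain(mhz, BASELINE_MHZ)
    # Naive clock-normalized comparison: score per MHz relative to the PIII.
    # This is the number that is tempting to quote but misleading to interpret.
    naive_ratio = (score / mhz) / (baseline_score / BASELINE_MHZ)
    print(
        f"P4 {mhz} MHz: {perf_gain:.1f}% more performance, "
        f"{clock_gain:.0f}% more clock, "
        f"naive per-MHz ratio {naive_ratio:.2f}"
    )
```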
For starters, SPEC2000 was designed to have a much larger memory footprint than SPEC95. As a result, a significant number of memory accesses miss in the 256 KB L2 cache found in both the P4 and PIII and have to be satisfied by read and/or write operations to main memory. As the clock rate of a PIII or P4 processor is increased, the number of processor cycles needed to access main memory (which doesn’t speed up) increases. An average memory access might take 100 ns or more. That translates to 100 clock cycles on a 1.0 GHz PIII and 150 clock cycles on a 1.5 GHz P4. That is fair for looking at absolute processor performance because, after all, that is how the processors are intended to run. But if you were trying to compare the design efficiency of two different microarchitectures, you wouldn’t run them at the same frequency and then connect the second to memory 50% slower than the memory connected to the first. That is effectively what you are doing when you compare the performance divided by clock rate (i.e. clock normalized) of two designs with one run at 1.0 GHz and the other at 1.5 GHz.
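The cycle counts in this argument follow directly from multiplying a fixed latency by the clock rate. The Python sketch below shows that conversion, then applies a deliberately simple toy model to one and the same design at two clock rates. Only the 100 ns latency comes from the text. The core CPI and miss rate are hypothetical values chosen purely for illustration, not measurements of any real processor.

```python
def latency_in_cycles(latency_ns, clock_ghz):
    """Convert a fixed memory latency in ns to processor clock cycles."""
    # A clock of N GHz completes N cycles per nanosecond.
    return latency_ns * clock_ghz


def effective_cpi(core_cpi, misses_per_instr, latency_ns, clock_ghz):
    """Toy model: core cycles per instruction plus memory stall cycles."""
    stall = misses_per_instr * latency_in_cycles(latency_ns, clock_ghz)
    return core_cpi + stall


MEMORY_LATENCY_NS = 100.0   # figure used in the text
CORE_CPI = 1.0              # HYPOTHETICAL: cycles per instruction with no misses
MISSES_PER_INSTR = 0.005    # HYPOTHETICAL: main memory accesses per instruction

for clock_ghz in (1.0, 1.5):
    cycles = latency_in_cycles(MEMORY_LATENCY_NS, clock_ghz)
    cpi = effective_cpi(CORE_CPI, MISSES_PER_INSTR, MEMORY_LATENCY_NS, clock_ghz)
    print(
        f"{clock_ghz:.1f} GHz: memory access = {cycles:.0f} cycles, "
        f"effective IPC = {1.0 / cpi:.2f}"
    )
```

Under these made-up parameters, the identical design sees its IPC fall from about 0.67 to about 0.57 purely because it is clocked 50% faster against the same memory, which is why a lower score per MHz does not by itself indicate a less efficient microarchitecture.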