Willamette Performance Revealed


P4 vs PIII: Optimized Integer Performance and Scalability Characteristics

So how do you more accurately assess P4 performance relative to its predecessor? You have to extrapolate how the PIII would hypothetically perform if it could be clocked at 1.4 or 1.5 GHz. Intel can do this by cranking up the clock rate of its cycle-accurate logic model of the PIII and running the SPECint2000 code on it. The best that outside observers can hope for is to examine how PIII performance varies with frequency in published results. To see how well performance scales with increasing clock rate, you need performance data for the processor at different frequencies with all other factors held the same. This means the chipset, memory speed grade, and compiler must be held constant and only the frequency changed. Dividing the percent increase in SPECint2000 performance by the percent increase in clock frequency between two such data points gives an approximate scaling factor. This procedure is shown in Table 2 for SPECint2000.

Table 2 Derivation of SPECint2000 Performance/Frequency Scaling Factors

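| CPU | Freq (MHz) | Chipset | Compiler(s) | Absolute Performance (SPECint2k) | Frequency Increase | Performance Increment (SPECint2k) | Perform. Scaling with Freq (SPECint2k) |
|------|------|--------|---------|-----|-------|------|-----|
| P4   | 1500 | 850    | IRC 5.0 | 535 | 7.1%  | 5.1% | 72% |
| P4   | 1400 | 850    | IRC 5.0 | 509 |       |      |     |
| PIII | 1133 | 820    | IRC 5.0 | 464 | 13.3% | 8.4% | 63% |
| PIII | 1000 | 820    | IRC 5.0 | 428 | 7.1%  | 4.4% | 61% |
| PIII | 933  | 820    | IRC 5.0 | 410 | 7.7%  | 5.1% | 67% |
| PIII | 867  | 820    | IRC 5.0 | 390 |       |      |     |
| PIII | 800  | 820    | IRC 4.5 | 355 | 9.1%  | 6.0% | 66% |
| PIII | 733  | 820    | IRC 4.5 | 335 | 10.0% | 6.7% | 67% |
| PIII | 667  | 820    | IRC 4.5 | 314 |       |      |     |
| PIII | 800  | 440BX2 | IRC 4.5 | 344 | 6.7%  | 4.2% | 64% |
| PIII | 750  | 440BX2 | IRC 4.5 | 330 | 7.1%  | 4.8% | 67% |
| PIII | 700  | 440BX2 | IRC 4.5 | 315 | 7.7%  | 5.4% | 70% |
| PIII | 650  | 440BX2 | IRC 4.5 | 299 |       |      |     |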

Notice that I only calculated scaling factors between data points that share the same chipset and compiler version. For example, it is invalid to compare the PIII at 867 MHz on an 820 platform with the PIII at 800 MHz on an 820 platform, because the former used version 5.0 of the Intel Reference Compiler (IRC) while the latter used version 4.5. Table 2 shows that when the clock rate of the PIII was increased by 13.3%, going from 1000 MHz to 1133 MHz, the SPECint2000 score on the 820 platform using IRC 5.0 increased by 8.4%. That means that PIII performance in the 1000 to 1133 MHz range improves by 63% of the increase in frequency (8.4% divided by 13.3%). The performance/frequency scaling factor is therefore 63%.
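As a rough sketch of the bookkeeping behind Table 2, the Python below groups the rows by CPU, chipset, and compiler, then computes a scaling factor for each adjacent frequency step within a group. The rows are transcribed from the table; the function name `scaling_factors` and the tuple layout are illustrative choices of mine. Because the table lists rounded MHz values, a few computed factors may differ by a point from the printed column.

```python
from itertools import groupby

# (cpu, freq_mhz, chipset, compiler, specint2000), transcribed from Table 2
ROWS = [
    ("P4", 1500, "850", "IRC 5.0", 535),
    ("P4", 1400, "850", "IRC 5.0", 509),
    ("PIII", 1133, "820", "IRC 5.0", 464),
    ("PIII", 1000, "820", "IRC 5.0", 428),
    ("PIII", 933, "820", "IRC 5.0", 410),
    ("PIII", 867, "820", "IRC 5.0", 390),
    ("PIII", 800, "820", "IRC 4.5", 355),
    ("PIII", 733, "820", "IRC 4.5", 335),
    ("PIII", 667, "820", "IRC 4.5", 314),
    ("PIII", 800, "440BX2", "IRC 4.5", 344),
    ("PIII", 750, "440BX2", "IRC 4.5", 330),
    ("PIII", 700, "440BX2", "IRC 4.5", 315),
    ("PIII", 650, "440BX2", "IRC 4.5", 299),
]


def platform(row):
    """Only rows with the same CPU, chipset, and compiler are comparable."""
    return (row[0], row[2], row[3])


def scaling_factors(rows):
    """Return (platform, f_lo, f_hi, freq_inc, perf_inc, scaling) per step."""
    results = []
    ordered = sorted(rows, key=lambda r: (platform(r), r[1]))
    for plat, group in groupby(ordered, key=platform):
        points = list(group)  # ascending frequency within one platform
        for lo, hi in zip(points, points[1:]):
            freq_inc = hi[1] / lo[1] - 1.0   # fractional clock increase
            perf_inc = hi[4] / lo[4] - 1.0   # fractional score increase
            results.append((plat, lo[1], hi[1], freq_inc, perf_inc,
                            perf_inc / freq_inc))
    return results


for plat, f_lo, f_hi, freq_inc, perf_inc, scaling in scaling_factors(ROWS):
    print(plat, f"{f_lo}->{f_hi} MHz",
          f"{freq_inc:.1%} {perf_inc:.1%} {scaling:.0%}")
```

Grouping before pairing is what keeps the 800 MHz (IRC 4.5) and 867 MHz (IRC 5.0) results on the 820 platform from being compared with each other.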

We can extrapolate the 1133 MHz PIII SPECint2000 score to 1.4 GHz by multiplying the 23.5% increase in clock frequency by 63% to get an approximate SPECint2000 increase of 14.8%. That increase would give a SPECint2000 score of about 532 for a hypothetical 1.4 GHz PIII. The 1.4 GHz P4 actually scored 509, so at equal clock rates (1.4 GHz) the straight-line extrapolated PIII is about 4.5% faster. That is actually overly optimistic for the PIII, since the scaling factor would not stay constant from 1133 MHz to 1400 MHz but would instead gradually diminish with increasing frequency, so the two processors would likely be even closer to parity at 1.4 GHz. This seems to dismiss concerns some observers raised that the extreme pipeline depth and small data cache size of the P4 would seriously penalize its integer performance relative to the P6 core.
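The extrapolation above can be sketched in a few lines of Python. The helper `extrapolate_score` is a hypothetical name, and the sketch assumes the 63% scaling factor stays constant all the way to 1.4 GHz, which the text already flags as optimistic for the PIII. Carrying unrounded intermediates, the projected score lands within a point of the 532 quoted above and the PIII lead comes out between 4.5% and 5%.

```python
def extrapolate_score(base_freq, base_score, target_freq, scaling):
    """Straight-line projection: score grows by `scaling` times the
    fractional increase in clock frequency (assumed constant)."""
    freq_increase = target_freq / base_freq - 1.0
    return base_score * (1.0 + scaling * freq_increase)


# PIII at 1133 MHz scored 464; the 1000 -> 1133 MHz scaling factor is 63%.
piii_1400 = extrapolate_score(1133, 464, 1400, 0.63)

# Measured P4 result at 1400 MHz from Table 2.
p4_1400 = 509

# Fractional lead of the hypothetical PIII over the real P4 at 1.4 GHz.
piii_lead = piii_1400 / p4_1400 - 1.0

print(f"hypothetical 1.4 GHz PIII: {piii_1400:.1f}")
print(f"lead over 1.4 GHz P4: {piii_lead:.1%}")
```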

In Figure 1 I have plotted the officially submitted SPECint2000 performance of the PIII on various platforms and compilers, along with the disclosed P4 performance, to show the scaling factor slopes and the straight-line PIII performance extrapolation.


Figure 1 SPECint2000 Performance of PIII and P4 as a Function of Clock Rate

So, on an apples-to-apples basis, so to speak, the P4 is ostensibly a bit less efficient than the PIII on integer code. That doesn’t sound like much of an achievement for an MPU with 42 million transistors and twice the die size of the PIII. However, there are two important factors that must be remembered:

  1. The P4 can physically operate at 1.4 and 1.5 GHz and beyond in 0.18 um, while the PIII seems to be limited to a little over 1 GHz.
  2. The P4 performance/frequency scaling factor is higher: 72% versus less than 63% (remember the flattening effect). This means that relative to a hypothetical PIII design that could keep up in frequency, the P4 looks better and better, and eventually exceeds it in “IPC” as the clock rate heads towards the 2 GHz that Intel claims is possible in its 0.18 um process, and beyond in 0.13 um.

It should also be noted that integer performance achieved on new microarchitectures typically improves for several years after introduction as compiler writers learn how to take full advantage of the design. In the case of the P4, that could mean better exploitation of the 32- and 64-bit integer SIMD instruction set extension. The P4 might eventually be seen as a modest IPC improvement over the P6 core on integer code. However, keep in mind that the design goal is maximum performance, not maximum IPC, and by far the strongest contributor to higher performance on scalar code and code with poor instruction level parallelism (ILP) is a higher clock rate.

