Comparing the Model to the Real World
The m parameters shown in Table 3, which were derived from the 1.4 GHz and 1.5 GHz test points, were used to extrapolate P4 performance at processor clock frequencies of 1.3 GHz and 1.7 GHz. Table 4 lists the actual performance, predicted performance, and prediction error for the 26 SPEC2k programs at these two clock frequencies.
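Table 3 and the model's exact equations are not reproduced in this section, so what follows is only a minimal Python sketch under an assumed model form that matches the text's description. CPI is split into an architectural component and an off-chip memory component. Because memory latency is fixed in time, the memory component's cost in core cycles grows with the core-to-bus multiplier, and therefore linearly with clock frequency f. Under that assumption 1/score = a/f + b, which is linear in 1/f, so two measured points (such as 1.4 and 1.5 GHz) determine a and b, and the fitted curve can then be evaluated at 1.3 and 1.7 GHz. The names `fit_two_point` and `predict`, and the demo numbers, are hypothetical.

```python
# Minimal two-point fit of an ASSUMED frequency-scaling model (illustrative,
# not the article's exact Table 3 parameterization):
#   CPI(f) = cpi_arch + t_mem * f   (t_mem: off-chip stall ns per instruction,
#                                    f: core clock in GHz)
#   score is proportional to f / CPI(f)  =>  1/score = a / f + b
# where a tracks the architectural CPI and b the fixed-time memory stalls.


def fit_two_point(f1, s1, f2, s2):
    """Fit (a, b) in 1/score = a/f + b through two (GHz, score) points."""
    x1, x2 = 1.0 / f1, 1.0 / f2
    y1, y2 = 1.0 / s1, 1.0 / s2
    a = (y1 - y2) / (x1 - x2)  # slope in 1/f: architectural term
    b = y1 - a * x1            # intercept: memory term, independent of f
    return a, b


def predict(a, b, f):
    """Score predicted by the fitted model at clock frequency f (GHz)."""
    return 1.0 / (a / f + b)


if __name__ == "__main__":
    # Hypothetical "true" curve, used only to manufacture two test points.
    true_a, true_b = 0.002, 0.0004
    points = [(f, predict(true_a, true_b, f)) for f in (1.4, 1.5)]
    a, b = fit_two_point(*points[0], *points[1])
    for f in (1.3, 1.7):
        print(f"{f} GHz: predicted score {predict(a, b, f):.0f}")
```

Because b > 0, the predicted score grows more slowly than clock frequency. If noise in the two closely spaced fit points inflates b (and correspondingly shrinks a), the fitted curve becomes too flat. It then crosses the true curve once, near the fit points, over-predicting below them and under-predicting above them. This is the same signature described below for 181.mcf and 173.applu.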
Table 4 Actual and Predicted SPEC2k Performance

| Program | P4/1300 actual | P4/1300 model predicted | Error | P4/1700 actual | P4/1700 model predicted | Error |
|---|---|---|---|---|---|---|
| 164.gzip | 486 | 478 | -1.6% | 625 | 625 | 0.1% |
| 175.vpr | 276 | 281 | 1.9% | 319 | 312 | -2.2% |
| 176.gcc | 545 | 557 | 2.1% | 623 | 614 | -1.4% |
| 181.mcf | 456 | 473 | 3.6% | 491 | 473 | -3.8% |
| 186.crafty | 434 | 433 | -0.3% | 551 | 561 | 1.7% |
| 197.parser | 424 | 429 | 1.2% | 517 | 511 | -1.1% |
| 252.eon | 563 | 566 | 0.5% | 748 | 733 | -2.0% |
| 253.perlbmk | 615 | 616 | 0.2% | 780 | 788 | 1.0% |
| 254.gap | 630 | 637 | 1.0% | 781 | 774 | -0.9% |
| 255.vortex | 658 | 662 | 0.5% | 812 | 803 | -1.1% |
| 256.bzip2 | 381 | 385 | 1.0% | 453 | 451 | -0.4% |
| 300.twolf | 375 | 384 | 2.4% | 429 | 419 | -2.4% |
| SPECint_base2k | 473 | 478 | 1.0% | 573 | 567 | -1.1% |
| 168.wupwise | 680 | 678 | -0.3% | 858 | 836 | -2.7% |
| 171.swim | 1238 | 1242 | 0.3% | 1281 | 1246 | -2.8% |
| 172.mgrid | 500 | 501 | 0.2% | 612 | 611 | -0.1% |
| 173.applu | 585 | 618 | 5.3% | 714 | 660 | -8.2% |
| 177.mesa | 484 | 487 | 0.5% | 635 | 618 | -2.8% |
| 178.galgel | 496 | 506 | 1.9% | 579 | 564 | -2.7% |
| 179.art | 514 | 512 | -0.4% | 525 | 516 | -1.8% |
| 183.equake | 685 | 699 | 2.0% | 791 | 773 | -2.4% |
| 187.facerec | 400 | 402 | 0.6% | 504 | 497 | -1.4% |
| 188.ammp | 343 | 345 | 0.6% | 387 | 384 | -0.8% |
| 189.lucas | 700 | 714 | 2.0% | 836 | 807 | -3.6% |
| 191.fma3d | 381 | 384 | 0.8% | 473 | 467 | -1.3% |
| 200.sixtrack | 223 | 223 | 0.0% | 290 | 291 | 0.3% |
| 301.apsi | 393 | 394 | 0.2% | 457 | 456 | -0.2% |
| SPECfp_base2k | 503 | 508 | 0.9% | 598 | 585 | -2.3% |
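The two summary rows can be checked against the per-program scores: SPEC CPU2000 defines SPECint_base2k and SPECfp_base2k as geometric means of the individual benchmark ratios. The short Python sketch below recomputes three of the actual-score summaries from the values in Table 4; the list and function names are illustrative.

```python
import math

# Actual scores copied from Table 4, in table order.
INT_1300 = [486, 276, 545, 456, 434, 424, 563, 615, 630, 658, 381, 375]
FP_1300 = [680, 1238, 500, 585, 484, 496, 514, 685, 400, 343, 700, 381, 223, 393]
FP_1700 = [858, 1281, 612, 714, 635, 579, 525, 791, 504, 387, 836, 473, 290, 457]


def geometric_mean(values):
    """Geometric mean, computed as the exponential of the mean logarithm."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    for label, scores in (("SPECint_base2k @ 1.3 GHz", INT_1300),
                          ("SPECfp_base2k @ 1.3 GHz", FP_1300),
                          ("SPECfp_base2k @ 1.7 GHz", FP_1700)):
        print(f"{label}: {geometric_mean(scores):.1f}")
```

Averaging logarithms rather than multiplying 12 or 14 scores directly keeps the intermediate values small. The three results round to the 473, 503 and 598 listed in the table.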
For most individual programs the performance model appears reasonably accurate, with errors of about 2% or less. A few programs, such as 181.mcf and 173.applu, do not appear to be well suited to this kind of model. In both cases the model was overly optimistic at 1.3 GHz, which is below the test data points, and overly pessimistic at 1.7 GHz, which is above them. This suggests that, for these programs, the 1.4 and 1.5 GHz test points led the model to overestimate the effect of off-chip memory accesses on CPI.
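To make the outliers explicit, the following Python sketch recomputes each program's error from the scores and flags anything beyond an arbitrarily chosen 3% threshold. The table does not state how Error is defined. Computing it as (predicted - actual) / predicted is an assumption, chosen because it reproduces the listed 173.applu values (+5.3% and -8.2%), whereas dividing by the actual score would give -7.6% at 1.7 GHz. Because the scores in the table are rounded, a few recomputed errors differ from the listed ones by a tenth of a point or so. The names `TABLE4`, `error_pct` and `outliers` are hypothetical.

```python
# (program, actual_1300, predicted_1300, actual_1700, predicted_1700),
# copied from Table 4 (summary rows omitted).
TABLE4 = [
    ("164.gzip", 486, 478, 625, 625), ("175.vpr", 276, 281, 319, 312),
    ("176.gcc", 545, 557, 623, 614), ("181.mcf", 456, 473, 491, 473),
    ("186.crafty", 434, 433, 551, 561), ("197.parser", 424, 429, 517, 511),
    ("252.eon", 563, 566, 748, 733), ("253.perlbmk", 615, 616, 780, 788),
    ("254.gap", 630, 637, 781, 774), ("255.vortex", 658, 662, 812, 803),
    ("256.bzip2", 381, 385, 453, 451), ("300.twolf", 375, 384, 429, 419),
    ("168.wupwise", 680, 678, 858, 836), ("171.swim", 1238, 1242, 1281, 1246),
    ("172.mgrid", 500, 501, 612, 611), ("173.applu", 585, 618, 714, 660),
    ("177.mesa", 484, 487, 635, 618), ("178.galgel", 496, 506, 579, 564),
    ("179.art", 514, 512, 525, 516), ("183.equake", 685, 699, 791, 773),
    ("187.facerec", 400, 402, 504, 497), ("188.ammp", 343, 345, 387, 384),
    ("189.lucas", 700, 714, 836, 807), ("191.fma3d", 381, 384, 473, 467),
    ("200.sixtrack", 223, 223, 290, 291), ("301.apsi", 393, 394, 457, 456),
]


def error_pct(actual, predicted):
    """Signed error in percent, ASSUMED to be relative to the prediction."""
    return 100.0 * (predicted - actual) / predicted


def outliers(rows, threshold_pct=3.0):
    """Return (program, MHz, error%) for every |error| above the threshold."""
    flagged = []
    for name, a13, p13, a17, p17 in rows:
        for mhz, actual, predicted in ((1300, a13, p13), (1700, a17, p17)):
            err = error_pct(actual, predicted)
            if abs(err) > threshold_pct:
                flagged.append((name, mhz, round(err, 1)))
    return flagged


if __name__ == "__main__":
    for name, mhz, err in outliers(TABLE4):
        print(f"{name:12s} {mhz} MHz {err:+.1f}%")
```

With these inputs the function flags 181.mcf and 173.applu at both clock rates, plus 189.lucas at 1.7 GHz. 189.lucas shows the same optimistic-then-pessimistic pattern (+2.0%, -3.6%), though less strongly.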
The model's error was between 0.9% and 1.1% in magnitude for the 1.3 GHz SPECint_base2k, the 1.3 GHz SPECfp_base2k, and the 1.7 GHz SPECint_base2k predictions. For the 1.7 GHz SPECfp_base2k prediction, however, the error is more than twice as large, at -2.3%. This could imply that the model is too simplistic to capture P4 FP performance accurately at higher core-to-bus frequency multiplier ratios.
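Because both summary metrics are geometric means, the error of a summary row is, to first order, the average of the per-program signed errors. The minimal Python sketch below averages the listed Error columns by suite and counts how many programs are under-predicted at 1.7 GHz; the list and function names are illustrative.

```python
from statistics import mean

# Listed Error values from Table 4, in percent: (1.3 GHz, 1.7 GHz), table order.
INT_ERRORS = [
    (-1.6, 0.1), (1.9, -2.2), (2.1, -1.4), (3.6, -3.8), (-0.3, 1.7),
    (1.2, -1.1), (0.5, -2.0), (0.2, 1.0), (1.0, -0.9), (0.5, -1.1),
    (1.0, -0.4), (2.4, -2.4),
]
FP_ERRORS = [
    (-0.3, -2.7), (0.3, -2.8), (0.2, -0.1), (5.3, -8.2), (0.5, -2.8),
    (1.9, -2.7), (-0.4, -1.8), (2.0, -2.4), (0.6, -1.4), (0.6, -0.8),
    (2.0, -3.6), (0.8, -1.3), (0.0, 0.3), (0.2, -0.2),
]


def suite_bias(pairs):
    """Mean signed error (percent) at 1.3 GHz and at 1.7 GHz."""
    return mean(e13 for e13, _ in pairs), mean(e17 for _, e17 in pairs)


def count_underpredicted(pairs):
    """Programs whose 1.7 GHz prediction is below the actual score."""
    return sum(1 for _, e17 in pairs if e17 < 0)


if __name__ == "__main__":
    for suite, pairs in (("SPECint", INT_ERRORS), ("SPECfp", FP_ERRORS)):
        e13, e17 = suite_bias(pairs)
        print(f"{suite}: {e13:+.2f}% at 1.3 GHz, {e17:+.2f}% at 1.7 GHz; "
              f"{count_underpredicted(pairs)}/{len(pairs)} under-predicted at 1.7 GHz")
```

The integer averages are symmetric, about +1.04% at 1.3 GHz and -1.04% at 1.7 GHz, while the FP average moves from about +0.98% to -2.18%, roughly matching the listed summary errors. At 1.7 GHz, 13 of the 14 FP programs are under-predicted (versus 9 of 12 integer programs), and the FP average stays near -1.7% even with 173.applu excluded, so the shift is broad-based rather than driven by a single outlier.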
Another possibility is that the architectural component of CPI for FP-intensive programs was improved in the 1.7 GHz version of the P4. The 1.7 GHz device is a new stepping that reportedly includes fixes for a large number of known errata present in the original 1.4 and 1.5 GHz devices. It is therefore plausible that one or more of these internal fixes marginally improved the 1.7 GHz P4's performance on FP-intensive code compared to earlier devices.

The performance vs. frequency curves predicted by the P4 performance model are shown in Figure 1, along with the observed performance points.

Figure 1 Graph of Predicted and Observed Performance