A Better Crystal Ball


Comparing the Model to the Real World

The m parameters shown in Table 3 were used to extrapolate P4 performance at processor clock frequencies of 1.3 GHz and 1.7 GHz. The actual performance, predicted performance, and prediction error for the 26 SPEC2k programs and two clock frequencies are provided below in Table 4.

Table 4 Comparison Between Actual and Predicted Performance

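| Program        | P4/1300 actual | P4/1300 predicted | Model error | P4/1700 actual | P4/1700 predicted | Model error |
|----------------|----------------|-------------------|-------------|----------------|-------------------|-------------|
| 164.gzip       | 486            | 478               | -1.6%       | 625            | 625               | 0.1%        |
| 175.vpr        | 276            | 281               | 1.9%        | 319            | 312               | -2.2%       |
| 176.gcc        | 545            | 557               | 2.1%        | 623            | 614               | -1.4%       |
| 181.mcf        | 456            | 473               | 3.6%        | 491            | 473               | -3.8%       |
| 186.crafty     | 434            | 433               | -0.3%       | 551            | 561               | 1.7%        |
| 197.parser     | 424            | 429               | 1.2%        | 517            | 511               | -1.1%       |
| 252.eon        | 563            | 566               | 0.5%        | 748            | 733               | -2.0%       |
| 253.perlbmk    | 615            | 616               | 0.2%        | 780            | 788               | 1.0%        |
| 254.gap        | 630            | 637               | 1.0%        | 781            | 774               | -0.9%       |
| 255.vortex     | 658            | 662               | 0.5%        | 812            | 803               | -1.1%       |
| 256.bzip2      | 381            | 385               | 1.0%        | 453            | 451               | -0.4%       |
| 300.twolf      | 375            | 384               | 2.4%        | 429            | 419               | -2.4%       |
| SPECint_base2k | 473            | 478               | 1.0%        | 573            | 567               | -1.1%       |
| 168.wupwise    | 680            | 678               | -0.3%       | 858            | 836               | -2.7%       |
| 171.swim       | 1238           | 1242              | 0.3%        | 1281           | 1246              | -2.8%       |
| 172.mgrid      | 500            | 501               | 0.2%        | 612            | 611               | -0.1%       |
| 173.applu      | 585            | 618               | 5.3%        | 714            | 660               | -8.2%       |
| 177.mesa       | 484            | 487               | 0.5%        | 635            | 618               | -2.8%       |
| 178.galgel     | 496            | 506               | 1.9%        | 579            | 564               | -2.7%       |
| 179.art        | 514            | 512               | -0.4%       | 525            | 516               | -1.8%       |
| 183.equake     | 685            | 699               | 2.0%        | 791            | 773               | -2.4%       |
| 187.facerec    | 400            | 402               | 0.6%        | 504            | 497               | -1.4%       |
| 188.ammp       | 343            | 345               | 0.6%        | 387            | 384               | -0.8%       |
| 189.lucas      | 700            | 714               | 2.0%        | 836            | 807               | -3.6%       |
| 191.fma3d      | 381            | 384               | 0.8%        | 473            | 467               | -1.3%       |
| 200.sixtrack   | 223            | 223               | 0.0%        | 290            | 291               | 0.3%        |
| 301.apsi       | 393            | 394               | 0.2%        | 457            | 456               | -0.2%       |
| SPECfp_base2k  | 503            | 508               | 0.9%        | 598            | 585               | -2.3%       |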
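The article does not spell out how the Error column in Table 4 is computed. As a rough sketch, the figures listed for the two largest outliers are reproduced by (predicted - actual) / predicted. That convention is inferred from the table, not stated by the author, and other rows can differ by a tenth of a percent, presumably because the displayed scores are rounded. The function name `prediction_error` is hypothetical.

```python
# Assumed convention (inferred from Table 4, not stated in the article):
# error = (predicted - actual) / predicted.
# A positive error means the model predicted a higher score than was measured.
def prediction_error(actual, predicted):
    return (predicted - actual) / predicted

# (actual, predicted) scores copied from Table 4 for the two outliers.
TABLE4_OUTLIERS = {
    "181.mcf":   {1300: (456, 473), 1700: (491, 473)},
    "173.applu": {1300: (585, 618), 1700: (714, 660)},
}

for program, by_clock in TABLE4_OUTLIERS.items():
    for mhz, (actual, predicted) in by_clock.items():
        err = prediction_error(actual, predicted)
        print(f"{program:10s} P4/{mhz}: {err:+.1%}")
```

For these four entries the sketch gives +3.6% and -3.8% for 181.mcf and +5.3% and -8.2% for 173.applu, matching the table.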

For most of the individual programs, the performance model is reasonably accurate, with errors on the order of 2% or less. A few programs, notably 181.mcf and 173.applu, do not seem amenable to being modeled in this manner. In both cases the model was overly optimistic at 1.3 GHz, below the test data points, and overly pessimistic at 1.7 GHz, above them. This suggests that the model, as fitted to the 1.4 and 1.5 GHz test points, overestimated the effect of off-chip memory accesses on CPI for these programs.
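The link between an overestimated memory term and this sign pattern can be illustrated with a toy calculation. The sketch below is not the article's model or its Table 3 parameters. It assumes a generic form, CPI = core CPI + memory slope x clock frequency, with performance proportional to frequency / CPI, and every number in it is hypothetical. As a simplification, the fitted model is pinned to the "true" behavior at 1.45 GHz, the midpoint of the two test points, rather than fitted through both.

```python
# Toy illustration only: the model form and all constants are assumptions,
# not the article's actual model or fitted parameters.
def score(freq_ghz, core_cpi, mem_slope):
    """Relative performance if CPI = core_cpi + mem_slope * freq_ghz."""
    return freq_ghz / (core_cpi + mem_slope * freq_ghz)

def core_cpi_pinned_at(freq_ghz, true_core, true_slope, fit_slope):
    """Core CPI that makes a model using fit_slope match the true CPI at freq_ghz."""
    return true_core + (true_slope - fit_slope) * freq_ghz

TRUE_CORE, TRUE_SLOPE = 1.0, 0.5  # hypothetical "real" behavior
FIT_SLOPE = 0.8                   # memory effect overestimated by the fit
ANCHOR_GHZ = 1.45                 # midpoint of the 1.4 and 1.5 GHz test points

fit_core = core_cpi_pinned_at(ANCHOR_GHZ, TRUE_CORE, TRUE_SLOPE, FIT_SLOPE)

results = {}
for f in (1.3, 1.7):
    actual = score(f, TRUE_CORE, TRUE_SLOPE)
    predicted = score(f, fit_core, FIT_SLOPE)
    results[f] = (actual, predicted)
    verdict = "optimistic" if predicted > actual else "pessimistic"
    print(f"{f} GHz: actual {actual:.3f}, predicted {predicted:.3f} ({verdict})")
```

Under these assumptions the fitted CPI differs from the true CPI by (fitted slope - true slope) x (f - 1.45), so the prediction is too high below the anchor frequency and too low above it, the same pattern seen for 181.mcf and 173.applu.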

The composite predictions were within 0.9% to 1.1% of the actual scores for the 1.3 GHz SPECint_base2k, the 1.3 GHz SPECfp_base2k, and the 1.7 GHz SPECint_base2k. For the 1.7 GHz SPECfp_base2k prediction, however, the error is more than twice as large at 2.3%. This could imply that the model is too simplistic to capture P4 FP performance behavior at higher bus frequency multiplier ratios.
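For readers who want to relate the composite rows to the individual programs: SPEC CPU2000 composite scores are geometric means of the per-benchmark ratios. The sketch below recomputes the 1.3 GHz integer composites from the rounded per-program scores in Table 4. The helper name `geometric_mean` is hypothetical, and because the inputs are rounded, the result should only be expected to land close to the published figures.

```python
import math

def geometric_mean(values):
    """Geometric mean computed via logarithms to avoid a very large product."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# P4/1300 integer scores from Table 4, in table order (164.gzip ... 300.twolf).
ACTUAL_1300 = [486, 276, 545, 456, 434, 424, 563, 615, 630, 658, 381, 375]
PREDICTED_1300 = [478, 281, 557, 473, 433, 429, 566, 616, 637, 662, 385, 384]

actual_composite = geometric_mean(ACTUAL_1300)
predicted_composite = geometric_mean(PREDICTED_1300)

print(f"SPECint_base2k actual:    {actual_composite:.1f}")
print(f"SPECint_base2k predicted: {predicted_composite:.1f}")
```

Because the composite is a geometric mean, individual over- and under-predictions partly cancel, which is one reason a composite error near 1% can coexist with errors of several percent on single programs.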

Another possibility is that the architectural component of CPI for FP intensive programs was improved in the 1.7 GHz version of the P4. The 1.7 GHz device is a new stepping that reportedly includes fixes for a large number of errata present in the original 1.4 and 1.5 GHz devices. It is therefore plausible that one or more internal fixes marginally improved the performance of the 1.7 GHz P4 on FP intensive code compared to earlier devices. The performance vs. frequency curves predicted by the P4 performance model are shown in Figure 1 along with observed performance points.


Figure 1 Graph of Predicted and Observed Performance
