Data Prefetch Logic – What is it Worth?

Memory Benchmarks

In order to determine whether bandwidth issues would limit the ability of the Data Prefetch logic to ‘work its magic’, I ran several memory-specific diagnostics. The most important of these is STREAM, which measures the sustained bandwidth achieved by an application that simply streams data through the memory subsystem as fast as possible. I specified 10 iterations for each STREAM run to get the best score possible.
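
For readers unfamiliar with STREAM, the four kernels it times are Copy, Scale, Add, and Triad. The sketch below shows only the core loops; the array size is illustrative (the real benchmark insists on arrays much larger than the last-level cache), and the timing and reporting code of the actual benchmark is omitted. The results table follows.

```c
#include <stddef.h>

#define N (2 * 1024 * 1024)   /* illustrative size; real STREAM requires arrays */
                              /* much larger than the last-level cache          */
static double a[N], b[N], c[N];

void stream_kernels(double scalar)
{
    size_t i;

    /* Copy:  c = a            -- 16 bytes moved per iteration */
    for (i = 0; i < N; i++) c[i] = a[i];

    /* Scale: b = scalar*c     -- 16 bytes moved per iteration */
    for (i = 0; i < N; i++) b[i] = scalar * c[i];

    /* Add:   c = a + b        -- 24 bytes moved per iteration */
    for (i = 0; i < N; i++) c[i] = a[i] + b[i];

    /* Triad: a = b + scalar*c -- 24 bytes moved per iteration */
    for (i = 0; i < N; i++) a[i] = b[i] + scalar * c[i];
}
```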

STREAM results (MB/s):

Processor | PIII Cu  | PIII T   | PIII Cu  | Celeron T
FSB       | 133MHz   | 133MHz   | 100MHz   | 100MHz
Copy      | 304.238  | 389.617  | 298.593  | 292.2026
Scale     | 304.3049 | 388.2476 | 298.1941 | 293.0291
Add       | 372.1173 | 482.2688 | 361.1468 | 371.0431
Triad     | 371.8431 | 480.3588 | 361.6824 | 360.6214

As can be seen here, none of these tests comes close to reaching the maximum theoretical bandwidth of the GTL+ bus; in fact, none reaches even 50% of it. It also shows that the Celeron T and the PIII Coppermine, both running at a 100MHz FSB, get almost identical scores, while the PIII T achieves significantly higher bandwidth utilization than the PIII Coppermine running at a 133MHz FSB. Data Prefetch logic in action! Of course, this is a synthetic test, so it only shows the maximum potential benefit of the feature, not the typical benefit. Still, an improvement of almost 30% is quite impressive for one little feature.
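
The peak figure isn't spelled out above, but assuming the usual 64-bit GTL+ data bus transferring once per bus clock, peak bandwidth works out to roughly 1064MB/s at a 133MHz FSB and 800MB/s at 100MHz. The small sketch below uses those assumed peaks and the Add figures from the table to show where the ‘under 50%’ and ‘almost 30%’ statements come from.

```c
#include <stdio.h>

int main(void)
{
    /* Assumption: a 64-bit (8-byte) GTL+ data bus transferring once per
       bus clock, so peak bandwidth is roughly 8 bytes * FSB clock. */
    const double peak_133 = 8.0 * 133.0;   /* ~1064 MB/s at 133MHz FSB */
    const double peak_100 = 8.0 * 100.0;   /*  ~800 MB/s at 100MHz FSB */

    /* Best STREAM figures from the table above (Add kernel, in MB/s). */
    printf("PIII T  @ 133MHz: %4.1f%% of peak\n", 100.0 * 482.2688 / peak_133);
    printf("PIII Cu @ 133MHz: %4.1f%% of peak\n", 100.0 * 372.1173 / peak_133);
    printf("PIII Cu @ 100MHz: %4.1f%% of peak\n", 100.0 * 361.1468 / peak_100);

    /* Relative gain of the Tualatin over the Coppermine at 133MHz FSB. */
    printf("PIII T vs PIII Cu @ 133MHz: +%4.1f%%\n",
           100.0 * (482.2688 / 372.1173 - 1.0));
    return 0;
}
```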

Another benchmark that looks interesting is PCMark2002. The memory tests seem to isolate the speed of memory and cache, but the Block Read numbers here seem awfully high – very close to the theoretical bandwidth of the bus. I will be interested in hearing feedback about the throughput shown here (is it possible to shove this much data across the bus?). The jury is still out as to whether this is a valid benchmark for comparing across platforms, but a comparison of these architecturally similar processors yields some interesting results. I specified 10 iterations of the tests selected, and PCMark2002 presents an average score for the 10 runs. The numbers presented here are all in MB/s.
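
To make concrete what a ‘Block Read’ style test measures, here is a rough, hypothetical sketch of such a test. It is not PCMark2002’s actual code; the pass count and the MB/s accounting are my own illustrative choices. The point is simply that a block that fits in L1 or L2 is served from cache after the first pass, while the multi-megabyte blocks must come across the front-side bus every time.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <time.h>

/* Conceptual sketch of a "Block Read" style test: sum the same block of
   memory over and over and report MB/s.  Blocks that fit in the L1 or L2
   cache stay resident after the first pass, so only the larger block
   sizes actually exercise the front-side bus. */
static double block_read_mbs(size_t block_bytes, int passes)
{
    uint64_t *block = malloc(block_bytes);
    volatile uint64_t sink = 0;
    if (block == NULL)
        return 0.0;
    memset(block, 1, block_bytes);            /* touch the block once */

    clock_t start = clock();
    for (int p = 0; p < passes; p++)
        for (size_t i = 0; i < block_bytes / sizeof(uint64_t); i++)
            sink += block[i];                 /* sequential 8-byte reads */
    double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
    if (secs <= 0.0)
        secs = 1.0 / CLOCKS_PER_SEC;          /* avoid dividing by zero */

    free(block);
    (void)sink;
    return ((double)block_bytes * passes) / (secs * 1024.0 * 1024.0);
}

int main(void)
{
    /* Block sizes matching the rows of the table below (in KB). */
    const size_t sizes_kb[] = { 3072, 1536, 384, 48, 6 };
    for (size_t i = 0; i < sizeof(sizes_kb) / sizeof(sizes_kb[0]); i++)
        printf("Block Read - %4zuKB: %8.1f MB/s\n",
               sizes_kb[i], block_read_mbs(sizes_kb[i] * 1024, 200));
    return 0;
}
```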

PCMark2002 memory results (MB/s):

Processor              | PIII Cu | PIII T  | PIII Cu | Celeron T
FSB                    | 133MHz  | 133MHz  | 100MHz  | 100MHz
Block Read – 3072KB    | 942.00  | 869.40  | 737.00  | 737.60
Block Read – 1536KB    | 942.10  | 871.00  | 737.00  | 737.50
Block Read – 384KB     | 941.90  | 987.10  | 736.60  | 738.00
Block Read – 48KB      | 4861.60 | 4558.10 | 4873.80 | 4981.80
Block Read – 6KB       | 8838.30 | 8891.30 | 8863.00 | 8913.80
Block Write – 3072KB   | 214.60  | 219.40  | 167.10  | 162.90
Block Write – 1536KB   | 218.10  | 215.90  | 173.30  | 167.60
Block Write – 384KB    | 411.20  | 368.50  | 333.60  | 323.30
Block Write – 48KB     | 4292.40 | 4054.30 | 4303.10 | 4064.70
Block Write – 6KB      | 7904.10 | 7904.10 | 7924.00 | 7924.00
Block Modify – 3072KB  | 224.30  | 217.30  | 163.40  | 166.10
Block Modify – 1536KB  | 223.70  | 223.40  | 162.70  | 171.50
Block Modify – 384KB   | 331.10  | 361.00  | 261.40  | 275.40
Block Modify – 48KB    | 4057.00 | 3596.10 | 3745.90 | 3414.70
Block Modify – 6KB     | 5059.00 | 5059.20 | 5072.00 | 5072.20
Random Access – 1536KB | 430.10  | 651.10  | 360.80  | 359.90
Random Access – 768KB  | 430.20  | 660.40  | 360.80  | 360.00
Random Access – 384KB  | 430.20  | 739.50  | 360.30  | 359.70
Random Access – 96KB   | 2526.20 | 2473.90 | 2535.20 | 2476.20
Random Access – 48KB   | 2529.80 | 2468.90 | 2539.40 | 2476.20
Random Access – 6KB    | 3196.90 | 3197.60 | 3205.60 | 3205.80

The first strange thing to notice is that the Coppermine part running at a 133MHz FSB gets higher throughput in the non-cached tests (block sizes above 256KB) than the Tualatin part, so either there is something odd about this particular test (and benchmark), or the Data Prefetch logic is actually detrimental in this circumstance. Obviously, more research and discussion are necessary here. As can be seen in all of the ‘small block’ tests (under 256KB), the L1 cache speeds are almost identical, but once again the Coppermine seems to have a slight advantage in L2 speed. The Data Prefetch logic will, of course, have no impact when data is accessed from cache, but it does appear that the L2 cache is a bit slower on the Tualatin parts.

