Memory Benchmarks
To determine whether bandwidth issues would limit the ability of the Data Prefetch logic to ‘work its magic’, I ran several memory-specific diagnostics. The most important one is STREAM, which attempts to determine the highest bandwidth an application can achieve when it simply streams data through the memory subsystem as fast as possible. I specified 10 iterations for each STREAM run to get the best score possible.
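For readers unfamiliar with STREAM, the four kernels it times can be sketched as follows. This is an illustrative sketch only, not the benchmark itself: the function name `stream_sketch`, the array length, and the scalar are assumptions, and the byte accounting is nominal (8 bytes per element, two arrays touched by Copy and Scale, three by Add and Triad). Interpreter overhead dominates in pure Python, so the absolute rates it returns say nothing about the memory subsystem.

```python
import time

WORD_BYTES = 8  # nominal size of one double-precision element (assumption for byte counting)

def stream_sketch(n=200_000, iterations=10, scalar=3.0):
    """Time Copy, Scale, Add and Triad; return the best nominal rate in MB/s for each."""
    a, b, c = [1.0] * n, [2.0] * n, [0.0] * n

    def copy():   # c = a
        c[:] = a

    def scale():  # b = scalar * c
        b[:] = [scalar * x for x in c]

    def add():    # c = a + b
        c[:] = [x + y for x, y in zip(a, b)]

    def triad():  # a = b + scalar * c
        a[:] = [y + scalar * z for y, z in zip(b, c)]

    # (name, kernel, number of arrays read or written per pass)
    kernels = [("Copy", copy, 2), ("Scale", scale, 2),
               ("Add", add, 3), ("Triad", triad, 3)]

    best = {name: float("inf") for name, _, _ in kernels}
    for _ in range(iterations):
        for name, fn, _ in kernels:
            start = time.perf_counter()
            fn()
            # Keep the shortest time seen, i.e. the best score over all iterations.
            best[name] = min(best[name], time.perf_counter() - start)

    # Rate = bytes nominally moved / best time; the max() guards against a zero timer reading.
    return {name: arrays * WORD_BYTES * n / max(best[name], 1e-9) / 1e6
            for name, _, arrays in kernels}
```

Calling `stream_sketch()` returns a dictionary with one entry per kernel. Keeping the minimum time across iterations mirrors the reason for running 10 iterations: the reported figure is the best pass, not the average.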
Processor | PIII Cu | PIII T | PIII Cu | Celeron T
FSB | 133MHz | 133MHz | 100MHz | 100MHz
Copy | 304.238 | 389.617 | 298.593 | 292.2026
Scale | 304.3049 | 388.2476 | 298.1941 | 293.0291
Add | 372.1173 | 482.2688 | 361.1468 | 371.0431
Triad | 371.8431 | 480.3588 | 361.6824 | 360.6214
As can be seen here, none of these tests comes close to reaching the maximum theoretical bandwidth of the GTL+ bus; in fact, none even reaches 50% of it. The results also show that the Celeron T and PIII Coppermine, both running at 100MHz FSB, get almost identical scores, while at 133MHz FSB the PIII T gets significantly higher bandwidth utilization than the PIII Coppermine. Data Prefetch logic in action! Of course, this is a synthetic test, so it only shows the maximum potential benefit of the feature, not the typical benefit. Still, almost a 30% improvement is quite impressive for one little feature.
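The arithmetic behind those two claims can be checked with a short script. It is a sketch under stated assumptions: the STREAM scores are taken to be in MB/s, and the theoretical peak is computed as FSB clock times an assumed 8-byte (64-bit) bus width with one transfer per clock. The names `BUS_WIDTH_BYTES`, `peak_mb_per_s` and `stream` are purely illustrative.

```python
BUS_WIDTH_BYTES = 8  # assumption: 64-bit data bus, one transfer per bus clock

def peak_mb_per_s(fsb_mhz):
    """Theoretical peak bus bandwidth in MB/s under the assumption above."""
    return fsb_mhz * BUS_WIDTH_BYTES

# STREAM results from the table above, keyed by (processor, FSB in MHz).
stream = {
    ("PIII Cu", 133):   {"Copy": 304.238,  "Scale": 304.3049, "Add": 372.1173, "Triad": 371.8431},
    ("PIII T", 133):    {"Copy": 389.617,  "Scale": 388.2476, "Add": 482.2688, "Triad": 480.3588},
    ("PIII Cu", 100):   {"Copy": 298.593,  "Scale": 298.1941, "Add": 361.1468, "Triad": 361.6824},
    ("Celeron T", 100): {"Copy": 292.2026, "Scale": 293.0291, "Add": 371.0431, "Triad": 360.6214},
}

# Best score of each system as a fraction of its theoretical peak.
fraction_of_peak = {
    key: max(scores.values()) / peak_mb_per_s(key[1])
    for key, scores in stream.items()
}

# Percentage gain of the PIII T over the PIII Cu, both at 133MHz FSB.
gain_pct = {
    test: 100 * (stream[("PIII T", 133)][test] / stream[("PIII Cu", 133)][test] - 1)
    for test in stream[("PIII T", 133)]
}

for (cpu, fsb), frac in fraction_of_peak.items():
    print(f"{cpu} @ {fsb}MHz: best score is {100 * frac:.1f}% of peak")
for test, gain in gain_pct.items():
    print(f"{test}: PIII T is {gain:.1f}% faster than PIII Cu at 133MHz")
```

Under these assumptions every system stays below half of the theoretical peak, and the PIII T gain at 133MHz works out to roughly 28% to 30% across the four kernels.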
Another benchmark that looks interesting is PCMark2002. The memory tests seem to isolate the speed of memory and cache, but the Block Read numbers here seem awfully high – very close to the theoretical bandwidth of the bus. I will be interested in hearing feedback about the throughput shown here (is it possible to shove this much data across the bus?). The jury is still out as to whether this is a valid benchmark for comparing across platforms, but a comparison of these architecturally similar processors yields some interesting results. I specified 10 iterations of the tests selected, and PCMark2002 presents an average score for the 10 runs. The numbers presented here are all in MB/s.
Processor | PIII Cu | PIII T | PIII Cu | Celeron T
FSB | 133MHz | 133MHz | 100MHz | 100MHz
Block Read – 3072KB | 942.00 | 869.40 | 737.00 | 737.60
Block Read – 1536KB | 942.10 | 871.00 | 737.00 | 737.50
Block Read – 384KB | 941.90 | 987.10 | 736.60 | 738.00
Block Read – 48KB | 4861.60 | 4558.10 | 4873.80 | 4981.80
Block Read – 6KB | 8838.30 | 8891.30 | 8863.00 | 8913.80
Block Write – 3072KB | 214.60 | 219.40 | 167.10 | 162.90
Block Write – 1536KB | 218.10 | 215.90 | 173.30 | 167.60
Block Write – 384KB | 411.20 | 368.50 | 333.60 | 323.30
Block Write – 48KB | 4292.40 | 4054.30 | 4303.10 | 4064.70
Block Write – 6KB | 7904.10 | 7904.10 | 7924.00 | 7924.00
Block Modify – 3072KB | 224.30 | 217.30 | 163.40 | 166.10
Block Modify – 1536KB | 223.70 | 223.40 | 162.70 | 171.50
Block Modify – 384KB | 331.10 | 361.00 | 261.40 | 275.40
Block Modify – 48KB | 4057.00 | 3596.10 | 3745.90 | 3414.70
Block Modify – 6KB | 5059.00 | 5059.20 | 5072.00 | 5072.20
Random Access – 1536KB | 430.10 | 651.10 | 360.80 | 359.90
Random Access – 768KB | 430.20 | 660.40 | 360.80 | 360.00
Random Access – 384KB | 430.20 | 739.50 | 360.30 | 359.70
Random Access – 96KB | 2526.20 | 2473.90 | 2535.20 | 2476.20
Random Access – 48KB | 2529.80 | 2468.90 | 2539.40 | 2476.20
Random Access – 6KB | 3196.90 | 3197.60 | 3205.60 | 3205.80
The first strange thing to notice is that the Coppermine part running at 133MHz FSB gets a higher throughput than the Tualatin part in several of the non-cached tests (blocks above 256KB), most clearly Block Read at 1536KB and 3072KB. Either there is something odd about this particular test (and benchmark), or the Data Prefetch logic is actually detrimental in this circumstance. Obviously more research and discussion are necessary here. As can be seen in all of the ‘small block’ tests (under 256KB), the L1 cache speeds are almost identical, but once again the Coppermine seems to have a slight advantage in L2 speed. Obviously the Data Prefetch logic will have no impact when data is accessed from cache, but it does appear that L2 cache is a bit slower on the Tualatin parts.
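To see which 133MHz part leads in each non-cached test, the relevant rows of the PCMark2002 table can be compared directly. The snippet below is a minimal sketch using only the numbers above (in MB/s); the names `large_block`, `tualatin_delta_pct` and `coppermine_leads` are illustrative, and "non-cached" is taken to mean block sizes above 256KB, as in the text.

```python
# (test, PIII Cu @ 133MHz, PIII T @ 133MHz), block sizes above 256KB only, in MB/s.
large_block = [
    ("Block Read 3072KB",    942.00, 869.40),
    ("Block Read 1536KB",    942.10, 871.00),
    ("Block Read 384KB",     941.90, 987.10),
    ("Block Write 3072KB",   214.60, 219.40),
    ("Block Write 1536KB",   218.10, 215.90),
    ("Block Write 384KB",    411.20, 368.50),
    ("Block Modify 3072KB",  224.30, 217.30),
    ("Block Modify 1536KB",  223.70, 223.40),
    ("Block Modify 384KB",   331.10, 361.00),
    ("Random Access 1536KB", 430.10, 651.10),
    ("Random Access 768KB",  430.20, 660.40),
    ("Random Access 384KB",  430.20, 739.50),
]

def tualatin_delta_pct(cu, t):
    """Tualatin score relative to Coppermine, in percent (negative means Coppermine leads)."""
    return 100 * (t - cu) / cu

deltas = {name: tualatin_delta_pct(cu, t) for name, cu, t in large_block}
coppermine_leads = [name for name, d in deltas.items() if d < 0]

for name, d in deltas.items():
    leader = "Coppermine" if d < 0 else "Tualatin"
    print(f"{name}: {d:+.1f}% ({leader} leads)")
```

The sign of each delta shows where the Coppermine advantage appears and where it does not. The Random Access rows, for example, favor the Tualatin in all three sizes, which may be worth keeping in mind when weighing the Block Read result.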