SPEC CPU2000 Tests (Integer)
According to the SPEC website (http://www.specbench.org/osg/cpu2000/analysis/memory/#text) all of the integer tests have a ‘memory footprint’ larger than 512KB, so this might be a nice complement to all of the relatively small tests shown so far. I tend to believe that the SPEC ‘memory footprint’ is not the same as what I would call the ‘working set size’. The memory footprint is the total amount of memory used in the test, but the largest working set size is the amount of data required at any given time. The difference here is that the working set size determines the effectiveness of the cache.
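The distinction between footprint and working set can be illustrated with a toy model. The sketch below is not taken from SPEC or from the test systems; it assumes a fully associative LRU cache with 64-byte lines, and the names `lru_hit_rate` and `phased_trace` are hypothetical helpers invented for the illustration. The synthetic program touches 1MB in total, but only 64KB at a time.

```python
from collections import OrderedDict

def lru_hit_rate(accesses, cache_lines):
    """Fraction of accesses that hit in a fully associative LRU cache
    holding `cache_lines` lines (a simplification of real caches)."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for line in accesses:
        total += 1
        if line in cache:
            hits += 1
            cache.move_to_end(line)        # mark as most recently used
        else:
            cache[line] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict least recently used
    return hits / total if total else 0.0

def phased_trace(phases, lines_per_phase, passes):
    """Each phase sweeps its own block of lines `passes` times, then the
    program moves on to the next block and never returns."""
    for p in range(phases):
        base = p * lines_per_phase
        for _ in range(passes):
            for i in range(lines_per_phase):
                yield base + i

# Assumed 64-byte lines: 16 phases x 1024 lines = 1MB footprint,
# but only 1024 lines (64KB) are live at any given time.
big_cache = lru_hit_rate(phased_trace(16, 1024, 10), 2048)   # 128KB cache
small_cache = lru_hit_rate(phased_trace(16, 1024, 10), 512)  # 32KB cache
print(big_cache, small_cache)
```

Under these assumptions the 128KB cache hits on 90% of accesses even though the footprint is eight times its size, while the 32KB cache, smaller than the 64KB working set, never hits at all.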
Note that in these charts you will see no CuMine results. This is because the SPEC tests would not complete on the CuMine processors at 1.2GHz. For these tests, the Microsoft VC7 compiler was used with only the -O2 flag. The same binary was run on all platforms.
Here we can see that FSB and memory bandwidth have some effect, as does HW Data Prefetch (PIII T vs. Celeron T). The large working set size can be seen in the difference between the Tbred and Morgan cores. This seems to indicate that for this run, the working set size is probably not much larger than 256KB, though the memory footprint is much larger. Note that memory bandwidth does not seem to mean much here, looking at the Willamette scores.
Here is a result that looks very similar to the Dhrystone result. Celeron T seems to have some improvement that this benchmark likes. Other than that, FSB has an impact. Because we have no CuMine scores at 133MHz to compare, it is difficult to determine if HW Data Prefetch has any effect here. It is possible that HW Data Prefetch could actually hinder the performance (note the PIII T vs. Celeron T), as a software block copy algorithm could ‘fight’ with the Data Prefetch logic and cause more memory accesses to occur. Assuming that is the case, the TLB improvement in Tbred might explain why it performs a bit better than the Tbird at the same FSB speed. The Willamette RDRAM/DDR/SDRAM differences indicate that memory bandwidth helps in this test.
It appears that memory bandwidth is extremely important to this test, as the Willamette with RDRAM is the obvious leader. Once again, we see a test where Willamette does very well at the same clock speed as both P6 and K7 processors. Based upon these results, it appears that this test has a relatively large working set size (more than 128KB, and probably less than 256KB). HW Data Prefetch would not seem to have much impact here, though Tbred’s improved TLB may affect the scores just a bit.
This test seems to be almost entirely dependent upon bandwidth, given the Willamette scores. Note that even the Willamette matched with SDRAM performs better than any of the other architectures. This is a clear indication that the idea that P4 does nothing better than Athlon or PIII at the same clock rate is not entirely correct. Again, HW Data Prefetch may be a negative in this test, but the enhanced TLB of Tbred is a positive. The working set for this test is obviously larger than 384KB, as Morgan performs about as well as a Tbred, and better than a Tbird.
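To put the RDRAM/DDR/SDRAM comparison in perspective, theoretical peak bandwidth is simply transfer rate times bus width times channel count. The sketch below assumes nominal configurations (PC133 SDRAM, DDR266, and dual-channel PC800 RDRAM); the article does not state the exact memory speeds used, so these figures are assumptions, and `peak_bandwidth_mb_s` is a hypothetical helper.

```python
def peak_bandwidth_mb_s(mega_transfers_per_sec, bus_bytes, channels=1):
    """Theoretical peak bandwidth in MB/s (nominal, not measured)."""
    return mega_transfers_per_sec * bus_bytes * channels

# Assumed nominal configurations; not taken from the test systems.
configs = {
    "PC133 SDRAM": peak_bandwidth_mb_s(133, 8),               # 64-bit bus
    "DDR266": peak_bandwidth_mb_s(266, 8),                    # 64-bit bus, double data rate
    "PC800 RDRAM x2": peak_bandwidth_mb_s(800, 2, channels=2) # two 16-bit channels
}
for name, mb_s in configs.items():
    print(f"{name}: {mb_s} MB/s")
```

On these nominal numbers the RDRAM platform has roughly three times the peak bandwidth of the SDRAM one, which is the kind of gap a bandwidth-bound test would expose.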
We can almost flip the previous chart over and get this one. Willamette makes a very poor showing here, while K7 dominates. The working set size for this test is apparently close to 128KB, considering the Morgan performance. HW Data Prefetch has no impact here, and bandwidth is most certainly not a requirement.
This test would appear to have a working set size larger than 256KB, but smaller than 384KB. HW Data Prefetch is not useful here, but the enhanced TLB of Tbred seems to give some benefit. Willamette performs respectably in this benchmark, so it would appear that P4 is not 100% dependent upon bandwidth to do well, since it bests a Tbird at the same FSB by a small amount.
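The reasoning used throughout these charts, bracketing the working set between the largest cache that looks too small and the smallest cache that looks sufficient, can be written down as a short sketch. The capacities below are assumptions rather than figures from the article (192KB total for Morgan, 256KB of L2 for the P6 and Willamette parts, 384KB total for Tbird/Tbred), the fits/does-not-fit readings are illustrative, and `bracket_working_set` is a hypothetical helper.

```python
def bracket_working_set(observations):
    """observations: list of (cache_kb, fits) pairs, where `fits` is a
    reading of the chart: True if the test appears to run out of that
    much cache, False if it appears to spill to memory.
    Returns (lower_kb, upper_kb); either bound may be None."""
    too_small = [kb for kb, fits in observations if not fits]
    enough = [kb for kb, fits in observations if fits]
    lower = max(too_small) if too_small else None
    upper = min(enough) if enough else None
    if lower is not None and upper is not None and lower >= upper:
        raise ValueError("observations are inconsistent")
    return lower, upper

# Illustrative readings only: Morgan and the 256KB parts look cache-starved,
# the 384KB Athlons do not.
readings = [(192, False), (256, False), (384, True)]
print(bracket_working_set(readings))
```

With those readings the sketch returns a bracket of 256KB to 384KB. The estimate is only as good as the judgment about which cores are cache-limited, and other factors such as TLB and prefetch can blur that picture.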
My first impression is that the working set size for this test is between 32KB and 128KB, since all K7 processors perform equally well, followed by P6 and then Willamette. There is no benefit to any particular memory type, so there is not a lot of memory access happening here, it seems.
Another test where HW Data Prefetch provides little benefit, but the enhanced TLB of Tbred appears to be useful. This test looks to have a working set size of between 192KB and 256KB.
These results are very interesting. The working set size is large, so both Morgan and Willamette do fairly well (as does the RDRAM-equipped Willamette). However, Tbred does much better than Tbird, and the PIII T does a bit better than Celeron T. HW Data Prefetch is beneficial here, but the enhanced TLB of Tbred/Morgan seems to provide the greatest impact.
This test would appear to have a working set size that favors a slightly larger cache (256KB), and possibly HW Data Prefetch. Willamette does about as well as the Morgan core when ‘all things are equal’ again.
This test would seem to be similar to 254.gap, in that the Willamette and Morgan do relatively well, and the enhanced TLB of Tbred/Morgan seems to provide a nice performance benefit. Looking at this and 254.gap, it seems that Willamette has a fairly effective TLB, just as the Tbred/Morgan does.
Willamette performs very close to both Tbird and Tbred Athlons here. Looking at the relative positions of the different architectures, it seems that HW Data Prefetch is not a benefit, and may be a detriment. Enhanced TLB doesn’t seem to provide any benefit. Bandwidth seems to be important, judging by the three Willamette scores – but the working set size seems to be small enough to benefit the large-cache K7s yet large enough to favor Willamette when the FSB is equivalent.