As most readers with some technical background certainly understand, no single feature or implementation is best for all circumstances when it comes to performance. Even within a single application, the attributes of the data (type, size, etc.) will have some effect on performance, and can determine which processor, memory, hard drive, etc. is ‘best’ for a specific user. The results of this analysis should make these trade-offs clear even to those who, like myself, are not trained in computer architecture.
The designers of the P6, P7 and K7 architectures all took different approaches to achieve performance by focusing on different features. The modifications to each core over time are attempts to maximize performance within the constraints of the original design. It would seem that the HW Data Prefetch feature has some value, but only under a limited set of circumstances. The enhanced TLB seems to have been applicable to a wider variety of circumstances. The issue that CPU designers must face is that these features must not throw a design ‘out of balance’. For example, a TLB that is not matched well with the cache implementation could cause additional ‘wait time’ during cache accesses, while an FPU that cannot be ‘fed’ fast enough due to bandwidth limitations can never be fully utilized.
Despite many claims to the contrary, it seems to me that both PCMark2002 and SPECint provide a great deal of useful information about the relative design strengths of different processor architectures, if one takes the time to understand what is being measured. Since applications utilize resources differently, it is extremely important that any benchmark trying to be ‘representative’ of real world usage stress the various resources in different tests. It is then up to the individual analyzing the results to make the connections between his/her usage and the results of the various tests, so the best possible solution for the specific situation can be determined. Both PCMark2002 and SPECint seem to provide this type of data, but it does require some work and intelligence to apply it properly.
There are a number of issues in benchmarking that seem to be generally overlooked by many enthusiasts and reviewers. Foremost is that without understanding what is being measured, it isn’t possible to come to any useful conclusions. The intent of this type of analysis is to look carefully at each benchmark, run it under very controlled circumstances, and make a reasoned analysis of it based upon what is known. At that point, a theory can be developed regarding what the results mean to end users, and then tested against future scenarios and platforms to validate or refine it.
I encourage all who look at this data to analyze it critically, come to an independent conclusion on what each test means, and provide some feedback so that I may correct or refine some of the comments about each. The next installment will look at the memory-specific tests in PCMark2002 and SiSoft Sandra, as well as STREAM. The final installment will look at the system-level benchmarks like Winstone, SYSmark and some of the ‘overall’ scores from the other benchmarks that purport to show ‘real world’ system performance. Follow-on articles will include additional benchmarks and processor clock speeds to show scaling and the effects of other design improvements. I will also be making the Excel spreadsheet of all results available for download to members, and will be posting the SPEC results and configuration files for public download shortly.