The Broken Benchmark
The first question that will likely be asked is why I chose to evaluate this benchmark at all. Given all of the accusations and problems that have surfaced, it’s well known that this is a ‘broken’ benchmark, right? Well, not necessarily. Before passing judgement, let me explain…
Sometime last year, BAPCo was accused of being ‘biased’ towards Intel, because its SYSmark 2001 benchmark suite seemed to heavily favor the P4. AMD later revealed that one of the applications in SYSmark 2001 used SSE instructions, but only if an Intel processor was installed. This made the Athlon XP appear to be at a disadvantage when compared to the P4. Many then argued that SYSmark 2001 was an invalid benchmark that should not be used to measure system performance.
As it turns out, the application in question was Windows Media Encoder 7.0, and the Intel-only SSE check was in the shipping product itself. In other words, SYSmark 2001 measured exactly what users of WME 7.0 would experience. Although AMD released an unofficial patch to ‘fix’ the problem, Microsoft chose to wait until version 7.01 to correct it. Therefore, while WME 7.0 was still the only available version, SYSmark 2001 was not putting the Athlon XP at an unfair disadvantage at all. But what about now?
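To make the distinction concrete, here is a purely hypothetical sketch contrasting two ways an application might decide whether to use SSE. It is not the actual Windows Media Encoder code: the `CpuInfo` type and the `pick_path_*` functions are invented for illustration, and a real CPU is reduced to its CPUID vendor string plus a single "supports SSE" flag. A vendor-gated check withholds the SSE path from any non-Intel processor, even one that supports SSE, while a feature-gated check uses SSE whenever the processor reports it.

```python
from dataclasses import dataclass


@dataclass
class CpuInfo:
    # Hypothetical, simplified CPU description (assumption for illustration)
    vendor: str    # CPUID vendor string, e.g. "GenuineIntel" or "AuthenticAMD"
    has_sse: bool  # whether the processor reports SSE support


def pick_path_vendor_gated(cpu: CpuInfo) -> str:
    # Hypothetical vendor-gated dispatch: SSE only on Intel processors
    if cpu.vendor == "GenuineIntel" and cpu.has_sse:
        return "sse"
    return "generic"


def pick_path_feature_gated(cpu: CpuInfo) -> str:
    # Hypothetical feature-gated dispatch: SSE on any processor that supports it
    return "sse" if cpu.has_sse else "generic"


athlon_xp = CpuInfo(vendor="AuthenticAMD", has_sse=True)
pentium4 = CpuInfo(vendor="GenuineIntel", has_sse=True)

for name, cpu in (("Athlon XP", athlon_xp), ("P4", pentium4)):
    print(name, pick_path_vendor_gated(cpu), pick_path_feature_gated(cpu))
```

Under these assumptions the Athlon XP falls back to the generic path with the vendor-gated check but gets the SSE path with the feature-gated check, while the P4 gets SSE either way. The hardware is identical in both cases; only the software's decision changes.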
Before delving into that question, I’ll reiterate my belief that a benchmark serves at least two purposes. One is to show the theoretical maximum performance of a component or system; the other is to show its actual performance in the ‘real world’. SYSmark is intended to do the latter, so it uses real applications that are available to the general public. One individual argued that if simply patching an application changes the performance rating, then the benchmark must be invalid: after all, the hardware is exactly the same, and it didn’t magically get faster. My contention is that the system did get faster (though it certainly wasn’t magic).
I believe this point is very important. In the past, PC performance was considered synonymous with MHz. Though most knowledgeable people realized this was overly simplistic, it was a great marketing tool for Intel. With the release of the Athlon XP, AMD has had great success in overcoming that mentality with its ‘QuantiSpeed Architecture’ marketing campaign, which argues that performance depends on both MHz and IPC (instructions per clock). Unfortunately, this is still somewhat simplistic, because it implies that performance is only a function of the hardware. In fact, the way a program is written affects overall performance, and IPC as well.
A PC is a system (as is any computer). A system, in general, is an assemblage of components that work together to perform a set of functions, which implies that every part of the system contributes to the final outcome. Since hardware does virtually nothing without software (or firmware), software (both the OS and applications) must be considered when measuring performance. Simply changing a frequently executed routine can increase or decrease performance without any change to the hardware, so any true definition of computer performance must include the effects of software. Furthermore, the way an application is written can affect IPC, for example through more (or less) efficient memory accesses, or by using instructions better suited to the task (such as SSE instructions where appropriate). Changing software can therefore make the system faster without changing hardware.
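The relationship between clock speed, IPC and software can be put as a simple formula, often called the ‘iron law’ of processor performance: execution time equals the number of instructions executed divided by the product of IPC and clock frequency. The sketch below is a minimal illustration of that simplified model only; the instruction counts, IPC values and 1.5 GHz clock are hypothetical numbers, not measurements of any real processor or application. It shows how a software change that executes fewer instructions at a higher IPC (for instance, by using more efficient instructions) shortens run time on identical hardware.

```python
def run_time_seconds(instructions: float, ipc: float, clock_hz: float) -> float:
    """Simplified model: time = instruction count / (IPC * clock frequency)."""
    return instructions / (ipc * clock_hz)


clock_hz = 1.5e9  # hypothetical 1.5 GHz processor, unchanged between runs

# Hypothetical "original" code path: more instructions, lower IPC
baseline = run_time_seconds(instructions=3e9, ipc=1.0, clock_hz=clock_hz)

# Hypothetical "patched" code path: fewer instructions, higher IPC
optimized = run_time_seconds(instructions=2e9, ipc=1.2, clock_hz=clock_hz)

print(f"baseline:  {baseline:.2f} s")
print(f"optimized: {optimized:.2f} s")
print(f"speedup:   {baseline / optimized:.2f}x with no hardware change")
```

The clock frequency is the same in both calls, so the entire difference comes from the software side of the equation, which is the point the MHz-and-IPC framing leaves out.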
So, now that WME 7.01 is shipping and uses the SSE code path on both the Athlon XP and the P4, is SYSmark 2001 an invalid benchmark? I believe it is when it is used to show how a system with an Athlon XP processor compares to other processors on current applications. For systems built around any other currently available PC processor, it might still be a valid benchmark. It may also remain valid for those who actually use the application versions included in the suite, regardless of which processor they have, including the Athlon XP.
One other issue I considered is that SYSmark 2001 has been used in many, many product reviews, either by itself or alongside other benchmarks. I believe it would be useful to see how it behaves under the same conditions as my Winstone tests. This may provide some additional information about how valid it is for determining the real-world performance of a system or component.