As I surmised at the beginning of this article, the methodology used has seemingly all but eliminated system overhead from the measurements. As a result, I view this benchmark as a test of application performance, not system performance. However, since the results are not broken down by application, their usefulness is severely limited – so severely, in fact, that I can’t currently recommend it for comparing systems or components. I would further suggest that any review depending heavily upon SYSmark 2001 results to show the impact of various component changes on system performance is going to be somewhat misleading. If the reviewer is not aware of the methodology used, and of its implications, the conclusions are very likely going to be flawed.
To be as fair as possible, I have looked for situations where the methodology used would be useful rather than misleading, but I must admit that I haven’t been able to identify any. To further emphasize the main problem I see, let us assume (for the sake of argument) that, for a typical user multitasking in the way this benchmark implements it, system overhead is 10%. This may be too high, but it makes the problem easier to illustrate. Let us further assume that the time spent waiting for operations to complete breaks down by subsystem as follows:
- Processor – 25%
- Memory access – 10%
- Disk I/O – 35%
- Graphics rendering – 20%
Including the system overhead (10%), we have 100% of the user wait time. If we remove system overhead from the performance equation, that 10% must be redistributed to the remaining subsystems in the same ratio – that is, each share is divided by 0.90 – so the percentages look like this in order to get back to 100% of the measured time:
- Processor – 27.8%
- Memory access – 11.1%
- Disk I/O – 38.9%
- Graphics rendering – 22.2%
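The renormalization above can be sketched numerically. This is a toy Python illustration using the hypothetical percentages from my example, not anything derived from BAPCo’s actual scripts:

```python
# Hypothetical wait-time breakdown (fractions of total user wait time),
# including the assumed 10% system overhead.
with_overhead = {
    "processor": 0.25,
    "memory": 0.10,
    "disk": 0.35,
    "graphics": 0.20,
    "overhead": 0.10,
}

# Drop the overhead and redistribute its share proportionally, so the
# remaining subsystems again account for 100% of measured time.
subsystems = {k: v for k, v in with_overhead.items() if k != "overhead"}
total = sum(subsystems.values())  # 0.90
rescaled = {k: v / total for k, v in subsystems.items()}

for name, share in rescaled.items():
    print(f"{name}: {share:.1%}")
# processor: 27.8%
# memory: 11.1%
# disk: 38.9%
# graphics: 22.2%
```

Note that each subsystem’s apparent share of the user’s wait time grows once the overhead is excluded – which is exactly the distortion discussed below.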
This essentially makes a component upgrade look better than it really would be in real-world usage – something I am reasonably sure marketing people don’t mind too much, but that end users most likely won’t appreciate. It may be possible to add the system overhead back in and adjust the performance ratings to a more realistic level, but measuring the overhead might be more trouble than it is worth, and the resulting ‘scores’ would not be condoned by BAPCo or the manufacturers.
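To put a rough number on how much better an upgrade looks, we can apply Amdahl’s law to the hypothetical percentages above. Assume a disk upgrade that doubles disk I/O speed (both the 35% figure and the factor of two are illustrative assumptions, not measurements):

```python
def speedup(fraction_improved: float, factor: float) -> float:
    """Amdahl's law: overall speedup when `fraction_improved` of the
    total time is made `factor` times faster."""
    return 1.0 / ((1.0 - fraction_improved) + fraction_improved / factor)

# Disk I/O made twice as fast (factor = 2).
real = speedup(0.35, 2.0)           # disk is 35% of time when overhead is included
benchmark = speedup(0.35 / 0.90, 2.0)  # disk is ~38.9% once overhead is excluded

print(f"real-world improvement: {real - 1:.1%}")   # 21.2%
print(f"benchmark improvement:  {benchmark - 1:.1%}")  # 24.1%
```

Under these assumptions, the benchmark would report roughly a 24% gain where the user actually sees about 21% – a few percentage points of flattery, exactly as described above.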
I also considered that the benchmark might be a good indicator of the improvement one might see for individual applications – for example, for someone who typically spends the majority of his/her time using Photoshop, or compiling, with little else going on. Unfortunately, since no individual application scores are provided, it is impossible to determine how the overall improvement relates to any specific application the user might be interested in.
It is possible that I have not fully understood the methodology used, or that I have missed the reason why it is preferable to one that includes system overhead – and I encourage anyone with thoughts on this to give me some feedback. In the meantime, I would caution reviewers and readers of the results to keep these issues in mind when drawing any conclusions from them.
In the end, I would have to say that SYSmark 2001 is probably not the best benchmark available for comparing either systems or components – but not for the reasons that seem to be popular. I think many people have been looking for reasons to discount the results without really understanding how the benchmark works, simply because it hasn’t favored their favorite manufacturer. This is a lousy reason to not like a benchmark, IMO. Perhaps the information provided here will at least give those who don’t like the benchmark a truly good reason to not like it. Personally, I started this analysis with the belief that the results would prove that SYSmark 2001 is reasonably well designed, but the apparent facts have forced me to conclude otherwise. In the future, I hope that supporters and detractors of various benchmarks will base their conclusions upon reason and facts rather than emotional likes or dislikes driven by perceived affiliations with manufacturers. As I mentioned, I am sure that every member of BAPCo that manufactures systems or components would be happy to have a benchmark make an upgrade look a few percentage points better than it will under real-world usage. Since they all have the opportunity to review and approve the methodology and test scripts (and each has an equal vote), it wouldn’t be fair to single out any one of them as the ‘bad guy’…