A Quick Note on SYSmark 2002
Note: This section was written shortly before the recent news publicly broke about SYSmark 2002 being ‘skewed’ towards the P4. I decided to leave it in as-is rather than get in the middle of a difficult controversy that would cause this article to spin off in a direction I don’t care to go.
One of the benchmarks I have used for this set of articles is BAPCo’s SYSmark 2002. The results and more detailed analysis will be presented in the third installment, but I will say right now that they are not in line with the results from the other benchmarks used. I believe I know the reason for this, and it has a lot to do with the way the individual tasks are weighted, which is by actual ‘run time’. A task that takes a long time to complete will have a much greater ‘weight’ in the final score – which generally means tasks that require a lot of memory access. That means platforms with high memory bandwidth will be favored, which makes the P4 look pretty good. Note that there are users who require high bandwidth, so the results are not necessarily wrong – but they very possibly cannot be called ‘representative’ of what most people will see.
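To make the run-time weighting concrete, here is a toy sketch. This is not BAPCo’s actual scoring formula – the task names and run times below are invented purely for illustration – but it shows the effect: when contributions are proportional to run time, one long memory-bound task can swamp a short CPU-bound task, even when the other platform wins the short task handily.

```python
# Hypothetical illustration only -- not BAPCo's real formula or data.
# Run times in seconds for two invented tasks on two invented platforms.
platform_x = {"cpu_task": 10, "memory_task": 100}   # high memory bandwidth
platform_y = {"cpu_task": 8,  "memory_task": 130}   # wins the CPU task, less bandwidth

# If the overall score is driven by total run time,
# the long memory-bound task dominates the result.
total_x = sum(platform_x.values())   # 110 seconds
total_y = sum(platform_y.values())   # 138 seconds

# Platform Y is 20% faster on the short task, yet loses overall,
# because the memory-bound task carries ten times the 'weight'.
print(total_x, total_y)
```

The point of the sketch is simply that a time-weighted aggregate rewards whatever the longest tasks stress – here, memory bandwidth – regardless of how the platforms compare elsewhere.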
Recently, I’ve received some information that seems to validate this theory, and someone else may publish details at some point in the near future (as there are likely others who have seen the information). I am working to independently verify the data, but that will take some time. One point that I believe should be made is that the real value of a benchmark is in how well it emulates ‘real world’ usage. The pitfall that many fall into is that they believe benchmarks should only emulate the most common usage, which means that by definition there will be a percentage of users whose usage will not be represented. This can also be considered a sort of ‘bias’, so I believe that if a benchmark represents the usage of even a relatively small percentage of users, it is valid to use – as long as the limitations are clearly identified.
As an example, if we take a ‘worst case’ scenario and claim that the SYSmark 2002 usage applies to only a few percent of users, we also have to admit that for those few users it is a valid benchmark. Some have called for the ‘elimination’ of SYSmark as a benchmark because of alleged wrongdoing by BAPCo. My personal preference is to try to evaluate the value of something based upon what it does, not upon the intentions of those who created it. If each benchmark is evaluated for what it actually measures, every user can decide whether it applies to him/her. This way, the ‘bad’ benchmark maker will either have to change to appeal to a wider audience, or risk being relegated to an unimportant niche role in system evaluations. At least, that is the theory and my goal.