I could find no information from BAPCo indicating what the margin of error might be for SYSmark 2001. During my testing for this article, however, I found that when performing an ‘Official Run’ with more than a single benchmark run (it allows up to 3), no aggregate score was calculated if any of the runs differed from the others by more than 5%.
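The 5% behavior I observed can be sketched as a simple consistency check. This is only an illustration of the idea, not BAPCo's actual logic: the function name `runs_are_consistent` is hypothetical, and measuring the spread relative to the lowest run is my assumption, since I found no documentation of how the difference is computed.

```python
def runs_are_consistent(scores, tolerance=0.05):
    """Return True if the spread between runs is within the tolerance.

    Assumption: the spread is measured as (highest - lowest) / lowest.
    BAPCo does not document how it computes the difference.
    """
    lowest, highest = min(scores), max(scores)
    return (highest - lowest) / lowest <= tolerance


# Hypothetical scores from three runs of an 'Official Run'
print(runs_are_consistent([100, 103, 104]))  # spread of 4%: aggregate allowed
print(runs_are_consistent([100, 103, 106]))  # spread of 6%: no aggregate
```

Under this assumption, a set of runs that fails the check would simply produce no aggregate score, which matches what I saw during testing.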
Unlike the method used by eTesting Labs to determine the final score (the highest of the 5 runs), BAPCo uses the average of the runs. The White Paper describes it as follows: “The overall response time for a scenario is the average of all the response times in all the applications that make up that scenario. The average response time for each of the two scenarios is then converted to ratings. The overall SYSmark 2001 rating is derived from the geometric mean of the two scenario ratings”. The method used for the rating is based upon a reference platform (described as a fixed calibration platform in the White Paper), which is given the value of 100.
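To make the arithmetic concrete, here is a minimal sketch of the calculation as the White Paper describes it. The response times and reference times below are invented for illustration, and the conversion from response time to rating (100 multiplied by the reference time divided by the measured time) is my assumption; the White Paper only states that the reference platform is given the value of 100.

```python
from math import sqrt
from statistics import mean


def scenario_rating(response_times, reference_time):
    """Convert one scenario's response times into a rating.

    Assumption: the rating scales inversely with the average response
    time, so a system that matches the reference platform scores 100.
    """
    average_time = mean(response_times)
    return 100.0 * reference_time / average_time


def overall_rating(rating_a, rating_b):
    """Geometric mean of the two scenario ratings."""
    return sqrt(rating_a * rating_b)


# Hypothetical response times in seconds, not real SYSmark 2001 data
scenario_a = scenario_rating([2.0, 4.0, 6.0], reference_time=5.0)  # 125.0
scenario_b = scenario_rating([4.0, 6.0], reference_time=4.0)       # 80.0
print(overall_rating(scenario_a, scenario_b))                      # 100.0
```

Note that the geometric mean keeps one very strong scenario from masking a weak one as easily as an arithmetic mean would: ratings of 125 and 80 produce 100 here, where a simple average would give 102.5.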
As with the Winstone benchmarks, the intent of SYSmark 2001 is to show differences between complete systems, not between individual components such as processors, hard drives and graphics cards. Unfortunately, as I mentioned in the previous section, it does not appear to achieve that goal, because it essentially eliminates OS overhead from the measurement. My assessment, therefore, is that it measures only response time for various types of application activities, not overall system performance. It may be argued that this is what matters to the user, but my own experience indicates that OS overhead is also part of the user experience. Excluding this overhead will give misleading information about the effect of the amount of memory installed in the system. We can investigate the truth of this in the next section.