Benchmark Usage

All of the preceding information leads up to the most important questions of all: what can these benchmarks tell us, and how should they be used? eTesting Labs claims that the margin of error for the Winstone benchmarks is about 3% when they are run as suggested. That means running them at least 3 times, and defragmenting the hard drive and rebooting between runs. eTesting Labs prefers to report the highest score from any set of runs, since that score indicates the best possible performance of the system being tested. While scores from individual runs might differ by more than 3%, the top score from any set of runs should fall within that 3% margin. Individual runs may differ by 5% or more because of the way Windows manages memory and tasks: task dispatching and swapping can vary quite a bit from one run to the next.
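eTesting Labs' reporting rule (take the highest score from a set of at least three runs) and its 3% margin can be expressed as a small check. This is only a sketch: the function names and sample scores are hypothetical, and measuring the 3% margin relative to the larger of the two scores is an assumption made for illustration.

```python
def best_score(runs: list[float]) -> float:
    """Report the highest score from a set of runs (the best-score convention)."""
    if len(runs) < 3:
        raise ValueError("at least 3 runs are recommended")
    return max(runs)


def within_margin(a: float, b: float, margin: float = 0.03) -> bool:
    """True if a and b differ by no more than `margin`, relative to the larger score.

    Using the larger score as the base is an assumption of this sketch.
    """
    return abs(a - b) <= margin * max(a, b)


# Hypothetical scores from two separate sets of runs on the same system
set_1 = [24.1, 23.2, 24.6]
set_2 = [23.0, 24.3, 23.9]
print(best_score(set_1), best_score(set_2))  # 24.6 24.3
print(within_margin(best_score(set_1), best_score(set_2)))  # True
```

In these made-up numbers, individual runs (23.0 and 24.6) differ by about 6.5%, yet the two best scores agree well within 3%, which is the behavior the best-score convention relies on.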

There are other philosophies about the ‘best’ method of reporting scores. My own preference is to make 5 runs, throw out the highest and lowest scores, and then average the remaining three. This seems to provide the best indication of the average performance that a population of users would experience, though others may well disagree.
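The drop-the-extremes approach described above is a trimmed mean over five runs. A minimal sketch follows; the function name and the sample scores are hypothetical.

```python
def trimmed_mean_of_five(runs: list[float]) -> float:
    """Drop the single highest and lowest of five runs and average the other three."""
    if len(runs) != 5:
        raise ValueError("this method expects exactly 5 runs")
    middle = sorted(runs)[1:-1]  # discard the lowest and the highest score
    return sum(middle) / len(middle)


# Hypothetical scores from five runs on one system
runs = [23.0, 24.6, 23.9, 24.1, 23.5]
print(round(trimmed_mean_of_five(runs), 2))  # 23.83
```

For the same five hypothetical runs, the best-score rule would report 24.6, about 3% higher than the trimmed mean, so results reported under the two conventions should not be compared directly.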

It is important to understand that these benchmarks are intended to show differences between complete systems, not between individual components such as processors, hard drives and graphics cards. While such comparisons should be possible with the proper controls in the tests, they are not as simple as running one platform against another and attributing the difference in scores to a single component.

If the only difference between two systems is a single component (such as just the processor or hard drive), then a direct comparison can be made, keeping in mind that individual scores can vary by 5% or more even with exactly the same components. However, most comparisons involve variations in multiple components, such as chipset and processor, or chipset and memory. Even two motherboards using the same chipset cannot be compared directly unless all BIOS settings are verified to be identical, particularly memory and I/O timings. Depending on what the manufacturer has implemented, aggressive BIOS settings can improve overall performance by 5% or more compared with the default settings.
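The caveat about run-to-run variation can be turned into a rough rule of thumb: when two systems differ in only one component, treat the score gap as meaningful only if it exceeds the variation seen between identical runs. This is a hypothetical helper, not part of Winstone, and the 5% threshold is taken from the run-to-run variation quoted above as an assumption, not a statistically derived bound.

```python
def relative_difference(score_a: float, score_b: float) -> float:
    """Percentage by which score_b exceeds score_a (negative if it is lower)."""
    return (score_b - score_a) / score_a * 100.0


def exceeds_noise(score_a: float, score_b: float, noise_pct: float = 5.0) -> bool:
    """True only if the gap is larger than the assumed run-to-run variation."""
    return abs(relative_difference(score_a, score_b)) > noise_pct


# Hypothetical: identical systems except for the processor
print(round(relative_difference(23.8, 24.7), 1))  # 3.8
print(exceeds_noise(23.8, 24.7))  # False
```

Here a 3.8% advantage falls inside the assumed 5% noise band, so it would not by itself justify crediting the processor swap.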

Using the Winstone benchmarks becomes very questionable when two platforms with different motherboards, chipsets, processors and memory are compared with the intent of showing only the differences between processors (or any other single component, such as memory). The benchmarks simply do not allow anyone to draw such conclusions without other supporting data, such as WinBench results, which attempt to isolate the performance of each specific component. Even then, it could be difficult or impossible to determine exactly which component is contributing to the overall score without knowing how each component performs under tightly controlled conditions.

