Benchmarking: Art or Science?
Benchmarking has always been closely tied to performance and tuning. Benchmarks were originally the tools of computer architects and system designers, used to measure the effects of various features and components on performance. The SPEC benchmarks are still geared towards this audience. Eventually, benchmarks began to appear in the toolbox of computer professionals whose job included recommending system upgrades and monitoring and tuning system performance within the MIS department.
With the popularity of the PC, and its use in many business situations, benchmarks were developed to provide the necessary tools for those responsible for recommending, implementing and maintaining standalone systems and LANs. These professionals generally have a good understanding of the importance of the various measurements, and of how significant any differences between them really are in real-world terms. Ziff-Davis (ZDBOp) and BAPCo are two of the best-known developers of such benchmarks. Even the now widely-used game demo benchmarks were originally intended for the developers themselves, to measure the effects of various code designs and to tune the application.
My previous article on this subject discussed the differences between component-level and system-level benchmarks, and the problems in their popular usage. Unfortunately, the trend in the past several years has been to use benchmarks more and more to proclaim one component better than another, even when the system-level benchmarks being used cannot accurately measure differences in a single component without stringent controls. For example, processors from Intel and AMD cannot be used on the same motherboard, so the effects of different chipsets, BIOS implementations and drivers must be taken into account before any accurate comparison can be made, but such controls are rarely, if ever, applied.
One reason that publications, and product marketing groups, can get away with this is that the average person simply doesn’t understand what benchmarking is all about, and how it relates to performance and tuning. The most basic concept to grasp is that in a system, even small changes can have a measurable effect on overall performance. For example, if the memory timings in the BIOS are too aggressive for the modules being used, some signals can be missed, causing retries that reduce performance. If the system it is being compared against has slightly different timings and suffers no retries, that system can appear to be the better performer. This can lead someone without the proper tools or training to proclaim processor B ‘better’ than processor A. Therefore, before two systems are compared, both should be tuned to ensure they are running at their optimal performance. Furthermore, when comparing two components, the systems must be tested to make sure that the component in question is the main bottleneck in both, or else you won’t get a true comparison of their relative performance. This is obviously impossible when comparing results from different sources, or even results from the same source produced at different times.
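The point about small changes being measurable can be made concrete with a short sketch (my own illustration, not from the article; the workload function is a stand-in for any real benchmark kernel): timing the same work repeatedly and reporting the spread, which is the minimum discipline needed before declaring one configuration faster than another.

```python
# Sketch: repeated timing of one workload, reporting mean and spread.
# A single run tells you nothing about whether a difference between
# two systems is real or just run-to-run noise.
import statistics
import time

def workload():
    # Stand-in for a real benchmark kernel.
    return sum(range(200_000))

def measure(runs=10):
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        times.append(time.perf_counter() - start)
    return statistics.mean(times), statistics.stdev(times)

mean, stdev = measure()
print(f"mean {mean * 1e3:.3f} ms, stdev {stdev * 1e3:.3f} ms")
```

If the standard deviation is comparable to the difference between two systems, the comparison is meaningless without further controls.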
Performance tuning is the art of finding the biggest bottleneck and relieving it (not removing it), then moving on to the next bottleneck. The fact is that you will never run out of bottlenecks; there is always a limiting factor for performance. If you are doing your job properly, at some point you will find that the first bottleneck has once again become your biggest bottleneck, only now you have encountered it at a higher level of performance. If this makes no sense to you, then you don’t yet get it, and if you are planning on writing or talking about benchmarks, you need to study this paragraph (and other related information) until you do.
Benchmarking is the science of determining what the bottleneck is in a system, and then measuring the differences that various changes make in the component or resource causing the bottleneck. You might be able to compare two systems and state that one is faster than the other, but until you can identify exactly what the limiting factor is in both systems, you don’t really know why one is faster or slower than the other, though you might have some theories. Even worse, you might change a factor that is not the limiting factor, see little difference in the results, and proclaim that factor useless as a means of improving performance.
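The trap described above can be shown with simple arithmetic (a hypothetical model of my own, not the author's; the component names and numbers are made up for illustration): when total run time is dominated by one slow stage, halving the speed of the fast stage barely moves the result.

```python
# Toy model: a job whose run time is the sum of two serialized stages,
# one fast (CPU) and one slow (disk, the bottleneck).
def run_time(cpu_seconds, disk_seconds):
    # Serialized stages: total time is just the sum.
    return cpu_seconds + disk_seconds

baseline    = run_time(cpu_seconds=2.0, disk_seconds=18.0)  # 20.0 s
faster_cpu  = run_time(cpu_seconds=1.0, disk_seconds=18.0)  # 19.0 s
faster_disk = run_time(cpu_seconds=2.0, disk_seconds=9.0)   # 11.0 s

print(baseline, faster_cpu, faster_disk)
# Doubling CPU speed (the non-bottleneck) gains only 5%; doubling
# disk speed (the bottleneck) gains 45%. Measuring the wrong factor
# makes a genuinely useful upgrade look "useless".
```

This is why identifying the limiting factor must come before measuring: the same CPU upgrade would look dramatic in a system where the CPU, not the disk, was the bottleneck.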