Benchmarketing 101
OK, here is where I piss off readers, other publications and even manufacturers. I believe there are far too many hardware publications and authors that just don’t get it. They don’t get it because their readers don’t get it either, and they don’t care because the manufacturers reward them for not getting it. There are certainly a few publications that do get it, and some readers as well, but the vast majority, in my opinion, do not.
What I am referring to is benchmarks. It seems that far too many people have completely forgotten, or perhaps never understood, the purpose of benchmarks. How many times have we seen motherboards or processors reviewed where at least half of the entire article is benchmark numbers? These numbers are then compared by authors, manufacturers and consumers to results from previous tests, or even from other publications. Though many think this is valid and useful information, the problem is that subtle differences between the tests, whether hardware or software, can have a measurable effect on the results. This makes attempts to compare individual components very difficult, if not impossible, and always questionable. The only truly valid way to compare components is to keep as many of the other system components the same as possible, and to ensure that BIOS settings and drivers are similarly configured.
One recent trend has been to optimize the source code in various benchmarks for specific processors, then compare the results to either optimized or unoptimized code for other processors. This is then used as ‘proof’ that processor A is just as fast as, or faster than, processor B. The fundamental flaw in this tactic is that unless the optimizations will actually appear in a shipping commercial product, these comparisons are useless for anything except academic purposes. If the purpose is to show what the end user will actually see in practice using available applications, specially optimized benchmarks will not do it.
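To see why unequal code makes a hardware comparison meaningless, consider this small sketch (my own illustration in Python, not code from any benchmark or review mentioned here): the same task written two ways differs enormously in speed on the same machine, so a chart built from ‘optimized’ code on one processor and ordinary code on another mostly measures the code, not the silicon.

    # Illustrative only: the same "benchmark" task written two ways.
    # The code path, not the processor, dominates the result, so comparing
    # specially optimized code on CPU A against plain code on CPU B says
    # little about the hardware itself.
    import timeit

    N = 100_000

    def plain_sum(n=N):
        # Straightforward, unoptimized implementation: sum of squares below n.
        total = 0
        for i in range(n):
            total += i * i
        return total

    def tuned_sum(n=N):
        # "Optimized" implementation of the same task (closed-form formula).
        return (n - 1) * n * (2 * n - 1) // 6

    if __name__ == "__main__":
        assert plain_sum() == tuned_sum()  # same answer, very different cost
        for name, fn in (("plain", plain_sum), ("tuned", tuned_sum)):
            # Best-of-several timing, the usual way to measure small workloads.
            best = min(timeit.repeat(fn, number=100, repeat=5))
            print(f"{name:>5}: {best:.4f} s for 100 runs")
        # On one machine the "tuned" version wins by orders of magnitude.
        # Run "plain" on processor A and "tuned" on processor B, and the
        # resulting chart tells you about the code, not about the processors.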
A tactic that has always concerned me is the selective use of benchmarks. For most commercial publications, the main purpose of their articles is to attract readers, and hopefully to inform them. Unfortunately, a comparison between two products or technologies that shows no real difference is not as exciting as one where a winner can be proclaimed faster or better. Many publications will therefore augment their usual suite of benchmarks with tests that do show a difference, allowing them to proclaim a winner. These benchmarks may be real applications (usually games) that have a built-in timed demo; however, there is rarely (never?) any information about how popular the application is, and none about what other applications might have the same resource usage profile. This makes these tests fairly useless for the average user who doesn’t actually use the application tested. Related to this is the practice of proclaiming a ‘clear winner’ when the differences are 5% or less, particularly since the margin of error on most benchmarks is several percentage points.
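As a rough illustration of that margin-of-error point (my own sketch, with made-up numbers, not a procedure from any publication named here), the only defensible way to read a small gap between two scores is to repeat each run and compare the gap against the run-to-run spread:

    # Minimal sketch: is a benchmark gap larger than the measurement noise?
    # The scores are hypothetical; substitute real repeated runs of the
    # same benchmark on each system.
    from statistics import mean, stdev

    system_a = [101.8, 99.5, 103.2, 100.9, 102.4]  # hypothetical scores
    system_b = [98.7, 101.1, 97.9, 100.2, 99.4]

    mean_a, sd_a = mean(system_a), stdev(system_a)
    mean_b, sd_b = mean(system_b), stdev(system_b)

    gap_pct = (mean_a - mean_b) / mean_b * 100
    noise_pct = (sd_a + sd_b) / mean_b * 100  # crude combined run-to-run spread

    print(f"A: {mean_a:.1f} +/- {sd_a:.1f}    B: {mean_b:.1f} +/- {sd_b:.1f}")
    print(f"gap: {gap_pct:.1f}%    spread: about {noise_pct:.1f}%")
    if abs(gap_pct) <= noise_pct:
        print("The difference is within run-to-run noise; no 'clear winner'.")
    else:
        print("The difference exceeds the measured noise; worth reporting.")

Summing the two standard deviations is a deliberately crude stand-in for a proper statistical test; the point is simply that a gap smaller than the observed spread proves nothing.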
The other practice I take issue with is the discounting of industry-accepted benchmarks as ‘biased’ or irrelevant. For example, the SPEC benchmarks are generally considered the best high-end benchmarks available. They are based upon real applications, and most industry professionals accept the results as an indication of a system’s actual real-world performance when using those applications. Yet when the P4 was introduced and its SPEC CPU2000 scores showed it to be very fast, publications and users were very quick to claim that the benchmark has little real-world relevance. Winstone Business benchmark numbers are generally discounted because "nobody needs more power for office applications", yet the fact that at least half of all PCs in use are business machines seems not to matter in this argument. BAPCo SysMark2001 has recently come under fire for favoring the P4 and for being ‘in the pocket’ of Intel, though there has yet to be any real evidence of this, only conjecture. None of this is meant to imply that the industry benchmarks cannot be improved significantly. In fact, an article at Overclockers.com recently made some excellent comments that closely reflect those in my own article on this subject from a few years ago.
Unfortunately, most people just do not see these benchmarks for what they are: tools for measuring specific things. If you don’t really know what a particular tool is measuring, how can you use it effectively? I may be able to tell that system A is faster than system B, but the real questions are why it is faster, and whether that will affect me. Right now, most publications put a lot of weight on game benchmarks, and while there is a significant population of gamers, these are by no means the most important applications for the vast majority of PC users, particularly those who use their computers in a professional capacity. If a system runs Quake twice as fast as another, what does that mean to someone who doesn’t play Quake? What other applications have the same resource usage that Quake does? Unless the benchmarks use your applications, in the way that you use them, or there is a way to directly correlate the tested usage to your own, the results are at best only an approximation of what you will see. When those results are within 10% of each other, with a margin of error of 3% to 5%, and they are merely an approximation of what you will see in your own usage… what can you conclude? Certainly not that A is ‘clearly better’ or ‘substantially faster’ than B. Furthermore, giving benchmark scores heavy weight in rating how ‘good’ a product is, when the actual value of the results is so nebulous, is nothing if not misleading.
Make no mistake about it – most popular publications, whether paper or online, are seen by vendors as opportunities to market products. Marketing people within hardware companies don’t really want anyone to understand the numbers, as a general rule. It is the job of marketing to emphasize the positive, downplay the negative, and if it gives a competitive advantage, mislead without actually lying. Now, they are using eager and barely technical publications to do this job for them! Publications that rely heavily on benchmarks tend to get many products sent to them for ‘evaluation’ because the turnaround on evaluations is quick, and the charts and graphs presented by the publications make great ‘unbiased’ marketing material.