Validity of My Own Methodology
A few readers also questioned my own methodology or my conclusions. One of the criticisms leveled was that most users spend the majority of their time surfing the Web or downloading email; therefore, the speed of the Internet connection is the most important factor in overall system performance. While I cannot dispute the claim that most home users are simply surfing the ‘Net, I would be a bit skeptical of this claim for a work environment, which is what all of these benchmarks are meant to emulate. Many companies severely restrict Internet access for their employees, and some don’t allow it at all (or simply have no facility to provide it). To expand this point further, most work environments (and many home environments as well) include network access, which can also significantly affect performance. The point of these benchmarks, however, is not to measure the performance of the communications connection (though there are tools to measure this), but the system itself.
Another comment made by several readers is that most users don’t really multitask at all. Multiple applications may actually be open, but generally only one is used 80% of the time, and switching to another one is relatively rare. As I mentioned earlier, system usage depends on the individual, and likely also on the situation. When at work at my ‘day job’, I tend to use a 3270 emulator the majority of the time, my browser another 10% or so as I do research or catch up on news, and I switch to my email client perhaps 10 times during the day. Even more rarely will I switch to Word, Excel or another office-type application. This tends to support the theory that task switching is a very minor part of the user experience. On the other hand, when I am at home working on this website or some other personal project, I will switch between my email, browser, Word, Excel and other applications very frequently while I write articles, do research or download data sheets. I also use Corel Draw when editing articles submitted with graphics, and will often have either my FTP program in the background downloading/uploading files or Analog running to analyze my server logs. One of the reasons I have so many tasks active is that my time at home is limited to a few hours a day, so I need to get many things accomplished in a short period of time. Based upon this, I believe that there is a need for measuring both types of usage, as I suggested at the beginning of this article. In my opinion, both of the benchmark suites mentioned here should provide the ability to test with either scenario (or both) if they are to truly provide an accurate representation of the user experience.
A few readers mentioned that my use of only AMD processors provided limited information, and I agree to some extent. I even mentioned this in the articles; however, I believe I can defend it, for the most part. Part of my intent was to make the tests as scientifically valid as possible, and the way to do this was to make the minimum possible component change between runs to isolate various factors. The Athlon and Duron processors I have available are unlocked, which allowed me to test all of the FSB and multiplier changes while ensuring that there were no CPU stepping changes. The problem with using different CPUs for each speed grade and FSB is that different steppings can affect performance to some degree, and however small that effect is, it would still skew the results. The method I used ensured that the performance differences would be limited only to the specific timing that was altered. I would have loved to use a PIII and Celeron processor for these tests as well; however, Intel was unable (or perhaps unwilling) to provide me with an unlocked sample of these, and my search for one elsewhere was unsuccessful. In addition, the P4 is still fairly early in its lifecycle (as is the Athlon XP), so the variations available were somewhat limited.
Finally, several individuals stated that no single benchmark or benchmark suite will give the ‘true’ performance of a system or component, so relying only on the Winstone or Sysmark 2001 benchmarks is not a good idea. I actually agree with this sentiment wholeheartedly, and believe that any reviewer who argues that an industry standard benchmark suite is biased because of the applications it contains, and uses that as a reason not to run it, is actually being a hypocrite. By not allowing the user to see the results of such a benchmark, the reviewer is effectively choosing which applications are important rather than letting the reader decide. On the other hand, those who claim that they are letting the user make the decision, but don’t provide any information on what the benchmark actually does, are being somewhat high-handed. Most of those reading product reviews really don’t understand how benchmarks might differ, and will assume that they all measure ‘real world’ performance accurately because they are popular or are an industry standard. Simply showing charts and graphs with results from various randomly chosen benchmarks does little to help the user understand, and most of those who claim they are allowing the user to decide generously sprinkle their reviews with commentary and conclusions, which would seem contradictory at the least. The idea behind these benchmark evaluations is not to try to convince reviewers (or anyone else) to use or avoid the tools, but to provide some insight into how they work so the results can be properly interpreted if they are used.