Readers Sound Off
After publishing my evaluation of the SYSmark 2001 benchmark, I received a fair amount of feedback from readers. Though most of the comments supported neither the benchmark nor my methodology, several did, and they raised some good points as well as pointing out a flaw in my evaluation. After corresponding with several readers, I must admit that my perception of ‘think time’ may be incorrect, and if so I will have to modify my original conclusions. I also mentioned that, by timing only the response time for individual actions, it should not be difficult to report results for every application, as SYSmark 2000 did. What I had failed to consider was that in both scenarios (Office Productivity and Internet Content Creation) there are applications running in the background that affect the overall time of each action. Differences in how these tasks are dispatched can cause wide variations between runs, so the methodology used would prevent individual application scores from being measured reliably.
I would also like to emphasize, however, that the lack of individual application scores is one of the biggest complaints against both Winstone and SYSmark. This is the reason many publications have been focusing more and more on benchmarking with applications they believe are representative of whatever audience they are targeting. Both eTesting Labs and BAPCo should recognize this as a limitation of their benchmarks. A better implementation would be to provide the option of running each application individually, running them as part of a multitasking scenario, or both. This would be much more useful for the component-level reviews that are so popular today.
To provide a frame of reference for the rest of this article, I’ll try to give some insight into how I view the issue of performance on an interactive system, such as a PC. At the system level, either there is a user task to run (a job, task or activity) or there isn’t (idle time). We can view all idle time as waiting for user input (which some call ‘think time’). Granted, there will be system activity during this idle time, even if it is just the dispatcher looking for something to do, but this will not be ‘seen’ by the user. From the user’s point of view, things either happen instantaneously, or he/she must wait for a period of time. It is the length of this waiting period that is perceived as the performance of the system.
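To make the distinction concrete, here is a minimal sketch of the pattern a scripted workload could use to charge only action response time to its score while treating simulated think time as untimed idle. This is purely illustrative Python, not how SYSmark or Winstone is actually implemented; the action names, file, and delays are made up.

```python
import time
from pathlib import Path

def timed_action(results, name, action):
    """Run one scripted action and record only its response time."""
    start = time.perf_counter()
    action()
    results[name] = time.perf_counter() - start

def think(seconds):
    """Simulated 'think time': the user is idle, so nothing is charged to the score."""
    time.sleep(seconds)

results = {}
timed_action(results, "create_document", lambda: Path("example.txt").write_text(""))
think(1.5)  # user pauses before the next action; this wait is not measured
timed_action(results, "write_block", lambda: Path("example.txt").write_text("x" * 100_000))

print(results, f"total response time: {sum(results.values()):.4f}s")
```

Note that in a multitasking scenario, background tasks can still steal cycles during the timed sections, which is exactly why per-application scores become unreliable under that methodology.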
I break down performance into four different areas: system startup, application startup, task switching and application response time. System startup is a relatively rare event (at least for a stable system), and will be a consideration only for a small subset of users. I would suggest that system startup is really only an issue when the system has to be booted more than once per day, and many systems today run for days without restarting. Application startup and task switching time may or may not be an issue, depending upon the user and the situation. Some users will start an application once and leave it idle in the background when not using it, while others will cycle through their applications throughout the work day. I tend to do both, depending upon the application and where I am using it. Application response time is what the user perceives when an action is requested within an active application.
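As a rough, hypothetical illustration of measuring one of these areas, the sketch below times how long a process takes to launch and exit, using the Python interpreter itself as a stand-in application; a real test would have to decide what ‘ready for input’ means for each program rather than simply waiting for exit.

```python
import subprocess
import sys
import time

def measure_startup(cmd, runs=3):
    """Time how long a command takes to launch and exit, as a crude startup proxy."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, capture_output=True)
        samples.append(time.perf_counter() - start)
    return min(samples), sum(samples) / len(samples)

# The interpreter stands in for an application; repeated runs smooth out caching effects.
best, average = measure_startup([sys.executable, "-c", "pass"])
print(f"best: {best:.3f}s  average: {average:.3f}s")
```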