By: Dean Kent (dkent.delete@this.realworldtech.com), December 5, 2005 12:35 pm
Room: Moderated Discussions
Leonov (chrissinger@vigin.net) on 12/5/05 wrote:
---------------------------
>
>As far as I am concerned it is never wrong to disclose information when benchmarking
>a system and that (as far as I can see) is all that is being asked for.
Let's step back for a moment and look at the bigger picture.
For those who haven't ever actually tried it, the problems involved with benchmarking/reviewing are tremendously difficult to solve. I contended a number of years ago that publications' attempts to 'benchmark' processors/systems are a (next to) impossible task. The reason is that a true benchmark is one that represents *your* usage, not someone else's. The best you can do for review purposes is a reasonable approximation.
First, you generally get notified of a new system/component a few months in advance of its release. You get your hands on it a few weeks before release. Time is your enemy if you want a relevant and timely review.
Basically you have two choices for benchmarking: standardized benchmarks (SPEC, TPC, etc.) and application benchmarks (Cinebench, Maya, Fluent, Cadalyst, Pro/Engineer, etc.). Yeah, you can use synthetic benchmarks to test various things, but the problem is relating them to actual performance in the 'real world'.
The benefit of the standardized benchmarks is that some group of (supposedly) knowledgeable people spent a lot of time and effort profiling hundreds of applications and picking the most representative ones (along with representative data sets). However, the drawback is that these are generally not complete applications, but 'representative samples'. In addition, these groups have to be funded and staffed, and that usually means by the manufacturers who have the most to gain from the effort. Therefore, we have suspicions about how these efforts have been influenced.
The benefit of application benchmarks is that you have some ability to run what users are actually running. The drawbacks are many, however. You have to identify data sets that are realistic and representative, you have to become familiar with the software and create an automated script, and you have to research their applicability to the system/component being benchmarked (is the app typically used in that market segment?).
Then there is the cost. You have to buy a license for the standardized benchmarks, or you have to buy a license for the applications. Sometimes (as with some SPEC benchmarks) you have to buy the license for both. Sure, you can get open source applications, but you still have to do the research to find out how suitable they are (how popular, how representative, how well written, etc.)
Then, after you have selected the benchmarks, you may have to compile them. Which compiler? Intel's, or PathScale's, which has a relationship with AMD? Maybe Microsoft's - which means another license to pay for... or use Linux/gcc... but is it really representative? For some, yes. For others, no. Then you have to run them - more than once if you are being thorough. In many cases this means hours of runtime. You might have to profile them (with VTune), gather the results and analyze them. This isn't so hard with a dozen benchmarks, but try it with several dozen...
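To make the mechanics concrete, here is a minimal sketch (in Python) of the kind of run-it-several-times harness this implies. The benchmark commands and the output file name are hypothetical, purely to show the shape of the thing; a real harness would also pin configurations, capture environment details and so on.

```python
import csv
import subprocess
import time

# Hypothetical benchmark commands - stand-ins for whatever suite you license.
BENCHMARKS = {
    "render_scene": ["./cinebench_like", "--scene", "test.c4d"],
    "fluid_solve":  ["./fluent_like", "case1.cas"],
}

RUNS_PER_BENCHMARK = 3  # more than once, if you are being thorough

def time_command(cmd):
    """Run one benchmark command and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start

def main():
    with open("results.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["benchmark", "run", "seconds"])
        for name, cmd in BENCHMARKS.items():
            for run in range(1, RUNS_PER_BENCHMARK + 1):
                writer.writerow([name, run, round(time_command(cmd), 2)])

if __name__ == "__main__":
    main()
```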
Finally, you have to figure out whether two benchmarks are really testing the same thing. In other words, benchmarking is really all about finding the bottlenecks in a system/component. Two applications that use essentially the same resources will hit the same bottleneck - so running both of them is somewhat redundant. What you ideally want is to identify benchmarks that use system resources sufficiently differently that you can actually identify the weak/strong parts of the system.
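As an illustration of that redundancy test, one simplified approach is to treat each benchmark's measured resource profile (FP fraction, memory bandwidth pressure, cache miss share, I/O wait, etc.) as a vector and compare vectors: two benchmarks whose profiles are nearly identical are probably hitting the same bottleneck. The numbers below are made up solely to show the idea; in practice they would come from profiling runs.

```python
import math

# Hypothetical resource profiles: share of pressure attributed to each resource.
profiles = {
    "render_a": {"fp": 0.70, "mem_bw": 0.15, "cache_miss": 0.10, "io": 0.05},
    "render_b": {"fp": 0.68, "mem_bw": 0.17, "cache_miss": 0.10, "io": 0.05},
    "db_query": {"fp": 0.05, "mem_bw": 0.30, "cache_miss": 0.40, "io": 0.25},
}

def cosine_similarity(p, q):
    """Cosine similarity between two resource-usage vectors with the same keys."""
    keys = sorted(p)
    dot = sum(p[k] * q[k] for k in keys)
    norm = math.sqrt(sum(p[k] ** 2 for k in keys)) * math.sqrt(sum(q[k] ** 2 for k in keys))
    return dot / norm

names = list(profiles)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        sim = cosine_similarity(profiles[a], profiles[b])
        verdict = "likely redundant" if sim > 0.99 else "distinct"
        print(f"{a} vs {b}: similarity {sim:.3f} ({verdict})")
```

With these made-up numbers, the two render-style workloads come out nearly identical (so running both adds little), while the database-style workload clearly stresses different resources.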
No matter how well you choose, set up and analyze, however, there is *always* someone complaining that one or more of the benchmarks used is somehow inappropriate for various reasons. Once enough complaints have been filed, virtually every benchmark has been disqualified for one reason or another.
The 'solution' to this problem has typically been 'open source', because anyone can download/research/compile the benchmarks and verify what they consist of. Unfortunately, the vast majority of people (including reviewers) can't or won't do this - so the 'benefit' isn't really much of a benefit. My contention is that if you have the expertise and are willing to make the effort to identify what is in an open source benchmark, you probably also have the ability to identify what a 'closed source' benchmark is doing via profiling/disassembling/etc. - if you *really* want to know.
In addition, it really doesn't solve the problem for those who *do* use 'closed source' applications and want to know how those will perform - even if they are optimized.
So what does a reviewer do? He/she wrings his/her hands, begs for licenses/data sets and takes whatever assistance he/she can get from the manufacturer. Reviewing an AMD part? Try getting Intel to provide you with assistance in obtaining realistic benchmarks, and vice versa. So where do you *think* you will get assistance? The manufacturer may provide you with some benchmarks, which you *know* will not include anything that makes them look bad - but you hope that these will become part of a suite of benchmarks you can use on other systems. But you may not have the ability to test the application on those other systems before doing your current review, so you don't know for sure how applicable it is. Then you get questioned about it in a way that might make you look bad.
So, maybe you just look at what everyone else is doing and use what they use hoping that they did their due diligence. Or maybe you take suggestions from regular readers, hoping that they have a clue. Or maybe you spin your wheels doing so much research that you don't even get a review out... jeopardizing your chances of getting another part to review. Or maybe you just take the manufacturer's recommendations because that is what you have time for.
In short - reviews suck. :-) But there isn't much alternative unless you are independently wealthy (and then you probably don't care).
Were I to have the time/money/ability, I would do it this way...
Spend a lot of time, effort and money to understand the system, components and tools you are using. Run all of the standardized, synthetic and application benchmarks possible on various systems (varying memory, hard drives, graphics, and other components as much as possible). Run software analyzers on each setup. Create a database of results that is sortable by many different criteria. Spend the time to analyze the results and try to group applications and benchmarks by their profiles (which have a large memory footprint, which are CPU bound, which are I/O bound, which are primarily integer vs. floating point, etc.). In this way, when a benchmark is run, you could go to the database and find out what other applications/benchmarks have a similar resource usage/profile, and use that to estimate how the benchmark applies to *your* situation, if at all. You could see the outliers and weed them out of reviews. You could make realistic comparisons of which systems/components are comparable in performance. You could fairly quickly see the strengths/weaknesses of various setups.
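A minimal sketch of what that grouping step might look like: an in-memory table of (hypothetical) benchmark profiles, labelled by whichever resource dominates each one. A real version would be a proper database with per-configuration results, hardware-counter data and sortable criteria, but the grouping idea is the same.

```python
from collections import defaultdict

# Hypothetical profile data: (benchmark, share of time attributed to each resource).
results = [
    ("spec_like_int",  {"cpu": 0.75, "memory": 0.20, "io": 0.05}),
    ("render_like",    {"cpu": 0.80, "memory": 0.15, "io": 0.05}),
    ("stream_like",    {"cpu": 0.25, "memory": 0.70, "io": 0.05}),
    ("db_load_like",   {"cpu": 0.20, "memory": 0.30, "io": 0.50}),
]

def dominant_resource(profile):
    """Crude classification: whichever resource takes the largest share."""
    return max(profile, key=profile.get)

groups = defaultdict(list)
for name, profile in results:
    groups[dominant_resource(profile)].append(name)

# Given a new benchmark's profile, listing others in the same bottleneck class is a
# rough answer to "what does this benchmark tell me about *my* workload?"
for bottleneck, members in sorted(groups.items()):
    print(f"{bottleneck}-bound: {', '.join(members)}")
```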
Today, reviews are only marginally useful to the consumer (though extremely useful to the manufacturer), so discussing whether one benchmark is 'skewed' or whether a manufacturer influences the results (which all of them will, given the right opportunity) is pretty useless, IMO.
I stopped doing reviews a while back because of the frustration over what I *couldn't* do. I started the Benchmark Examiner articles to try to focus on some of the problems I mentioned (but life got in the way - big time). I'd like to do some reviews again, and I very well might - but I'll probably be frustrated, dissatisfied and stressed about them much more than anyone reading them will be. I'd also like to do benchmark evaluations, create a benchmark database and tackle several other related tasks... but time is my enemy, of course.
Wouldn't it be great if the open source concept could be applied here - farm out a task, and then put the results into the database? Have at least three people run the same tests, and if there are variations, try again until you get consistent results... Then time and money wouldn't be such a problem... ;-)
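For that crowdsourced idea, the consistency check could be as simple as the sketch below: collect each tester's runs, compute the spread across testers, and only accept a result into the database once they agree within some tolerance. The 5% threshold, tester names and scores are of course made up for illustration.

```python
from statistics import mean, stdev

# Hypothetical submissions: scores for one benchmark/configuration from three testers.
submissions = {
    "tester_a": [142.1, 141.8, 142.5],
    "tester_b": [143.0, 142.2, 142.7],
    "tester_c": [141.5, 142.9, 142.0],
}

TOLERANCE = 0.05  # accept if relative spread across testers is under 5%

def tester_means(subs):
    """Average each tester's runs into a single score per tester."""
    return {name: mean(runs) for name, runs in subs.items()}

def consistent(means, tolerance=TOLERANCE):
    """Coefficient of variation across testers as a crude agreement metric."""
    values = list(means.values())
    return stdev(values) / mean(values) < tolerance

means = tester_means(submissions)
if consistent(means):
    print(f"Accepted: consensus score {mean(means.values()):.1f}")
else:
    print("Rejected: testers disagree - rerun before adding to the database")
```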
Regards,
Dean
>
>L