Some comments on benchmarks

By: Paul A. Clayton, October 14, 2020 11:11 am
Room: Moderated Discussions
hobold on October 10, 2020 12:33 pm wrote:
> David Kanter on October 10, 2020 10:36 am wrote:
> > Cinebench is a total rubbish benchmark!
> It happens to correlate very well with one or two workloads I care about. So for me it's
> been a useful benchmark, giving me a good idea of the performance I can expect.
> But like any performance metric (and like any benchmark), Cinebench tries to do an
> impossible thing when it maps "computing speed" to one single real number. That loses
> all information about specific strengths and weaknesses of different machines.

> Projecting a vector of many real numbers down to one single real number is always
> an information loss.

Mathematically, I am not certain that is the case. I do not understand infinities and have very little exposure to information theory, but I know that a fixed, known number of non-negative integers can be combined into a single integer without loss of information by using them as exponents of a fixed, known set of primes and multiplying the results (unique prime factorization makes this invertible, given the entry count and the primes used). There was a (mediocre) science fiction story that used $n = p_A^{x_A} \cdot p_B^{x_B} \cdots p_\Omega^{x_\Omega}$ to encode a vast amount of information into a single integer. Since individual measurements have limited precision and range, even simple concatenation of digits works: N values of M digits each become one value of M×N digits.
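A minimal Python sketch of both encodings, assuming the measurements have already been scaled to non-negative integers (all function names here are hypothetical). Both round-trip only because the decoder is told the entry count (and, for concatenation, the digit width M), which is the "fixed/known" condition.

```python
def first_primes(count):
    """Return the first `count` primes by trial division against smaller primes."""
    primes = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def prime_encode(values):
    """Encode non-negative integers as exponents: 2**v0 * 3**v1 * 5**v2 * ..."""
    n = 1
    for p, x in zip(first_primes(len(values)), values):
        n *= p ** x
    return n


def prime_decode(n, count):
    """Recover the exponents by repeated division; `count` must be known,
    otherwise trailing zero entries would be indistinguishable from absent ones."""
    values = []
    for p in first_primes(count):
        x = 0
        while n % p == 0:
            n //= p
            x += 1
        values.append(x)
    return values


def concat_encode(values, width):
    """Concatenate fixed-width decimal fields: N values of `width` digits each."""
    n = 0
    for x in values:
        if not 0 <= x < 10 ** width:
            raise ValueError("value does not fit in the digit field")
        n = n * 10 ** width + x
    return n


def concat_decode(n, count, width):
    """Split the integer back into `count` fields of `width` digits."""
    values = []
    for _ in range(count):
        n, x = divmod(n, 10 ** width)
        values.append(x)
    return values[::-1]  # fields were peeled off from the least significant end
```

For example, `prime_decode(prime_encode([3, 0, 7, 12]), 4)` recovers `[3, 0, 7, 12]`. The exponent form grows very quickly with the values, which is why digit concatenation is the practical version of the same idea.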

Yes, that is not what you meant. You meant something more like one single number that can be trivially used to accurately estimate performance for a workload based on a benchmark result.
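A small sketch of why one composite score cannot do that in general. It assumes a SPEC CPU-style summary (geometric mean of per-subtest speedup ratios) and hypothetical numbers for two machines and three subtests; the workload model (each subtest's speedup applies to the fraction of time the workload spends in similar code) is also an assumption for illustration.

```python
import math


def geometric_mean(ratios):
    """Composite score: geometric mean of per-subtest speedup ratios."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


def workload_speedup(ratios, time_fractions):
    """Speedup for a workload that spends time_fractions[i] of its reference
    run time in code behaving like subtest i (fractions sum to 1).
    New time = sum(f_i / r_i); speedup = 1 / new time."""
    return 1.0 / sum(f / r for f, r in zip(time_fractions, ratios))


# Hypothetical per-subtest speedups relative to a reference machine.
machine_a = [2.0, 0.5, 1.0]  # strong on subtest 1, weak on subtest 2
machine_b = [1.0, 1.0, 1.0]  # uniform

# Both machines get the same composite score...
print(geometric_mean(machine_a), geometric_mean(machine_b))

# ...but workloads dominated by different subtests see opposite outcomes.
for fractions in ([0.8, 0.1, 0.1], [0.1, 0.8, 0.1]):
    print(fractions,
          workload_speedup(machine_a, fractions),
          workload_speedup(machine_b, fractions))
```

The composite score is identical, yet machine A is faster than B for the first workload and slower for the second; recovering that requires the per-subtest vector plus workload-specific weights.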

Even with a well-suited, well-designed benchmark suite, using benchmark results to estimate performance for a workload is not trivial, even if it is straightforward (i.e., an algorithm exists). A formula like $\text{performance} = (k_1 \cdot \text{result}_1^{C_1}) \cdot (k_2 \cdot \text{result}_2^{C_2}) \cdots$ is straightforward and might provide accurate estimates across a broad range of systems, but it is not trivial (and it requires generating the constants for the workload whose performance is being estimated).
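A sketch of that kind of estimator, reading the model as performance = ∏ (k_i · result_i^C_i) with the C_i as exponents; the constants would have to be fitted per workload, and the values used below are hypothetical. Writing it in log space shows why fitting is tractable: the k_i only appear as a product, so they collapse into a single constant, and log(performance) is linear in log(result_i), so the C_i could be fitted by ordinary least squares against measured workload performance on several systems.

```python
import math


def estimate_performance(results, ks, cs):
    """Estimate workload performance as prod_i (k_i * result_i ** C_i).

    results: benchmark results for one system (positive numbers)
    ks, cs:  per-benchmark constants fitted for the workload of interest
    """
    if not (len(results) == len(ks) == len(cs)):
        raise ValueError("need one k and one C per benchmark result")
    estimate = 1.0
    for r, k, c in zip(results, ks, cs):
        estimate *= k * r ** c
    return estimate


def estimate_performance_log(results, ks, cs):
    """Same model in log space:
    log(perf) = sum(log k_i) + sum(C_i * log result_i),
    i.e. one combined constant plus a term linear in each log(result_i)."""
    log_const = sum(math.log(k) for k in ks)
    log_perf = log_const + sum(c * math.log(r) for r, c in zip(results, cs))
    return math.exp(log_perf)


# Hypothetical constants for a workload that tracks benchmark 1 linearly
# and benchmark 2 as its square root.
print(estimate_performance([2.0, 4.0], [1.0, 1.0], [1.0, 0.5]))
```

The two functions compute the same estimate; the log form is only there to make the structure of the fitting problem visible.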

> (And one could argue that the "true" performance profile
> isn't fully captured by a vector of microbenchmarks in the first place.)

Microbenchmarks are fragile because they do not account for interactions among hardware components. Larger, more complex subbenchmarks can make such weaknesses less prominent, though they still capture only the interactions that are common in the code they model. A change in branch predictor accuracy could (unexpectedly) change the impact of L2 cache latency. A change in on-chip network topology and cache policies might accidentally improve the performance of a subbenchmark that ordinarily correlates highly with the workload of interest (e.g., the workload is sensitive to communication latency, and the latency happens to drop for the subbenchmark because its communication pattern matches the new hardware while the workload's does not).

> Of course there can be many other ways in which a benchmark is bad.

I think the most common (valid) complaints concern glass jaws and workload specificity. Examples of glass jaws include Matrix300's vulnerability to a sufficiently large cache or to compiler blocking-based optimization, libquantum's friendliness to autoparallelization (with the run-time rules not isolating this factor), art's AoS/SoA weakness, and (JIT) compilers aggressively removing "dead" code (result never used, result is a compile-time constant) or applying aggressive strength reductions (Dhrystone?). Workload specificity merely constrains the breadth of relevance (in application space and somewhat in time: the "good enough"/no-longer-primary-constraint effect, hardware acceleration for mature common computations, changes in algorithms, methods, or goals, etc.).
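For readers unfamiliar with the blocking optimization mentioned for Matrix300, here is a minimal sketch of loop blocking (tiling) applied to a plain matrix multiply. It only illustrates the loop restructuring: the block size of 32 is an arbitrary assumption, and interpreted Python will not show the cache benefit that made the transformation so effective on real hardware.

```python
def matmul_naive(a, b):
    """Textbook i-j-k matrix multiply on lists of lists."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            s = 0.0
            for k in range(m):
                s += a[i][k] * b[k][j]
            c[i][j] = s
    return c


def matmul_blocked(a, b, block=32):
    """Same product, but the iteration space is split into block x block tiles
    so each tile of a, b, and c is reused many times while it would still be
    cache-resident (in a compiled language)."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, block):
        for kk in range(0, m, block):
            for jj in range(0, p, block):
                # Multiply one tile of a by one tile of b, accumulating into c.
                for i in range(ii, min(ii + block, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + block, m)):
                        aik, row_b = row_a[k], b[k]
                        for j in range(jj, min(jj + block, p)):
                            row_c[j] += aik * row_b[j]
    return c
```

The result is the same product (up to floating-point summation order); only the memory access pattern changes, which is why a compiler that applies it automatically can make a matrix-multiply-dominated benchmark look dramatically faster without the hardware changing.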

(For profile-guided optimization, designing the training data set to be appropriately similar to the testing data set seems challenging.)

Even careful benchmark design can introduce unexpected glass jaws where the return-on-investment for a change is substantially greater than expected. Distinguishing between generally useful optimizations and benchmark specials is not easy.

(I thought of composing a long comment on benchmarks, but as I started working on it I realized that an adequate treatment would take a lot of thought, research, and editing. The project is still tempting, but I do not feel up to the challenge at the moment.)

> But from that point of
> view the statement quoted above does not contain enough information for further discussion.

I agree. I am disappointed that David Kanter made such an unhelpful post. (It was at least concise, which is rarely a virtue of my posts.☺)