By: Adrian (a.delete@this.acm.org), November 18, 2020 10:04 am
Room: Moderated Discussions
Dummond D. Slow (mental.delete@this.protozoa.us) on November 18, 2020 8:41 am wrote:
> Adrian (a.delete@this.acm.org) on November 18, 2020 8:15 am wrote:
> > Dummond D. Slow (mental.delete@this.protozoa.us) on November 18, 2020 7:48 am wrote:
> > >
> > > Interesting, but "long dependency chains" don't disqualify it as a benchmark, do
> > > they? I doubt that it is very exceptional and that no other software does that.
> > >
> > > I hope eventually we get some very comparable HT/noHT CPU in AnandTech benchmark for comparing,
> > > in the meantime I found this https://www.anandtech.com/bench/product/2622?vs=2652
> > > Skylake @ 3,8-4,2GHz 4c/4t versus 4,0-4,2 4c/8t (sadly also has more L2 cache due to Intel's segmenting).
> >
> >
> >
> > A few of the workloads in which I am interested have long dependency chains,
> > so there are people for whom such benchmarks are interesting.
> >
> >
> > This fact from Andrei, that Cinebench contains long dependency chains,
> > explains very well why M1 is less good in it than in other benchmarks.
> >
> >
> > Among 2 CPUs having the same average performance, a benchmark that contains
> > long dependency chains will favor the one having a low IPC and a high clock
> > frequency, against the one having a high IPC and a low clock frequency.
> >
>
> If you look at the single-thread score, M1 has a very high performance in it too, close
> to 4.7 GHz Tiger Lake. So its relative standing is not much different from where it ends
> up in Geekbench. It is the regular bench-to-bench variability, say ±5%.
> Which means this is not the most important factor, as having high IPC and low clock hardly harms M1
> in single thread. It is the inability to exploit unused execution resources by SMT, really, which separates
> it from the x86 implementations in question when we look at the actual all-thread performance.
A CPU with a high IPC achieves that IPC only on average.
A long dependency chain has an execution time equal to the sum of its instruction latencies. For comparable latencies measured in cycles, that time is longer on the CPU with the lower clock frequency, i.e. on M1.
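
To put rough numbers on the point above, here is a minimal sketch of the chain-time arithmetic. The chain length, the 4-cycle latency and the 3.2 GHz clock for the wide core are assumptions chosen for illustration, not measurements; only the 4.7 GHz figure comes from this thread.

```python
def chain_time_ns(n_ops: int, latency_cycles: float, freq_ghz: float) -> float:
    """Wall-clock time (ns) of a serial chain of n_ops dependent instructions.

    Each instruction waits for the previous result, so the total is the
    sum of the latencies: n_ops * latency_cycles cycles, and one cycle
    lasts 1 / freq_ghz nanoseconds.
    """
    return n_ops * latency_cycles / freq_ghz


# Hypothetical numbers: a 1000-instruction chain, 4-cycle latency on both cores.
low_clock = chain_time_ns(1000, 4, 3.2)   # assumed 3.2 GHz wide core
high_clock = chain_time_ns(1000, 4, 4.7)  # 4.7 GHz, the clock quoted in the thread
print(f"{low_clock:.0f} ns vs {high_clock:.0f} ns, ratio {low_clock / high_clock:.2f}")
```

With equal latencies in cycles, the ratio of the two times is simply the clock ratio, 4.7 / 3.2, or about 1.47. The width of the core does not appear in the formula at all, which is the whole argument.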
So in a benchmark with long dependency chains the IPC will normally be lower than in other applications, unless the benchmark also interleaves many independent operations with those in the chains, to fill the otherwise under-utilized execution units.
So, even if there are exceptions, the CPU with the lower clock frequency will normally not reach its typically higher IPC in a benchmark with many long dependency chains. The CPU with the higher clock frequency is also affected, but less so, because its instruction latencies are shorter in absolute time, so it spends less time in the dependency chains.
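
The argument can be sketched with a crude bound model: a loop body takes at least as many cycles as its dependency chain needs (latency bound) and at least as many as the issue width allows (throughput bound). Everything here is an assumption for illustration: the widths 8 and 4, the 4-cycle latency, the 3.2 GHz clock and the operation counts are hypothetical, and real cores have many more limits than these two.

```python
import math


def loop_cycles(chain_ops: int, latency: int, independent_ops: int, width: int) -> int:
    """Lower bound on cycles for one loop body.

    chain_ops dependent instructions form a serial chain; independent_ops
    further instructions can be scheduled freely around them.
    """
    latency_bound = chain_ops * latency
    throughput_bound = math.ceil((chain_ops + independent_ops) / width)
    return max(latency_bound, throughput_bound)


def ipc(chain_ops: int, latency: int, independent_ops: int, width: int) -> float:
    """Instructions per cycle achieved under the bound model."""
    total = chain_ops + independent_ops
    return total / loop_cycles(chain_ops, latency, independent_ops, width)


def time_ns(chain_ops, latency, independent_ops, width, freq_ghz) -> float:
    """Wall-clock time of the loop body in nanoseconds."""
    return loop_cycles(chain_ops, latency, independent_ops, width) / freq_ghz


if __name__ == "__main__":
    # Hypothetical cores: (name, issue width, clock in GHz).
    cores = [("wide, low clock", 8, 3.2), ("narrow, high clock", 4, 4.7)]
    for filler in (0, 3100):  # independent ops interleaved with a 100-op chain
        for name, width, freq in cores:
            print(f"filler={filler:5d}  {name:18s} "
                  f"IPC={ipc(100, 4, filler, width):.2f}  "
                  f"time={time_ns(100, 4, filler, width, freq):.1f} ns")
```

In this model, with no filler both cores are stuck at an IPC of 0.25 and the higher clock wins outright. Only when enough independent work is interleaved does the wide core reach its full width and finish first despite the lower clock.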