By: Adrian (a.delete@this.acm.org), November 18, 2020 11:26 am
Room: Moderated Discussions
Dummond D. Slow (mental.delete@this.protozoa.us) on November 18, 2020 9:32 am wrote:
> Adrian (a.delete@this.acm.org) on November 18, 2020 9:04 am wrote:
> > Dummond D. Slow (mental.delete@this.protozoa.us) on November 18, 2020 8:41 am wrote:
> > > Adrian (a.delete@this.acm.org) on November 18, 2020 8:15 am wrote:
> > > > Dummond D. Slow (mental.delete@this.protozoa.us) on November 18, 2020 7:48 am wrote:
> > > > >
> > > > > Interesting, but "long dependency chains" don't disqualify it as a benchmark, do
> > > > > they? I doubt that it is very exceptional and that no other software does that.
> > > > >
> > > > > I hope eventually we get some very comparable HT/noHT CPU in AnandTech benchmark for comparing,
> > > > > in the meantime I found this https://www.anandtech.com/bench/product/2622?vs=2652
> > > > > Skylake @ 3,8-4,2GHz 4c/4t versus 4,0-4,2 4c/8t (sadly also has more L2 cache due to Intel's segmenting).
> > > >
> > > >
> > > >
> > > > A few of the workloads in which I am interested have long dependency chains,
> > > > so there are people for which such benchmarks are interesting.
> > > >
> > > >
> > > > This fact from Andrei, that Cinebench contains long dependency chains,
> > > > explains very well why M1 is less good in it than in other benchmarks.
> > > >
> > > >
> > > > Among 2 CPUs having the same average performance, a benchmark that contains
> > > > long dependency chains will favor the one having a low IPC and a high clock
> > > > frequency, against the one having a high IPC and a low clock frequency.
> > > >
> > >
> > > If you look at the single-thread score, M1 has a very high performance in it too, close
> > > to 4.7 GHz Tiger Lake. So its relative standing is not much different from where it ends
> > > up in Geekbench. It is the regular bench-to-bench variability, say ±5%.
> > > Which means this is not the most important factor, as having high IPC and low clock hardly harms M1
> > > in single thread. It is the inability to exploit unused execution resources by SMT, really, which separates
> > > it from the x86 implementations in question when we look at the actual all-thread performance.
> >
> >
> > Any CPU with high IPC has the high IPC only on average.
> >
> > Any long dependency chain will have an execution time equal to the sum of the instruction
> > latencies, which will be larger for the CPU with low clock frequency, i.e. for M1.
> >
> > So normally in the benchmark with the long dependency chains
> > the IPC will be lower than in other applications,
> > unless the benchmark with the long dependency chains also
> > contains many additional operations interleaved with
> > those belonging to the long dependency chains, to fill the otherwise sub-utilized execution units.
> >
> > So, even if there are exceptions, normally the CPU having the lower clock frequency will not
> > achieve its typically higher IPC in the benchmark with many long dependency chains. While
> > the CPU with the high clock frequency is also affected, the effect is less, due to its lower
> > instruction latencies, which lead to shorter times spent in the dependency chains.
> >
>
> My point was: the "long dependency chain" situation didn't make the high-IPC low-clock CPU say 50% slower
> than Zen 3. It was dunno, 10% slower in single thread, while it is say 5% faster in Geekbench (making the
> numbers up now, you can look at real values). SPEC also say they are roughly on par, up here, down here.
>
> I'm saying, M1's single thread score shows this hypothetical effect of long dependency
> chains doesn't harm it by more than say 10 % (unless you want to argue that without it,
> M1 would be suddenly 20% faster than Zen 3 even if it is not like that elsewhere).
>
> SMT gives much bigger boost (30%+) than this on the Zen* cores, so I
> hope you see why I say it is more important of a factor in my opinion.
OK, I agree.
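
For reference, the latency argument quoted above can be put into a rough sketch. The numbers are illustrative assumptions, not measurements: a chain of 1000 dependent instructions, the same 4-cycle latency on both cores, and clocks of 3.2 GHz and 4.7 GHz. The function name `chain_time_ns` is made up for this example.

```python
def chain_time_ns(n_instructions, latency_cycles, clock_ghz):
    """Time for a fully serial dependency chain, in nanoseconds.

    Each instruction must wait for the previous result, so nothing
    overlaps and the total is the sum of the latencies:
    n * latency_cycles / frequency. Cycles divided by GHz gives ns.
    """
    return n_instructions * latency_cycles / clock_ghz


# Assumed values, for illustration only.
low_clock = chain_time_ns(1000, 4, 3.2)   # low-clock, high-IPC core
high_clock = chain_time_ns(1000, 4, 4.7)  # high-clock core

# With equal latencies in cycles, the ratio is just the clock ratio.
ratio = low_clock / high_clock
print(f"low clock: {low_clock:.0f} ns, high clock: {high_clock:.0f} ns, "
      f"ratio: {ratio:.2f}")
```

This only models the chain itself. Any independent work interleaved with the chain, or shorter latencies in cycles on the low-clock core, would shrink the gap, which fits the single-thread results being much closer than the clock ratio suggests.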