By: Maynard Handley (name99.delete@this.name99.org), November 18, 2020 10:22 am
Room: Moderated Discussions
Adrian (a.delete@this.acm.org) on November 18, 2020 8:15 am wrote:
> Dummond D. Slow (mental.delete@this.protozoa.us) on November 18, 2020 7:48 am wrote:
> >
> > Interesting but "long dependency chains"don't disqualify it as a benchmark, does
> > it. I doubt that it is very expceptional and no other software does that.
> >
> > I hope eventually we get some very comparable HT/noHT CPU in AnandTech benchmark for comparing,
> > in the meantime I found this https://www.anandtech.com/bench/product/2622?vs=2652
> > Skylake @ 3,8-4,2GHz 4c/4t versus 4,0-4,2 4c/8t (sadly also has more L2 cache due to Intel's segmenting).
>
>
>
> A few of the workloads in which I am interested have long dependency chains,
> so there are people for which such benchmarks are interesting.
>
>
> This fact from Andrei, that Cinebench contains long dependency chains,
> explains very well why M1 is less good in it than in other benchmarks.
>
>
> Among 2 CPUs having the same average performance, a benchmark that contains
> long dependency chains will favor the one having a low IPC and a high clock
> frequency, against the one having a high IPC and a low clock frequency.
>
How long is a "long dependency chain"?
The role of the issue queue (what Intel calls "Scheduler Size") is to buffer dependency chains and allow execution to move beyond them, so this is not a trivial question.
If the dependency chains are, say, 30 instructions long, then an issue queue of ~100 entries can hold three such chains and run them in parallel.
If the dependency chains are, say, 100 instructions long, and CPU A has an issue queue of ~100 entries while CPU B has an issue queue of ~300, then CPU B will be able to run such code roughly 3x as fast as CPU A, all other things being equal.
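To make the point concrete, here is a minimal sketch (my own illustration, not from the post or from Cinebench) of a latency-bound loop with three independent dependency chains. With one chain the loop is limited by per-operation latency; with more independent chains, an out-of-order core can overlap them, provided the issue queue can keep enough of each chain in flight:

#include <stdint.h>
#include <stdio.h>

#define ITERS 100000000ULL

int main(void) {
    volatile uint64_t sink;
    /* Three independent dependency chains: within each chain, every
     * operation depends on the previous one; across chains there are
     * no dependences, so they can issue in parallel. */
    uint64_t a = 1, b = 2, c = 3;
    for (uint64_t i = 0; i < ITERS; i++) {
        a = a * 3 + 1;   /* chain 1 */
        b = b * 3 + 1;   /* chain 2, independent of chain 1 */
        c = c * 3 + 1;   /* chain 3, independent of chains 1 and 2 */
    }
    sink = a ^ b ^ c;    /* keep the results live so nothing is optimized away */
    printf("%llu\n", (unsigned long long)sink);
    return 0;
}

Dropping this to a single chain (only `a`) roughly triples the time per iteration on a wide out-of-order core, which is the same effect, scaled down, as the 100-vs-300-entry issue queue example above.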
Even apart from this question (which is a question about the programs), there is a question on the CPU side of how good the issue scheduler(s) are. In particular, are the schedulers somehow tracking, or informed of, criticality?
Adding criticality to the scheduler is not easy (I don't think we have any confirmed cases in the literature), but if you do add it, you can be more certain of running those dependency chains as fast as theoretically possible while delaying less important work until it can find free slots; whereas if you don't track criticality, you may waste cycles by giving them to non-critical instructions that just happen to be older than the critical instructions in the chain.
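As a toy sketch of the difference (my own illustration, assuming a hypothetical per-op "height" estimate that real hardware would have to approximate): an oldest-first pick ignores how much work hangs off an instruction, while a criticality-aware pick prefers the ready op that feeds the longest remaining chain.

#include <stddef.h>

typedef struct {
    int age;     /* lower = older in program order */
    int height;  /* estimated length of the dependency chain fed by this op */
    int ready;   /* operands available this cycle */
} uop;

/* Conventional pick: the oldest ready op wins, critical or not. */
size_t pick_oldest(const uop *q, size_t n) {
    size_t best = n;                      /* n means "nothing ready" */
    for (size_t i = 0; i < n; i++)
        if (q[i].ready && (best == n || q[i].age < q[best].age))
            best = i;
    return best;
}

/* Criticality-aware pick: the ready op with the greatest height wins,
 * so the critical path keeps moving and filler work waits for free slots. */
size_t pick_critical(const uop *q, size_t n) {
    size_t best = n;
    for (size_t i = 0; i < n; i++)
        if (q[i].ready && (best == n || q[i].height > q[best].height))
            best = i;
    return best;
}

The hard part in real hardware is not the pick itself but producing that height/criticality estimate at all, which is presumably why confirmed shipping implementations are hard to find.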