By: Adrian (a.delete@this.acm.org), November 18, 2020 11:28 am
Room: Moderated Discussions
Maynard Handley (name99.delete@this.name99.org) on November 18, 2020 9:22 am wrote:
> Adrian (a.delete@this.acm.org) on November 18, 2020 8:15 am wrote:
> > Dummond D. Slow (mental.delete@this.protozoa.us) on November 18, 2020 7:48 am wrote:
> > >
> > > Interesting, but "long dependency chains" don't disqualify it as a benchmark, do
> > > they. I doubt that it is so exceptional that no other software does the same.
> > >
> > > I hope we eventually get some closely comparable HT/no-HT CPUs in the AnandTech benchmark database for comparison;
> > > in the meantime I found this: https://www.anandtech.com/bench/product/2622?vs=2652
> > > Skylake @ 3.8-4.2 GHz 4c/4t versus 4.0-4.2 GHz 4c/8t (sadly the latter also has more L2 cache due to Intel's segmentation).
> >
> >
> >
> > A few of the workloads in which I am interested have long dependency chains,
> > so there are people for whom such benchmarks are interesting.
> >
> >
> > This fact from Andrei, that Cinebench contains long dependency chains,
> > explains very well why the M1 does less well in it than in other benchmarks.
> >
> >
> > Between two CPUs having the same average performance, a benchmark that contains
> > long dependency chains will favor the one having a low IPC and a high clock
> > frequency over the one having a high IPC and a low clock frequency.
> >
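To make my claim above concrete (with made-up numbers, not measurements): in a pure dependency chain each instruction must wait for the previous one, so the width of the core does not help at all, and the time per step is simply the operation latency in cycles divided by the clock frequency. Assuming, hypothetically, that both cores need 4 cycles for a dependent FP add, a 5 GHz core finishes each step in 0.8 ns while a 3.2 GHz core needs 1.25 ns, no matter how wide the slower-clocked core is. A minimal C sketch of such a latency-bound loop:

/* Minimal sketch of a single serial dependency chain: every add
   depends on the result of the previous one, so the loop runs at
   (FP add latency) / (clock frequency) per iteration, regardless
   of how many execution ports the core has. */
#include <stdio.h>

int main(void)
{
    double x = 1.0;
    for (long i = 0; i < 100000000L; i++)
        x = x + 1e-9;   /* serial chain: no instruction-level parallelism */
    printf("%f\n", x);
    return 0;
}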
>
> How long is a "long dependency chain"?
>
> The role of the issue queue (what Intel calls "Scheduler Size") is to buffer dependency chains
> and allow execution to move beyond them, so this is not a trivial question.
> If the dependency chains are, say, 30 instructions in length then an
> issue queue of ~100 entries can run three such chains in parallel.
> If the dependency chains are, say, 100 instructions in length, and CPU A has an
> issue queue of ~100 entries while CPU B has an issue queue of ~300, then CPU B will be able
> to run instructions 3x as fast as CPU A, all other things being equal.
>
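Agreed, and the other side of the same coin is whether the program exposes several independent chains for the issue queue to buffer at all. Splitting the same work across a few independent accumulators is the classic way to let an out-of-order core overlap the latencies; how close one gets to your 3x figure then depends on queue capacity, execution ports and the number of chains. A trivial sketch of this (a hypothetical loop, not taken from any benchmark):

/* Sketch: the same amount of work split into four independent chains.
   An out-of-order core can keep all four in flight at once, provided
   its issue queue can buffer them, so throughput can approach 4x that
   of the single-chain loop above. */
#include <stdio.h>

int main(void)
{
    double a = 0.0, b = 0.0, c = 0.0, d = 0.0;
    for (long i = 0; i < 25000000L; i++) {
        a += 1e-9;   /* chain 1 */
        b += 1e-9;   /* chain 2 */
        c += 1e-9;   /* chain 3 */
        d += 1e-9;   /* chain 4 */
    }
    printf("%f\n", a + b + c + d);
    return 0;
}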
> Even apart from this question (which is a question about the programs) there
> is a question on the CPU side of how good the issue scheduler(s) are. In particular
> are the schedulers somehow tracking and informed of criticality?
> Adding criticality to the scheduler is not easy (I don't think we have any confirmed cases in the literature)
> but if you do add that to your scheduler, you can be more certain of running those dependency chains
> as fast as theoretically possible, while delaying less important work till it can find free slots;
> whereas if you don't track criticality you may be wasting cycles by giving them to non-critical instructions
> that just happen to be older than the critical instructions in the chain.
>
You are right, but unfortunately we can only speculate about these, as we do not have enough quantitative information.
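Still, just to make the idea concrete, here is a purely conceptual sketch (mine, not a description of any shipping core) of what a criticality-aware "select" could look like: prefer ready instructions predicted to be on the critical path, and fall back to oldest-ready-first otherwise. How the critical bit would actually be predicted and trained is exactly the part about which we lack quantitative information.

/* Conceptual sketch only: an issue-queue select that picks a ready
   instruction marked as critical first, and otherwise the oldest
   ready instruction.  The "critical" flag is assumed to come from
   some predictor that is not modeled here. */
struct iq_entry {
    int valid;     /* slot occupied                         */
    int ready;     /* all source operands available         */
    int critical;  /* predicted to be on the critical path  */
    int age;       /* smaller value = older instruction     */
};

/* Return the index of the entry to issue this cycle, or -1 if none is ready. */
static int select_entry(const struct iq_entry *iq, int n)
{
    for (int pass = 0; pass < 2; pass++) {   /* pass 0: critical only; pass 1: any */
        int best = -1;
        for (int i = 0; i < n; i++) {
            if (!iq[i].valid || !iq[i].ready)
                continue;
            if (pass == 0 && !iq[i].critical)
                continue;
            if (best < 0 || iq[i].age < iq[best].age)
                best = i;
        }
        if (best >= 0)
            return best;
    }
    return -1;
}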