By: Maynard Handley (name99.delete@this.name99.org), July 6, 2015 3:34 pm
Room: Moderated Discussions
Paul A. Clayton (paaronclayton.delete@this.gmail.com) on July 6, 2015 3:02 pm wrote:
> SHK (no.delete@this.mail.com) on July 6, 2015 12:41 pm wrote:
> > Maynard Handley (name99.delete@this.name99.org) on July 6, 2015 10:25 am wrote:
> >
> > >
> > > The IBM branch over one instruction is neat, but, like you said
> > > for forming immediates, it reflects a hole in the ISA.
> >
> > Power has a conditional move (isel) since v2.06 so a programmer/compiler
> > should choose it over branches if it's just to skip 1-2 instructions.
> >
> > Maybe that kind of hardware optimization is there for old non-recompiled code?
>
> The compiler should choose a select instruction if the branch is unpredictable. If the branch
> is predictable (or at least if rarely taken), then a branch instruction is more appropriate. If
> the compiler does not know (or the predictability varies dynamically), then having the hardware
> dynamically predicate makes some sense. (Called to supper, so will have to come back tomorrow.)
This claim is frequently made, and I must admit I don't understand why.
Even in the simplest sort of scalar CPU, let's assume a branch misprediction cost of 20 cycles, free correctly predicted branches, and a single-cycle select. Then the branch only has to mispredict more than 5% of the time (1 cycle / 20 cycles) for the sel to be a win. In superscalar CPUs the numbers get worse for the branch, especially since most of the time the sel is now likely to be as "free" as a branch is.
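To make the arithmetic concrete, here is a minimal sketch of that back-of-envelope model. The function and parameter names are mine, purely for illustration, and the model assumes nothing beyond the numbers above (20-cycle mispredict, free predicted branch, 1-cycle select). It ignores everything a real pipeline does, such as the select lengthening a dependency chain.

```python
def branch_cost(mispredict_rate, penalty_cycles=20.0, predicted_cost=0.0):
    """Expected cycles per executed branch in the toy model (hypothetical helper)."""
    return (mispredict_rate * penalty_cycles
            + (1.0 - mispredict_rate) * predicted_cost)


def break_even_rate(select_cost=1.0, penalty_cycles=20.0, predicted_cost=0.0):
    """Mispredict rate at which the branch and the select cost the same.

    Solves p * penalty + (1 - p) * predicted = select for p.
    """
    return (select_cost - predicted_cost) / (penalty_cycles - predicted_cost)


if __name__ == "__main__":
    print(f"break-even mispredict rate: {break_even_rate():.1%}")
    for p in (0.01, 0.05, 0.10, 0.25):
        print(f"mispredict {p:.0%}: branch {branch_cost(p):.2f} cycles, select 1.00 cycles")
```

Under these assumptions the select wins for any mispredict rate above the break-even value, and raising the penalty only pushes the break-even rate lower.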
The sel is going to have a constant small background cost in terms of tying up one more register for each sel sitting in the ROB, but that seems unlikely to be a big cost.
As far as I can tell, the conditional move hate started around the time of Itanium, with some people having the ridiculous idea that you'd convert all branches, even those over tens of lines of code, into predicated operations, and the insanity spread from there.
I've tried to find papers comparing the relative costs of cmov vs branch and providing cross-over point metrics, but have never had any success.