By: Michael S (already5chosen.delete@this.yahoo.com), July 7, 2015 1:19 am
Room: Moderated Discussions
someotherdude (someotherdude.delete@this.none.none.none.none) on July 6, 2015 5:12 pm wrote:
> Paul A. Clayton (paaronclayton.delete@this.gmail.com) on July 6, 2015 3:02 pm wrote:
> > SHK (no.delete@this.mail.com) on July 6, 2015 12:41 pm wrote:
> > > Maynard Handley (name99.delete@this.name99.org) on July 6, 2015 10:25 am wrote:
> > >
> > > >
> > > > The IBM branch over one instruction is neat, but, like you said
> > > > for forming immediates, it reflects a hole in the ISA.
> > >
> > > Power has had a conditional move (isel) since v2.06, so a programmer/compiler
> > > should choose it over branches if it's just to skip 1-2 instructions.
> > >
> > > Maybe that kind of hardware optimization is there for old non-recompiled code?
> >
> > The compiler should choose a select instruction if the branch is unpredictable. If the branch
> > is predictable (or at least if rarely taken), then a branch instruction is more appropriate. If
> > the compiler does not know (or the predictability varies dynamically), then having the hardware
> > dynamically predicate makes some sense. (Called to supper, so will have to come back tomorrow.)
>
> Without profiling, a compiler cannot easily determine whether a branch is predictable.
>
> The purpose of cmov/isel is to remove branches so that the branch predictor does not need
> to predict them. It saves entries in your BP for predicting other branches, e.g. around
> code that would have side effects if executed. If you have a wide enough machine or small
> enough conditionally executed body, the overhead of computing both sides of the cmov/isel
> is small, so the compiler can probably emit aggressively for the trivial cases.
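A minimal C sketch of the two forms being compared, assuming 32-bit two's-complement integers; the function names are hypothetical. Whether `max_branch` ends up as a compare-and-branch or gets if-converted into cmov/isel is the compiler's decision, so check the generated assembly. `max_select` builds the select from a mask, so it needs no branch regardless of what the compiler does with the first form.

```c
#include <stdint.h>

/* Branchy form: the compiler may emit a compare-and-branch here, or may
   if-convert it into a select (cmov on x86, isel on Power). */
static inline int32_t max_branch(int32_t a, int32_t b)
{
    if (a > b)
        return a;
    return b;
}

/* Branch-free form: -(a > b) is all ones when a > b and zero otherwise,
   so the XOR either turns b into a or leaves b unchanged. Both operands
   are always consumed; there is nothing for the predictor to guess. */
static inline int32_t max_select(int32_t a, int32_t b)
{
    return b ^ ((a ^ b) & -(int32_t)(a > b));
}
```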
As pointed out above by EduardoS, on a wide machine the main performance cost of branch avoidance is not the unnecessary instructions you execute, but the unnecessary stalls you suffer due to false data dependencies.
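A hypothetical C sketch that makes the dependency point concrete: a running maximum, assuming n >= 1. In the branchy version a correctly predicted compare lets the core start the next iteration without waiting for it, and on random input new maxima quickly become rare, so the branch is usually well predicted. In the select version every iteration's `m` feeds the next compare, so the loop turns into a serial chain of compare+select latencies even though nothing is ever mispredicted. Compilers may vectorize or if-convert either loop, so check the emitted code before timing anything.

```c
#include <stddef.h>
#include <stdint.h>

/* Branchy running maximum: the update is control-dependent on the compare,
   so a correct prediction lets later iterations proceed speculatively. */
static int32_t array_max_branch(const int32_t *a, size_t n)
{
    int32_t m = a[0];
    for (size_t i = 1; i < n; i++)
        if (a[i] > m)
            m = a[i];
    return m;
}

/* Select-based running maximum: m is data-dependent on the previous
   iteration's compare, forming one long compare+select chain. */
static int32_t array_max_select(const int32_t *a, size_t n)
{
    int32_t m = a[0];
    for (size_t i = 1; i < n; i++) {
        int32_t x = a[i];
        m = m ^ ((x ^ m) & -(int32_t)(x > m)); /* m = (x > m) ? x : m */
    }
    return m;
}
```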
>
> The other day I read something about unrolling a loop containing an unpredictable branch. I remember
> studying why it worked many years ago, but I've forgotten the reason why it worked. I don't remember
> if it just made it so that the BTB was better utilized (skewing the branches across multiple entries
> vs. just one) or if it was due to being able to fetch more instructions of the loop body before the
> (easily predictable) loop termination branch. I'm guessing it's the latter now.
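For reference, a minimal C sketch of the transformation being described, with hypothetical names. It does not settle which explanation is right; it only shows where both candidate effects come from: the unrolled loop has four static copies of the data-dependent branch, each at its own address and so potentially with its own predictor/BTB state, and it executes the loop-back branch once per four elements instead of once per element. Compilers routinely turn `if (cond) c++` into branch-free code (e.g. setcc + add on x86), so check the assembly before drawing conclusions from timings.

```c
#include <stddef.h>
#include <stdint.h>

/* Rolled: one data-dependent branch plus one loop-back branch per element. */
static size_t count_ge_rolled(const int32_t *a, size_t n, int32_t t)
{
    size_t c = 0;
    for (size_t i = 0; i < n; i++)
        if (a[i] >= t)
            c++;
    return c;
}

/* Unrolled by 4: four separate copies of the data-dependent branch and one
   loop-back branch per four elements. Separate counters keep the four
   increments independent of each other. A scalar tail handles n % 4. */
static size_t count_ge_unrolled4(const int32_t *a, size_t n, int32_t t)
{
    size_t c0 = 0, c1 = 0, c2 = 0, c3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        if (a[i] >= t)     c0++;
        if (a[i + 1] >= t) c1++;
        if (a[i + 2] >= t) c2++;
        if (a[i + 3] >= t) c3++;
    }
    for (; i < n; i++)
        if (a[i] >= t)
            c0++;
    return c0 + c1 + c2 + c3;
}
```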