By: Maynard Handley (name99.delete@this.name99.org), July 6, 2015 10:25 am
Room: Moderated Discussions
Gabriele Svelto (gabriele.svelto.delete@this.gmail.com) on July 6, 2015 2:11 am wrote:
> Maynard Handley (name99.delete@this.name99.org) on July 5, 2015 6:01 pm wrote:
> > Interestingly I did not see any OTHER fusion possibilities in the code. In particular the possibility
> > IBM selected (fusing instructions to create a large immediate) is not utilized, which might, of course
> > reflect not enough time to add this to the design; but maybe also reflects something about ARM's constant
> > generation and so a less frequent need for generating immediates through successive instructions.
>
> In POWER8's case instruction fusion applies to add immediate / add immediate
> shifted + loads which covers a weakness in the ISA which might not apply to
> ARM (lack of base + immediate addressing for certain types of load).
>
> IBM has also been doing more aggressive fusion such as turning conditional-jump-over-a-single-instruction
> sequences into a single predicated µop (the instruction can be an add, and, or, xor plus some
> immediate forms as well as a store). See the POWER8 user manual section 10.1.4.7.
>
> Beside being different ISAs, IBM doesn't have many constraints as far as power
> goes so the tradeoffs they're making might not apply to Apple's design.
The IBM branch-over-one-instruction fusion is neat but, like you said for forming immediates, it reflects a hole in the ISA.
I am guessing that the answer from Apple (and the whole ARM camp) is to use a conditional select. And of course Apple, in particular, have very little of a legacy code problem...
That does raise the question of whether the next lowest-hanging fruit for ARM op-fusion might be op + predicated move as a single unit. That and three-input add seem like the most common patterns left once compare-and-branch has been handled.
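To make the patterns concrete, here is a rough C sketch of the source-level shapes I have in mind. The function names are made up for illustration, and what a given compiler actually emits depends on the compiler and flags; the comments describe what I would expect, not something verified against any particular core.

```c
#include <stdint.h>

/* Branch form: a conditional jump over a single add.
 * This is the shape the POWER8 fusion described above would turn
 * into one predicated op. */
uint64_t add_if_branch(uint64_t acc, uint64_t x, int cond) {
    if (cond)
        acc += x;
    return acc;
}

/* Select form: compute the add unconditionally, then pick a result.
 * Assumption: on AArch64 this would usually become an add followed by
 * a csel, which is the "op + predicated move" pair a core could
 * hypothetically fuse. */
uint64_t add_if_select(uint64_t acc, uint64_t x, int cond) {
    uint64_t sum = acc + x;
    return cond ? sum : acc;
}

/* Three-input add: two dependent adds that a fused op could
 * hypothetically issue as one unit. */
uint64_t add3(uint64_t a, uint64_t b, uint64_t c) {
    return a + b + c;
}
```

Both add_if_* versions return the same value for every input; the only difference is whether the condition is expressed as control flow or as a data dependency.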