By: Wilco (Wilco.Dijkstra.delete@this.ntlworld.com), February 15, 2013 5:46 pm
Room: Moderated Discussions
Max (max.delete@this.a.com) on February 15, 2013 1:24 pm wrote:
> Gabriele Svelto (gabriele.svelto.delete@this.gmail.com) on February 15, 2013 2:28 am wrote:
> > Linus Torvalds (torvalds.delete@this.linux-foundation.org) on February 14, 2013 4:44 pm wrote:
> > > And putting the condition codes in GPRs (alpha) may look really
> > > clean and allows some nice combined conditionals,
> > > but the upside just isn't there. It doesn't really generate better code: you win some, you lose some.
> >
> > Due to the very compact instruction encoding SuperH processors
> > use a system which is probably the "cleanest"
> > I've seen: they have a single-bit flag register (T register) which is implicitly set by comparison
> > operations and read by conditional branch instructions.
>
> Motorola MCORE did that also, for the same reason (only 16-bit instructions, and with
> a single bit flag the branch displacement is enlarged by 3 bits, e.g. from 8 to 11).
That's a good tradeoff, but you need 3 extra bits on compares of course, potentially reducing the size of the immediate. Combining the compare and branch into a 32-bit instruction means you only need 1 opcode, some more spare bits and allowing for a better tradeoff.
My take on the cleanest&purest approach that has good performance as well as small codesize would be as follows:
* Combined compare&branch with 2 regs, 1 reg+imm, including test of any bit, carry, overflow. This is sufficient for the vast majority of branches, saves 1 instruction for every branch and still allows a large enough offset. If necessary a branch of zero/nonzero of a single register (or small subset of registers) could be added to support a far larger offset.
* A generic compare and select operation. Limiting to 2 GPR inputs reduces its usefulness somewhat, but it still allows an arithmetic operation as well as an immediate (either on the comparison or as part of the arithmetic operation). This enables compares to GPR register with a boolean result, conditional moves but also common cases which otherwise require conditional execution:
r0 = r1 > 0 ? r2 : r2 - 1
r0 = carry(r1+r2) ? 0 : 255
r0 = signed_overflow(r1+255) ? r2 : 0
r0 = r1 >= r2 ? 1 : 0
r0 = bit21(r1) ? r2 : 255 - r2
r0 = r1 > 255 ? r2 : 0
r0 = r1 != 0 ? r1 : r2
(yes it would be an interesting challenge to design the encoding and likely implementation too, but I believe it is worth having powerful instructions!)
* Overflow check on add/sub can be dealt with a normal compare+branch (do compare+branch if overflow, then add/sub). Add/sub with trap on overflow seems too complex, even if implemented as a branch to a specified address (rather than trap into OS).
* Add/sub with carry. The need for this is rare in high-level code and reduced by wide SIMD extensions and multiplies that support multi-precision arithmetic. So on the integer side I'm inclined to go for the purist approach and claim that the above compare+branch and select instructions are more than good enough.
Wilco
> Gabriele Svelto (gabriele.svelto.delete@this.gmail.com) on February 15, 2013 2:28 am wrote:
> > Linus Torvalds (torvalds.delete@this.linux-foundation.org) on February 14, 2013 4:44 pm wrote:
> > > And putting the condition codes in GPRs (alpha) may look really
> > > clean and allows some nice combined conditionals,
> > > but the upside just isn't there. It doesn't really generate better code: you win some, you lose some.
> >
> > Due to the very compact instruction encoding SuperH processors
> > use a system which is probably the "cleanest"
> > I've seen: they have a single-bit flag register (T register) which is implicitly set by comparison
> > operations and read by conditional branch instructions.
>
> Motorola MCORE did that also, for the same reason (only 16-bit instructions, and with
> a single bit flag the branch displacement is enlarged by 3 bits, e.g. from 8 to 11).
That's a good tradeoff, but you need 3 extra bits on compares of course, potentially reducing the size of the immediate. Combining the compare and branch into a 32-bit instruction means you only need 1 opcode, some more spare bits and allowing for a better tradeoff.
My take on the cleanest&purest approach that has good performance as well as small codesize would be as follows:
* Combined compare&branch with 2 regs, 1 reg+imm, including test of any bit, carry, overflow. This is sufficient for the vast majority of branches, saves 1 instruction for every branch and still allows a large enough offset. If necessary a branch of zero/nonzero of a single register (or small subset of registers) could be added to support a far larger offset.
* A generic compare and select operation. Limiting to 2 GPR inputs reduces its usefulness somewhat, but it still allows an arithmetic operation as well as an immediate (either on the comparison or as part of the arithmetic operation). This enables compares to GPR register with a boolean result, conditional moves but also common cases which otherwise require conditional execution:
r0 = r1 > 0 ? r2 : r2 - 1
r0 = carry(r1+r2) ? 0 : 255
r0 = signed_overflow(r1+255) ? r2 : 0
r0 = r1 >= r2 ? 1 : 0
r0 = bit21(r1) ? r2 : 255 - r2
r0 = r1 > 255 ? r2 : 0
r0 = r1 != 0 ? r1 : r2
(yes it would be an interesting challenge to design the encoding and likely implementation too, but I believe it is worth having powerful instructions!)
* Overflow check on add/sub can be dealt with a normal compare+branch (do compare+branch if overflow, then add/sub). Add/sub with trap on overflow seems too complex, even if implemented as a branch to a specified address (rather than trap into OS).
* Add/sub with carry. The need for this is rare in high-level code and reduced by wide SIMD extensions and multiplies that support multi-precision arithmetic. So on the integer side I'm inclined to go for the purist approach and claim that the above compare+branch and select instructions are more than good enough.
Wilco