By: Michael S (already5chosen.delete@this.yahoo.com), November 4, 2019 1:22 pm
Room: Moderated Discussions
Linus Torvalds (torvalds.delete@this.linux-foundation.org) on November 4, 2019 10:27 am wrote:
> Michael S (already5chosen.delete@this.yahoo.com) on November 4, 2019 10:03 am wrote:
> >
> > Doubling? That's way too optimistic.
> > In most general case, one needs 5 RISC-V instruction to emulate add with carry.
>
> What? No. I'm not a fan of how RISC-V has all those subsets, but add-with-carry isn't all that complex.
>
> Afaik, a 128-bit add, which on x86 would be two instructions (add+adc),
> should only be four on RISC-V (add+sltu+add+add). No?
>
128-bit is the easiest case. The first addition produces a carry, but never consumes one. The second addition consumes a carry, but never produces one.
Try 192-bit or more. That's where replacing the middle ADCs, which both consume and produce a carry, starts to take a lot of RISC-V instructions.
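To make the middle-limb cost concrete, here is a rough C sketch of a multi-limb add on a machine without a carry flag. The function name add_limbs and the least-significant-limb-first layout are just assumptions for illustration. Each loop iteration is the general case: I'd expect a compiler to turn the arithmetic into roughly add + sltu + add + sltu + or on RISC-V, i.e. five instructions per limb, not counting loads and stores.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: r = a + b over n 64-bit limbs, least significant
 * limb first. Returns the carry out of the top limb (0 or 1). */
static uint64_t add_limbs(uint64_t *r, const uint64_t *a,
                          const uint64_t *b, size_t n)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s  = a[i] + b[i];  /* add:  sum of the two limbs          */
        uint64_t c1 = s < a[i];     /* sltu: carry produced by a[i] + b[i] */
        uint64_t t  = s + carry;    /* add:  consume the incoming carry    */
        uint64_t c2 = t < s;        /* sltu: carry produced by that add    */
        r[i]  = t;
        carry = c1 | c2;            /* or:   both can never be set at once */
    }
    return carry;
}
```

For the lowest limb the incoming carry is known to be zero, and for the highest limb the outgoing carry can be dropped when it isn't needed, which is why the 128-bit case collapses to four instructions.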
> Is that optimal? No. But the doubling doesn't sound too optimistic in at least the obvious cases.
>
> Would it have been more "interesting" to see architectures play with extra bits in the register
> instead (still honoring the 2R1W model), and maybe have a 66-bit register with C and O bits
> in the high bits? Yes it would have. But "interesting" tends to cause problems.
>
> Not having a flags register wasn't a huge deal on alpha or MIPS.
> They had ninety nine problems, but lack of carry ain't one.
>
> Linus
That's what I said in my original post: annoying, but with a surprisingly small effect on the total performance of multi-precision arithmetic, especially on wide superscalar machines.
As an experiment, a few years ago I coded a multiplication of extended-precision floating-point numbers (128-bit significand + 32-bit exponent) in x64 asm, with and without exploiting add-with-carry.
The speed difference was pretty small. IIRC, on the order of 10-15% on Ivy Bridge.
I'd guess that for fixed-point the impact would be bigger, but still not catastrophic, despite the quite catastrophic-looking "arithmetic" part.