By: Michael S (already5chosen.delete@this.yahoo.com), November 4, 2019 11:03 am
Room: Moderated Discussions
anon (spam.delete.delete@this.this.spam.com) on November 4, 2019 4:22 am wrote:
> none (none.delete@this.none.com) on November 4, 2019 2:59 am wrote:
> > Michael S (already5chosen.delete@this.yahoo.com) on November 4, 2019 1:45 am wrote:
> > [..]
> > > The only relatively significant field where absence of flags can cause problems is multiple-precision
> > > arithmetic. Not too significant for most of us. And it's not like absence of flags makes multiple-precision
> > > arithmetic many times slower at the application level. Typically, the most cycle-consuming part of a
> > > multiple-precision library is house-keeping and other overhead, not the arithmetic itself.
> > > That's even more true in cryptographic applications of multiple-precision arithmetic.
> > > For something like full verification of an ECDSA signature, the effect of the presence or absence of
> > > arithmetic flags in the ISA is barely above the noise floor, esp. if one uses the most popular, not
> > > very optimized implementation, i.e. OSSL (OpenSSL).
> >
> > Intel added an extension, ADX, that changes flag behavior just to speed up multi-precision
> > computation, in particular RSA. I guess they must have seen enough improvement to justify
> > it :-)
> >
> > This wolfSSL page says this:
> >
> > "The assembly code for x86_64 is better than the C code by between 23%
> > and 46% on x86_64 and 92% and 144% using BMI2 and ADX instructions."
> >
> >
>
> This is a reply to both previous posts, I just don't want to split it.
>
> Which shows that there's no problem implementing the flags; it's working just fine with ADX.
> However, neither is it a sign that this is faster with flags than without.
> It's just fundamentally impossible to do two ADC chains in parallel when they both want to write
> to the same flag. Obviously Intel doesn't mind partial flag register states one bit since this actually
> speeds things up. Maybe it's "somewhat complex", but then again the RISC-V people are arguing for
> fusion, dynamic hammock predication, and even more complex things to be implemented...
>
> If you were to write the carry bit into an explicitly named register it would actually be better
> since you could then run as many ADC chains in parallel as your registers allowed. But RISC-V can't
> do that either. Instead the carry has to be manually computed and added every time. I'm sure that
> they'll argue that this doubling of instructions will go away by just using fusion.
>
Doubling? That's way too optimistic.
In the most general case, one needs 5 RISC-V instructions to emulate add-with-carry.
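To make the count concrete, here is a minimal C sketch of one 64-bit add-with-carry step. It uses only the operations that a flag-less ISA such as RISC-V provides. The function name is hypothetical, and the carry-in is assumed to be 0 or 1. The RV64I mnemonics in the comments show one plausible mapping, not actual compiler output.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: s = a + b + cin, carry-out written to *cout,
   using only operations a flag-less ISA provides. */
static inline uint64_t adc_noflags(uint64_t a, uint64_t b, uint64_t cin,
                                   uint64_t *cout)
{
    assert(cin <= 1);          /* the carry-in must be 0 or 1 */
    uint64_t t  = a + b;       /* add  t, a, b                         */
    uint64_t c1 = t < a;       /* sltu c1, t, a   carry out of a + b   */
    uint64_t s  = t + cin;     /* add  s, t, cin                       */
    uint64_t c2 = s < t;       /* sltu c2, s, t   carry out of t + cin */
    *cout = c1 | c2;           /* or   cout, c1, c2 (never both set)   */
    return s;
}
```

When the carry-in is known to be zero, as for the least significant limb, the first add/sltu pair is enough. The general case, with an incoming carry, takes all five instructions.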
>
> And no, it's not in the noise floor once you look beyond the bare arithmetic.
> https://eprint.iacr.org/2019/794.pdf
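The quoted argument is that a carry held in an ordinary register lets several carry chains run side by side. That can be sketched in C as well. The helper names below are hypothetical. add3_n computes a three-operand multi-limb sum a + b + c in one pass and keeps a separate carry variable for each chain. This is roughly the situation Intel's ADX addresses with flags: ADCX carries through CF and ADOX through OF, so two chains can be interleaved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: one limb of add-with-carry. The carry lives in a
   plain variable (0 or 1), not in a flag register. */
static inline uint64_t addc(uint64_t a, uint64_t b, uint64_t *carry)
{
    assert(*carry <= 1);
    uint64_t t  = a + b;
    uint64_t c1 = t < a;           /* carry out of a + b      */
    uint64_t s  = t + *carry;
    *carry = c1 | (s < t);         /* carry out of t + carry  */
    return s;
}

/* Hypothetical helper: r = a + b + c over n 64-bit limbs, least significant
   limb first. Chain 1 (carry ca) adds a + b; chain 2 (carry cc) adds c to
   that partial sum. Neither chain waits on the other's carry.
   Returns the carry out of the top limb, in the range 0..2. */
static uint64_t add3_n(uint64_t *r, const uint64_t *a, const uint64_t *b,
                       const uint64_t *c, size_t n)
{
    uint64_t ca = 0, cc = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s = addc(a[i], b[i], &ca);   /* chain 1 */
        r[i] = addc(s, c[i], &cc);            /* chain 2 */
    }
    return ca + cc;
}
```

With a single carry flag and no ADX, the same computation needs either two passes or saving and restoring the flag between the two chains.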