By: Michael S (already5chosen.delete@this.yahoo.com), November 4, 2019 2:45 am
Room: Moderated Discussions
j (invalid.delete@this.example.net) on November 3, 2019 11:30 pm wrote:
> anon (anon.delete@this.anon.com) on November 3, 2019 9:00 pm wrote:
> > j (invalid.delete@this.example.net) on November 3, 2019 2:29 pm wrote:
> > > Adrian (a.delete@this.acm.org) on November 2, 2019 9:33 am wrote:
> > > > I refuse to believe that instruction-pair fusion is simpler or better than the
> > > > trivial enhancement of the instruction encoding to cover the more complex instructions that
> > > > are needed in almost all loops, e.g. either with indexed or auto-indexed addressing.
> > > >
> > > >
> > >
> > > OTOH risc-v has compare-and-branch instructions so it doesn't need to fuse compare+jump,
> > > nor does it have a flags register. At least the risc-v people are claiming that virtualizing
> > > the flags register is a somewhat complex task for superscalar/OoO implementations.
> >
> > Do you mean renaming the flags register? Virtualizing it would generally
> > be handled entirely by the hypervisor, no hardware required.
>
> Sorry, poor wording on my part. I didn't mean anything related to hypervisors, virtual
> machines or the like. What I meant was whatever microarchitectural tricks which are
> needed to avoid bottle-necking performance on writing/reading the flags register.
>
> AFAIK one common approach is to split up the flags register into several "virtual" registers (hence
> my use of the term "virtualizing") that can be handled separately. I don't know if they go all the way
> to having one such virtual register for each bit in the flags register, or if they are grouped.
>
> Another might be to duplicate the flags, so that each normal register would have an associated flags register
> containing the flags for the latest instruction that wrote to that register, and then instructions that
> depend on the flag register such as jumps would somehow need to pick up a dependency on the correct flags
> register. Or maybe this is pointless if you just rename the flags as any normal register?
>
On x86 you have to do various tricks (Intel and AMD do not do them identically) because some instructions modify only some of the arithmetic flags and leave the rest unaffected. The prime example is INC/DEC, which preserve the carry flag but modify the rest. So, if one handles the flags as a single register, INC/DEC become partial register stores.
You don't need this on newer ISAs like AArch64, because the ISA designers were aware of the problem and defined the instruction set so that any instruction updates either all of the arithmetic flags or none of them.
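A rough way to see the INC/DEC problem is to treat the flags as a value. The minimal C sketch below is a hypothetical software model, not how any real core implements flag renaming, and it covers only CF, ZF, SF and OF. ADD produces a complete flags value from its operands alone, while INC has to take the previous flags as an extra input just to carry CF forward. On an ISA where every flag-setting instruction writes all flags, that extra input never appears.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a subset of x86 arithmetic flags (AF and PF omitted). */
typedef struct {
    bool cf; /* carry */
    bool zf; /* zero */
    bool sf; /* sign */
    bool of; /* signed overflow */
} flags_t;

/* ADD writes every modelled flag: the new flags depend only on the operands. */
static flags_t add_flags(uint32_t a, uint32_t b) {
    uint32_t r = a + b;
    flags_t f;
    f.cf = r < a;                          /* unsigned wrap-around */
    f.zf = r == 0;
    f.sf = (r >> 31) & 1;
    f.of = (~(a ^ b) & (a ^ r)) >> 31;     /* same-sign inputs, different-sign result */
    return f;
}

/* INC writes ZF/SF/OF but must keep the old CF, so its flags result is a merge of
   freshly computed bits and the previous flags: a partial register store. */
static flags_t inc_flags(uint32_t a, flags_t old) {
    flags_t f = add_flags(a, 1);
    f.cf = old.cf;                         /* the extra dependency on the old flags */
    return f;
}
```

In the model, incrementing 0xFFFFFFFF sets ZF but leaves CF at whatever it was before, whereas ADD with 1 would set CF.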
Another problem in x86 flag handling is shift instructions with a variable count, which have to preserve the flags when the count is zero. There is no partial register store in this case, but there is still a dependency on the old flag values, which in the majority of cases (i.e. when the count is non-zero) turns out to be a false dependency.
Newer x86 processors support (as part of the BMI2 extension) shifts that don't affect the flags at all, but the old instructions are still in very wide use, so implementers have to make them fast.
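The same kind of model shows the variable-count shift dependency. In the sketch below (hypothetical names; only CF, ZF and SF are modelled, OF and the other flags are left out), `shl_var` follows the x86 rules that a 32-bit shift count is masked to 5 bits and that a masked count of zero leaves the flags untouched. As a result, the old flags are an input to every call even though they only matter when the count is zero. `shlx_model` mimics a BMI2 SHLX-style shift that neither reads nor writes flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of three x86 arithmetic flags (OF, AF, PF omitted). */
typedef struct {
    bool cf; /* carry: last bit shifted out */
    bool zf; /* zero */
    bool sf; /* sign */
} flags_t;

typedef struct {
    uint32_t value;
    flags_t flags;
} shl_result;

/* Model of SHL r32, CL: the old flags are a required input on every call. */
static shl_result shl_var(uint32_t a, unsigned cl, flags_t old) {
    unsigned count = cl & 31;              /* x86 masks 32-bit shift counts to 5 bits */
    shl_result res = { a, old };
    if (count == 0)
        return res;                        /* value and flags pass through unchanged */
    res.value = a << count;
    res.flags.cf = (a >> (32 - count)) & 1;
    res.flags.zf = res.value == 0;
    res.flags.sf = res.value >> 31;
    return res;
}

/* Model of a BMI2 SHLX-style shift: no flags in, no flags out. */
static uint32_t shlx_model(uint32_t a, unsigned count) {
    return a << (count & 31);
}
```

A count of 32 masks to zero, so `shl_var` returns its inputs unchanged, including the old flags. That pass-through path is exactly what forces a real core to wait for the previous flags.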
> Anyway, however it's done, the claim by the RISC-V developers was that this is somewhat complex to do, and
> by not having a flags register RISC-V avoids that. Is it worth it? Don't know, maybe someone knows better?
>
The only reasonably significant area where the absence of flags can cause problems is multiple-precision arithmetic, which is not too significant for most of us. And it's not as if the absence of flags makes multiple-precision arithmetic many times slower at the application level. Typically, the most cycle-consuming part of a multiple-precision library is housekeeping and other overhead, not the arithmetic itself.
That's even more true of cryptographic applications of multiple-precision arithmetic. For something like full verification of an ECDSA signature, the effect of the presence or absence of arithmetic flags in the ISA is barely above the noise floor, especially if one uses the most popular, not particularly optimized implementation, i.e. OSSL (OpenSSL).
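For the multiple-precision case, the usual substitute for a carry flag is to recover the carry with an unsigned comparison: on RISC-V that is an ADD followed by SLTU, where x86 would use ADC. Below is a minimal C sketch of limb-wise addition done that way. The function name and limb layout (64-bit limbs, least significant first) are assumptions for illustration, not taken from OpenSSL or any other library.

```c
#include <stddef.h>
#include <stdint.h>

/* r = a + b over n 64-bit limbs, least significant limb first; returns the final carry.
   No carry flag is used: each carry is recomputed from an unsigned comparison. */
static uint64_t mp_add(uint64_t *r, const uint64_t *a, const uint64_t *b, size_t n) {
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s = a[i] + b[i];
        uint64_t c1 = s < a[i];        /* carry out of a[i] + b[i] */
        uint64_t t = s + carry;
        uint64_t c2 = t < s;           /* carry out of adding the incoming carry */
        r[i] = t;
        carry = c1 | c2;               /* at most one of c1, c2 can be set */
    }
    return carry;
}
```

Each limb costs a few more instructions than an add-with-carry chain. That overhead is real but bounded, which fits the point above that the raw arithmetic is rarely where most of the time goes.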
So, in that regard, I tend to think that the RISC-V people are not too wrong.
I am much more concerned about conditional moves.