By: Adrian (a.delete@this.acm.org), November 4, 2019 6:10 am
Room: Moderated Discussions
anon (spam.delete.delete@this.this.spam.com) on November 4, 2019 4:22 am wrote:
>
> Which shows that there's no problem implementing the flags, that's working just fine with ADX.
> However neither is it a sign that this is faster with flags than without.
> It's just fundamentally impossible to do two ADC chains in parallel when they both want to write
> to the same flag. Obviously Intel doesn't mind partial flag register states one bit since this actually
> speeds things up. Maybe it's "somewhat complex", but then again the RISC-V people are arguing for
> fusion, dynamic hammock predication, and even more complex things to be implemented...
>
> If you were to write the carry bit into an explicitly named register it would actually be better
> since you could then run as many ADC chains in parallel as your registers allowed. But RISC-V can't
> do that either. Instead the carry has to be manually computed and added every time. I'm sure that
> they'll argue that this doubling of instructions will go away by just using fusion.
>
Yes, ADX is a workaround for the fact that x86 has only one flag register.
All the multiple-precision arithmetic operations have 3 inputs and 2 outputs (for addition, two operand words and a carry-in produce a sum word and a carry-out), so for the best performance an instruction format with 5 register addresses would be needed.
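To make the 3-input, 2-output shape concrete, here is a minimal C sketch of one limb of a multiple-precision addition. The names adc64, adc_result and mp_add are hypothetical helpers invented for this illustration, and the code assumes 64-bit limbs stored least significant first and a carry-in of 0 or 1.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration: one limb of a multiple-precision add.
   Two outputs: the sum word and the carry-out. */
typedef struct {
    uint64_t sum;
    uint64_t carry;
} adc_result;

/* Three inputs: a, b and carry_in (assumed to be 0 or 1). */
static adc_result adc64(uint64_t a, uint64_t b, uint64_t carry_in) {
    adc_result r;
    uint64_t t = a + b;           /* may wrap around modulo 2^64 */
    uint64_t c1 = t < a;          /* 1 if a + b wrapped */
    r.sum = t + carry_in;         /* may wrap again */
    uint64_t c2 = r.sum < t;      /* 1 if adding the carry wrapped */
    r.carry = c1 | c2;            /* at most one of the two can be set */
    return r;
}

/* Adds two n-limb numbers (least significant limb first) into dst
   and returns the final carry-out. */
static uint64_t mp_add(uint64_t *dst, const uint64_t *x,
                       const uint64_t *y, size_t n) {
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        adc_result r = adc64(x[i], y[i], carry);
        dst[i] = r.sum;
        carry = r.carry;          /* serial dependency between limbs */
    }
    return carry;
}
```

The carry variable threaded through the loop is exactly the value that a flag register holds implicitly; with an explicit carry operand, several such chains could use different registers at the same time.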
There are several alternatives for reducing the number of addresses, the most common being to eliminate one input address and one output address by implicitly using a special register, i.e. a flag register for ADD/SUB/single shift or an extension register for MUL/DIV/multiple shift.
Another popular alternative, which needs no special register, is to split each multiple-precision operation into 2 instructions, such that each computes one of the 2 outputs. In a high-performance implementation the instruction pair can be fused.
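A rough C sketch of the split approach, where each function stands for one instruction that produces one of the two outputs. The function names are hypothetical; the pairing loosely mirrors RISC-V's add/sltu and mul/mulhu, and mul_hi is written with 32-bit partial products so that it does not depend on a compiler-specific 128-bit type.

```c
#include <stdint.h>

/* Hypothetical names; each function models one "half" instruction. */

/* Low output of a + b (like a plain add). */
static uint64_t add_lo(uint64_t a, uint64_t b) {
    return a + b;
}

/* Second output of a + b: the carry, recomputed with an unsigned
   compare (like sltu applied to the sum and one operand). */
static uint64_t add_carry(uint64_t a, uint64_t b) {
    return (uint64_t)(a + b) < a;
}

/* Low 64 bits of a * b (like mul). */
static uint64_t mul_lo(uint64_t a, uint64_t b) {
    return a * b;
}

/* High 64 bits of the unsigned product a * b (like mulhu),
   built from four 32x32 -> 64 partial products. */
static uint64_t mul_hi(uint64_t a, uint64_t b) {
    uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
    uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;
    uint64_t p0 = a_lo * b_lo;
    uint64_t p1 = a_lo * b_hi;
    uint64_t p2 = a_hi * b_lo;
    uint64_t p3 = a_hi * b_hi;
    /* Sum of everything that lands in bits 32..63, kept to carry out. */
    uint64_t mid = (p0 >> 32) + (p1 & 0xFFFFFFFFu) + (p2 & 0xFFFFFFFFu);
    return p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}
```

Note that the "lo" and "hi" functions repeat the same underlying work; that duplication is what fusing the instruction pair is meant to remove.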
However, I find it slightly annoying that there are many ISAs where a mixed approach is used, e.g. a special flag register is used for compare/add/subtract/shift, but split instructions are used for mul/div. I would prefer a more uniform approach and no flags register. Especially for compare, it is more convenient when the output can be in any general-purpose register, or at least when there are multiple output registers, as in POWER.