By: --- (---.delete@this.redheron.com), June 5, 2022 6:12 pm
Room: Moderated Discussions
⚛ (0xe2.0x9a.0x9b.delete@this.gmail.com) on June 4, 2022 11:39 am wrote:
> Peter Lewis (peter.delete@this.notyahoo.com) on June 1, 2022 3:55 pm wrote:
> It is possible that in the near future it will become clear that CPU performance is determined
> by the number of [conditional] branches successfully predicted per cycle - and not by whether
> the CPU's compiler/user interface uses a fixed-length or a variable-length encoding.
This is a great point, a rare example of skating to where the puck will be rather than where it was thirty years ago.
As regards ISA, the obvious question, then, is what can the ISA do to improve this situation?
- predicates/CMOV/CSEL. I have zero interest in relitigating this for the nth time; I'll just state that if you want to avoid many *difficult to predict* branches, this is one technology for doing so. If one is upset about the cases where it is a bad choice because it lengthens the critical path, then the place to complain about it (and fix it) is the compiler (*difficult to predict* branches!!!), not a blanket claim that these instructions should not be used.
- convert simple branches (e.g. a compare followed by one or two instructions) to predicates. This is the micro-architecture solution if the compiler just cannot get its act together.
The IBM patent on this expires soon (or maybe already has; it's 2021 or 2022), and this tech coupled to a confidence predictor again solves some of the problem. This solution works easily for standard (4-byte) RISC; for x86 branches I don't know, but I would guess trickier but feasible.
- a dark horse is ARM's new (v8.8) Branch Consistent (BC) instruction. Does anyone know the backstory behind this (like, is it from within ARM, or an Apple suggestion, or ???)?
It's somewhat vaguely described (and much may be in the hands of the implementer) but my simple-minded reading of it is that it's a way to move the easy branches (as always, assuming the compiler can get its act together...) out of the fancy branch machinery to much simpler branch machinery, so that the few really hard branches can have much more tech thrown at them.
But the reliance on the compiler is the Achilles' heel; I suspect anyone thinking of actually going down this path might be better off just implementing an aggressive branch confidence tracker and using that to move branches between the easy machinery and the fancy machinery.
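To make the confidence-tracker idea concrete, here is a minimal software sketch, not a description of any shipping design. Everything in it is an assumption for illustration: the names `BranchEntry` and `ConfidenceRouter`, the 2-bit bimodal counter standing in for the "easy machinery", the resetting confidence counter, and the threshold of 8. A branch is routed to the easy machinery once the simple predictor has been right enough times in a row, and is sent back to the fancy machinery on any miss.

```python
from dataclasses import dataclass


@dataclass
class BranchEntry:
    # Hypothetical per-branch state, keyed by PC in the router below.
    bimodal: int = 2      # 2-bit counter (0..3); >= 2 means "predict taken"
    confidence: int = 0   # consecutive correct simple predictions, saturating


class ConfidenceRouter:
    """Toy model: decide which predictor class should handle each branch."""

    def __init__(self, threshold=8, max_conf=15):
        self.threshold = threshold  # assumed value, not from any real design
        self.max_conf = max_conf
        self.table = {}

    def route(self, pc):
        # Unknown branches start with zero confidence, so they go to "fancy".
        entry = self.table.setdefault(pc, BranchEntry())
        return "easy" if entry.confidence >= self.threshold else "fancy"

    def update(self, pc, taken):
        entry = self.table.setdefault(pc, BranchEntry())
        predicted_taken = entry.bimodal >= 2
        if predicted_taken == taken:
            # Simple predictor was right: build confidence, saturating.
            entry.confidence = min(entry.confidence + 1, self.max_conf)
        else:
            # Any miss resets confidence, demoting the branch to "fancy".
            entry.confidence = 0
        # Train the simple 2-bit predictor on the actual outcome.
        if taken:
            entry.bimodal = min(entry.bimodal + 1, 3)
        else:
            entry.bimodal = max(entry.bimodal - 1, 0)


if __name__ == "__main__":
    router = ConfidenceRouter()
    for i in range(20):
        router.update(0x100, True)        # loop-style branch, always taken
        router.update(0x200, i % 2 == 0)  # alternating branch
    print(hex(0x100), router.route(0x100))
    print(hex(0x200), router.route(0x200))
```

In this toy model the always-taken branch migrates to the easy machinery while the alternating one stays with the fancy predictor, with no compiler hint involved. A real implementation would have to cope with finite tables, aliasing, and the cost of a demotion, none of which is modeled here.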