By: G. Boniface (boniface.delete@this.example.edu), November 7, 2019 6:59 am
Room: Moderated Discussions
anon.1 (abc.delete@this.def.com) on November 6, 2019 11:19 am wrote:
> Seems like the RISC-V
> folks recommend op-fusion, which is another thing I find ridiculous. The whole point of RISC was to make
> decode simple. Now they want to add complexity in decode because, well, the ISA is oversimplified. Take
> that idea further and a uopcache is the next logical step because you can't sustain dispatch bandwidth or
> add extra pipe stages for fusion (it's not magic pixie dust, transistor gates have to be spent). Madness.
Actually, it's not madness. RISC-V is very explicitly architected to allow a wide range of implementations, from "classic RISC" single-issue in-order pipelines to aggressive OoO.
The claim (which I cannot verify first-hand, but which seems plausible) is that once you've spent the transistors on the complex dependency tracking that superscalar execution requires, op fusion comes at minimal incremental cost.
The apparently contradictory claims can both be true because they apply to different points on the cost-performance spectrum. If you want simple decoding and a tight transistor budget, RISC-V (without op fusion) has it. If you want high performance and have a correspondingly lavish transistor budget, RISC-V (with op fusion) has it.
While op fusion requires additional pipeline stages, the claim is that these are the same stages superscalar execution requires anyway, so the incremental cost is low.
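To make the "minimal incremental cost" argument a bit more concrete, here is a rough software sketch of what a fusion check looks like. It's my own illustration, not taken from the RISC-V spec or any real core. The two patterns (lui+addi to build a 32-bit constant, slli+add for indexed addressing) are pairs often cited in fusion discussions, and the `Inst`/`MacroOp` types, `try_fuse` and `fuse_window` are made-up names. The point is that the check boils down to comparing register fields of adjacent instructions, roughly the same kind of comparison a superscalar decoder already does when checking dependencies within a fetch group. Real hardware would examine all adjacent pairs in parallel rather than looping.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

@dataclass(frozen=True)
class Inst:
    """One decoded instruction (hypothetical, simplified encoding)."""
    op: str        # mnemonic, e.g. "lui", "addi"
    rd: int        # destination register number (x0..x31)
    rs1: int = 0   # first source register
    rs2: int = 0   # second source register
    imm: int = 0   # immediate operand

@dataclass(frozen=True)
class MacroOp:
    """Two adjacent instructions treated as one internal operation."""
    kind: str
    parts: tuple

def try_fuse(a: Inst, b: Inst) -> Optional[MacroOp]:
    # Both patterns require the second instruction to read and overwrite the
    # first one's destination, so the intermediate value is dead after the
    # pair and the pair can be handled as a single op writing one register.
    # rd == x0 is excluded: writes to x0 are discarded, so b would not see a's result.
    if a.rd == 0 or b.rd != a.rd or b.rs1 != a.rd:
        return None
    if a.op == "lui" and b.op == "addi":
        return MacroOp("li32", (a, b))   # lui+addi -> load 32-bit constant
    if a.op == "slli" and b.op == "add":
        return MacroOp("shadd", (a, b))  # slli+add -> shifted add (indexing)
    return None

def fuse_window(insts: List[Inst]) -> List[Union[Inst, MacroOp]]:
    """Scan a decode window left to right, fusing eligible adjacent pairs."""
    out: List[Union[Inst, MacroOp]] = []
    i = 0
    while i < len(insts):
        fused = try_fuse(insts[i], insts[i + 1]) if i + 1 < len(insts) else None
        if fused is not None:
            out.append(fused)
            i += 2
        else:
            out.append(insts[i])
            i += 1
    return out

if __name__ == "__main__":
    window = [
        Inst("lui", rd=5, imm=0x12345),
        Inst("addi", rd=5, rs1=5, imm=0x678),
        Inst("add", rd=6, rs1=7, rs2=8),
    ]
    for op in fuse_window(window):
        print(op)
```

Running it prints one `li32` macro-op followed by the unfused `add`. A simple in-order core can skip this logic entirely and execute the pair as two ordinary instructions, which is the low end of the spectrum described above.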
(On a tangent, I'm reminded of Linus's criticism of weak memory models on the grounds that once you've paid the cost for a high-performance memory subsystem, stronger memory models come at minimal additional cost. Weaker models therefore burden the programmer with no genuine compensating performance benefit, because they only help performance on inherently low-performance implementations.)