By: anon.1 (abc.delete@this.def.com), November 7, 2019 11:00 pm
Room: Moderated Discussions
Foo_ (foo.delete@this.nomail.com) on November 7, 2019 11:42 am wrote:
> Ronald Maas (ronaldjmaas.delete@this.gmail.com) on November 7, 2019 8:29 am wrote:
> > Heikki Kultala (heikki.kultala.delete@this.tuni.fi) on November 7, 2019 7:39 am wrote:
> > > Totally bogus claim. The op fusion needs an additional stage in the FRONTEND
> > > while superscalar execution adds a stage(s) in the backend.
> > >
> >
> > On x86 fusion is implemented without requiring any additional pipeline stages.
> > Why are extra pipeline stages suddenly required for fusion on RISC-V?
>
> Isn't x86 op fusion quite simplistic? Unlike RISC-V, x86 doesn't lack many complex instructions ;-)
>
>
Right. The only fusion I am aware of in x86 is cmp+jmp and alu+jmp (a compare or simple ALU op followed by a conditional jump). The other categories of fusion are mainly about keeping complex ops fused. (I get the terminology mixed up, but there’s micro-op fusion and macro-op fusion.) With cmp+jmp, I interpreted Intel's description to imply that they can decode a cmp followed by a jmp as a single op, rather than decoding both and then fusing them. I'm guessing this from the fact that their decode bandwidth increases in the presence of cmp+jmp. I could be wrong though.
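
To make the decode bandwidth argument concrete, here is a rough toy model, not a description of any real decoder. The 4-slot width, the set of fusible mnemonics, and the function names are all assumptions I made up for illustration. It only shows why a fused pair taking a single decode slot lets more instructions through per cycle.

```python
# Toy model only: slot count and mnemonics are illustrative assumptions,
# not a description of any actual Intel front end.
FUSIBLE_FIRST = {"cmp", "test"}


def is_cond_jump(op):
    """True for conditional jumps such as jne/je/jz, false for plain jmp."""
    return op.startswith("j") and op != "jmp"


def decode_cycles(instrs, slots=4, fuse=True):
    """Return (cycles, ops_emitted) for a list of instruction mnemonics."""
    cycles = 0
    ops = 0
    i = 0
    while i < len(instrs):
        cycles += 1
        used = 0
        while used < slots and i < len(instrs):
            fusible = (
                fuse
                and instrs[i] in FUSIBLE_FIRST
                and i + 1 < len(instrs)
                and is_cond_jump(instrs[i + 1])
            )
            # A fused pair consumes two instructions but only one slot.
            i += 2 if fusible else 1
            used += 1
            ops += 1
    return cycles, ops


stream = ["add", "cmp", "jne", "mov", "cmp", "je", "sub", "test", "jz", "mov"]
print(decode_cycles(stream, fuse=False))  # (3, 10)
print(decode_cycles(stream, fuse=True))   # (2, 7)
```

In this sketch the same 10 instructions go through in 2 cycles instead of 3 when pairs are fused at decode, which is the kind of effect I was inferring from the bandwidth numbers.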