By: anon (anon.delete@this.b.c), November 8, 2022 1:58 pm
Room: Moderated Discussions
Linus Torvalds (torvalds.delete@this.linux-foundation.org) on November 8, 2022 12:29 pm wrote:
> Adrian (a.delete@this.acm.org) on November 8, 2022 11:45 am wrote:
> > cheaper to implement simultaneous execution instead of serial execution and that the simultaneous
> > execution might improve the performance of a small number of instructions.
>
> I do agree that that is the "obvious" implementation, and the one that
> afaik matches what they've done before when doing 128->256 widening.
That's not what they did before. Back then, they split into two separate ops early, and the OoO engine tracked two separate ops through the entire pipeline. That is not what is happening on Zen4: it's fairly clear that a 512-bit op is tracked as a single instruction right up to the execution units.
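To make the distinction concrete, here is a toy sketch in Python. It is purely illustrative: `VecOp`, `split_early`, `track_whole` and the notion of counting "tracking entries" are names I made up for this post, and none of it is a model of the real hardware. It only contrasts cracking a wide op into pieces at decode (so every piece is tracked separately) with keeping the wide op as one tracked entry until it reaches the execution units.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VecOp:
    name: str
    width_bits: int


def split_early(op: VecOp, native_bits: int) -> list[VecOp]:
    """Hypothetical policy: crack a wide op into native-width pieces at decode.

    Every piece then needs its own tracking entry in the OoO engine.
    """
    pieces = max(1, op.width_bits // native_bits)
    if pieces == 1:
        return [op]
    return [VecOp(f"{op.name}.part{i}", native_bits) for i in range(pieces)]


def track_whole(op: VecOp) -> list[VecOp]:
    """Hypothetical policy: keep the wide op as a single tracked entry.

    Whatever happens to it at the execution units is outside this sketch.
    """
    return [op]


def tracking_entries(ops, policy) -> int:
    """Count how many entries the OoO engine would have to track."""
    return sum(len(policy(op)) for op in ops)


# Made-up instruction stream: two 512-bit ops and one 256-bit op.
stream = [VecOp("vaddps", 512), VecOp("vmulps", 512), VecOp("vpaddd", 256)]

print(tracking_entries(stream, lambda op: split_early(op, 256)))  # 5
print(tracking_entries(stream, track_whole))                      # 3
```

With the split-early policy the two 512-bit ops each cost two entries, so the same stream takes five entries instead of three. That is the whole point of the sketch; it says nothing about latency or throughput.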