By: Anon (no.delete@this.spam.com), November 4, 2022 11:30 pm
Room: Moderated Discussions
Jeffrey Bosboom (firstinitiallastname.delete@this.firstnamelastname.com) on November 4, 2022 11:05 pm wrote:
> From a Chips and Cheese article on Zen 4:
>
> While Intel’s client architectures have comparable vector throughput to Zen 4, 512-bit operations through
> 256-bit pipes are handled differently. Intel fuses two 256-bit units across ports 0 and 1 to handle a 512-bit
> operation. This leads to some interesting characteristics when mixing 256-bit FMA instructions with 512-bit
> ones. Intel is stuck at one vector operation per cycle, likely because 256-bit FMA units on ports 0 and
> 1 have to be set to 1×512-bit or 2×256-bit mode, but cannot be in both modes at once.
> Below that is a table showing 0.94 IPC for Tiger Lake on "2:1 Mixed 256-bit and 512-bit FMA".
Intel's case is different because it has this "mode" where the unified scheduler sees either two 256-bit units or one 512-bit unit, but the infrastructure beyond ports 0 and 1 is all 512 bits wide.
For Zen 4, everything seems to be 256 bits wide. What I want to know is how the testers concluded that the same scheduler entry is feeding two units that appear to sit behind different schedulers.
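To make the quoted "1×512-bit or 2×256-bit mode" idea concrete, here is a toy in-order issue model. Everything in it is an assumption for illustration: the function name `simulate_fma_issue`, the two mode labels and the `switch_penalty` parameter are hypothetical, and the model ignores out-of-order reordering, so it says nothing definitive about how Tiger Lake or Zen 4 actually schedule FMAs.

```python
from typing import List


def simulate_fma_issue(stream: List[int], switch_penalty: int) -> int:
    """Return the cycles needed to issue `stream` strictly in program order.

    Toy model (an assumption, not real hardware behaviour): ports 0 and 1
    act either as two 256-bit FMA pipes ("2x256" mode) or as one fused
    512-bit pipe ("1x512" mode), never both at once. In 2x256 mode a cycle
    issues up to two back-to-back 256-bit FMAs; in 1x512 mode it issues one
    512-bit FMA. Changing mode stalls issue for `switch_penalty` cycles.
    """
    mode = "2x256"  # hypothetical starting mode
    cycles = 0
    i = 0
    while i < len(stream):
        needed = "1x512" if stream[i] == 512 else "2x256"
        if needed != mode:
            cycles += switch_penalty  # hypothetical mode-change cost
            mode = needed
        if mode == "1x512":
            i += 1  # one 512-bit FMA this cycle
        else:
            i += 1  # first 256-bit FMA this cycle
            if i < len(stream) and stream[i] == 256:
                i += 1  # second 256-bit FMA on the other port
        cycles += 1
    return cycles


stream = [256, 256, 512] * 1000  # 2:1 mix of 256-bit and 512-bit FMAs
for penalty in (0, 1):
    cycles = simulate_fma_issue(stream, penalty)
    print(penalty, cycles, round(len(stream) / cycles, 2))
# prints:
# 0 2000 1.5
# 1 3999 0.75
```

Under these assumptions a 2:1 mix reaches 1.5 FMAs per cycle with free mode switches and about 0.75 with a one-cycle switch cost. These are toy numbers, not measurements, and are not meant to reproduce the 0.94 IPC figure.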