By: noko (noko.delete@this.noko.com), November 4, 2022 9:49 pm
Room: Moderated Discussions
Jeffrey Bosboom (firstinitiallastname.delete@this.firstnamelastname.com) on November 4, 2022 6:18 pm wrote:
>
> I understand how cracking a 2n-bit instruction into two n-bit instructions and executing them sequentially
> saves area compared to a full 2n-bit-wide unit. But what is the difference between one 2n-bit unit and
I don't think cracking necessarily implies sequential execution. As I understand it, the cracking employed by Zen 1 meant both cracked µops could still execute simultaneously, but they didn't *have* to. The cost there was that each cracked µop took its own resources in the scheduler, ROB, etc. In Zen 4, by contrast, each AVX-512 instruction takes only one scheduler entry and can *only* be issued to two pipelines simultaneously, in the *same* cycle (akin to Skylake-client and Neoverse-V1).
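To make the distinction concrete, here is a toy model (my own sketch, not AMD's actual scheduler logic). It compares a cracked 2n-bit op, which is two independent µops each holding its own scheduler entry, with a paired op that holds one entry and issues only when both n-bit pipes are free in the same cycle. The function names, the `IssueResult` type, and the pipe numbering are all hypothetical. The input lists which pipes are free in each cycle.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class IssueResult:
    scheduler_entries: int          # entries the op occupies while waiting to issue
    slots: list[tuple[int, int]]    # (cycle, pipe) pairs consumed at issue


def issue_cracked(free_pipes: list[set[int]]) -> IssueResult:
    """Hypothetical cracked op: two n-bit µops, each with its own scheduler entry.

    Each half grabs any free pipe in any cycle, so the halves may issue
    together or in different cycles.
    """
    pending = 2
    slots = []
    for cycle, pipes in enumerate(free_pipes):
        for pipe in sorted(pipes):
            if pending == 0:
                break
            slots.append((cycle, pipe))
            pending -= 1
        if pending == 0:
            return IssueResult(scheduler_entries=2, slots=slots)
    raise RuntimeError("not enough free pipe slots for both halves")


def issue_paired(free_pipes: list[set[int]], pair: tuple[int, int] = (0, 1)) -> IssueResult:
    """Hypothetical paired op: one scheduler entry, both pipes in the same cycle."""
    for cycle, pipes in enumerate(free_pipes):
        if set(pair) <= pipes:
            return IssueResult(scheduler_entries=1, slots=[(cycle, p) for p in pair])
    raise RuntimeError("pipe pair never free in the same cycle")


# Pipe 0 free in cycle 0, pipe 1 free in cycle 1, both free in cycle 2.
free = [{0}, {1}, {0, 1}]
print(issue_cracked(free))  # two entries, halves issue in cycles 0 and 1
print(issue_paired(free))   # one entry, waits until cycle 2 for both pipes
```

The trade-off shows up directly: cracking can fill scattered free slots earlier but costs two scheduler entries, while pairing saves an entry but has to wait for a cycle in which both pipes are free.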
From the Zen 4 presentation, it initially sounded to me more like ARM's Multi-Cycle ops, which also take one scheduler entry etc., but with the critical difference that once issued, they hog the specific pipeline they're dispatched to for two consecutive cycles.
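For contrast, here is a toy model of a multi-cycle op in the sense described above: one scheduler entry, dispatched to a single pipe, which it then occupies for two consecutive cycles. This is my own simplified sketch, not ARM's documented issue logic. The function name, the `IssueResult` type, and the pipe numbering are hypothetical, and the input again lists which pipes are free in each cycle.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class IssueResult:
    scheduler_entries: int          # entries the op occupies while waiting to issue
    slots: list[tuple[int, int]]    # (cycle, pipe) pairs consumed by the op


def issue_multicycle(free_pipes: list[set[int]], cycles: int = 2) -> IssueResult:
    """Hypothetical multi-cycle op: one entry, one pipe, held for `cycles` cycles.

    Unlike a paired op, it needs a single pipe free in consecutive cycles
    rather than two pipes free in the same cycle.
    """
    for cycle in range(len(free_pipes) - cycles + 1):
        for pipe in sorted(free_pipes[cycle]):
            if all(pipe in free_pipes[cycle + k] for k in range(cycles)):
                return IssueResult(
                    scheduler_entries=1,
                    slots=[(cycle + k, pipe) for k in range(cycles)],
                )
    raise RuntimeError("no pipe free for enough consecutive cycles")


free = [{0}, {1}, {0, 1}, {0, 1}]
print(issue_multicycle(free))  # one entry, holds pipe 1 in cycles 1 and 2
```

Both schemes consume two n-bit pipe-cycles per 2n-bit op. The difference is the shape: the multi-cycle op spreads them over time on one pipe, while the paired op spreads them across two pipes in a single cycle.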