By: Freddie (freddie.delete@this.witherden.org), August 29, 2022 5:32 pm
Room: Moderated Discussions
anonymous2 (anonymous2.delete@this.example.com) on August 29, 2022 5:08 pm wrote:
> AVX-512 (ISA details murky) on Zen 4 but 2 cycles vs 1 on Intel so only 256b internally.
>
> Small win for those who want the ISA, but from a performance perspective limited value?
>
Execution time is not particularly important for SIMD instructions, and it is almost never one cycle for floating point anyway. What matters is throughput, and here Zen 4 is likely to be half rate on a per-cycle basis compared with high-end Intel cores.
That said, a lot of the value of AVX-512 comes from the ISA itself: extra registers, embedded broadcasts in FMAs, and predication, to name a few. These are all useful. Moreover, most compilers will only emit 256-bit AVX-512 code by default unless explicitly told otherwise with -mprefer-vector-width=512, due to historical down-clocking issues on Intel CPUs. Thus, assuming AMD have not messed up the implementation, it is likely to have some utility even for non-ML code.
Regards, Freddie.