By: none (none.delete@this.none.com), January 11, 2021 1:26 am
Room: Moderated Discussions
Adrian (a.delete@this.acm.org) on January 9, 2021 8:00 am wrote:
[...]
> So Zen 3 is around 22% faster than Apple M1 at bignum arithmetic, due to the reasons mentioned in a previous
> post, i.e. that when an execution resource reaches 100% utilization, the IPC remains clamped at the same
> value for both M1 and Zen 3, and then the CPU with the higher clock frequency is advantaged.
According to this GMP table, M1 has better IPC than Zen 3 for many of the base routines.
For instance, addmul_1 runs at 1.5 cycles/limb on Zen 3 versus 1.25 on M1. On M1 this is only
slightly above the limit set by its multiply throughput: it can issue two 64x64->64-bit integer
multiplies per cycle (each producing either the high or the low half), i.e. one full 64x64->128-bit
product per cycle, so at best 1 cycle/limb.
mul_1 already runs at 1 cycle/limb on M1 (vs 1.5 on Zen 3), which is exactly that limit, so it is
quite likely that addmul_1 is also at its practical maximum, given the extra additions it has to do.
My understanding is that Zen 2 can also issue a single 64x64->128-bit multiply per cycle. So, unlike
M1 on mul_1, Zen 2 (and, I assume, Zen 3) does not saturate its multiplier.
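To make the limb arithmetic concrete, here is a minimal Python reference model (a sketch, not GMP's assembly) of what mul_1 and addmul_1 compute on 64-bit limbs, least-significant limb first. Each limb needs one 128-bit product, which on AArch64 is two 64x64->64 multiplies (low half and high half). The second part turns the post's stated throughput figures into a multiplier-bound floor in cycles/limb and compares it with the quoted GMP timings. The names `mul64_lo_hi`, `FLOOR`, `OBSERVED` and `multiplier_utilization` are hypothetical, and the Zen 3 floor rests on the assumption above that it matches Zen 2 (one 64x64->128 multiply per cycle).

```python
# Hypothetical reference model of GMP-style mpn_mul_1 / mpn_addmul_1.
MASK64 = (1 << 64) - 1

def mul64_lo_hi(a, b):
    """Low and high 64-bit halves of a 64x64->128-bit product
    (on AArch64 this is one MUL plus one UMULH)."""
    p = a * b
    return p & MASK64, p >> 64

def mul_1(up, v):
    """rp = up * v; returns (rp, carry-out limb)."""
    rp, carry = [], 0
    for u in up:
        lo, hi = mul64_lo_hi(u, v)
        s = lo + carry
        rp.append(s & MASK64)
        carry = hi + (s >> 64)
    return rp, carry

def addmul_1(rp, up, v):
    """rp += up * v in place; returns the carry-out limb."""
    carry = 0
    for i, u in enumerate(up):
        lo, hi = mul64_lo_hi(u, v)
        s = rp[i] + lo + carry  # the extra addition compared with mul_1
        rp[i] = s & MASK64
        carry = hi + (s >> 64)
    return carry

# cycles/limb figures quoted in the post (from the GMP table)
OBSERVED = {
    ("M1", "mul_1"): 1.0,
    ("M1", "addmul_1"): 1.25,
    ("Zen 3", "mul_1"): 1.5,
    ("Zen 3", "addmul_1"): 1.5,
}

# Multiplier-bound floor = multiply instructions per limb / multiplies per cycle.
# M1: 2 (lo + hi) per limb, 2 per cycle. Zen 3 (assumed same as Zen 2):
# 1 full 128-bit product per limb, 1 per cycle.
FLOOR = {"M1": 2 / 2, "Zen 3": 1 / 1}

def multiplier_utilization(cpu, routine):
    """Fraction of peak multiplier throughput implied by the observed timing."""
    return FLOOR[cpu] / OBSERVED[(cpu, routine)]
```

Under these assumptions both chips share a 1 cycle/limb floor, so M1's mul_1 comes out at 100% multiplier utilization, its addmul_1 at 80%, and Zen 3's mul_1 at about 67%.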
[...]