By: Adrian (a.delete@this.acm.org), January 9, 2021 8:00 am
Room: Moderated Discussions
Adrian (a.delete@this.acm.org) on January 9, 2021 12:48 am wrote:
> Maynard Handley (name99.delete@this.name99.org) on January 8, 2021 3:59 pm wrote:
>
>
> > Do we have any sort of sense as to how much GMP has had care and optimization put into it for ARMv8/NEON
> > vs x86? It is still at the basic "get the damn thing working"
> > level, or has the sort of obsessive micro-optimization
> > one (eventually) expects in these sorts of libraries already been applied?
> > You can see some of the flux that's still happening in this space in the above
> > article, even with respect to a somewhat more mainstream library like BLAS.
> >
> >
>
> It seems that recent versions already have good ARM optimizations, but they are probably still improving.
>
> Even for AMD Zen 3 the optimizations keep improving, because I have tested my CPU with an older libgmp
> version giving a result of 7337, but a newer non-released yet libgmp version has reached 7816.
>
>
The gmpbench result of 6422 for the Apple M1 was published by the libgmp team after they added M1-specific tuning to libgmp.
The libgmp team has also recently added tuning specific to AMD Zen 3, and they have published a score of 7816 for a Ryzen 7 5800X.
I had previously tested my own CPU with an older version of libgmp, without Zen 3 tuning, and the score was 7337: better than that of any older CPU, but still not showing the full capabilities of Zen 3.
I have now cloned the libgmp repository, built libgmp, and run the benchmark again. The result was about the same as the one published by the libgmp maintainers.
Because the 6422 and 7816 scores were obtained under the same conditions, i.e. after processor-specific tuning was added to libgmp, this should be an apples-to-apples comparison.
So Zen 3 is around 22% faster than the Apple M1 at bignum arithmetic (7816 / 6422 ≈ 1.22). The reason is the one mentioned in a previous post: when an execution resource reaches 100% utilization, the IPC is clamped at the same value for both M1 and Zen 3, and the CPU with the higher clock frequency then has the advantage.
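
As a quick check of the arithmetic, the relative differences can be recomputed from the three gmpbench scores quoted in this post. This is only a sketch: the helper name percent_faster is hypothetical, and it assumes that gmpbench scores are "higher is better" and can be compared as simple ratios.

```python
# gmpbench scores quoted in this post (assumed: higher is better).
M1_TUNED = 6422      # Apple M1, libgmp with M1-specific tuning
ZEN3_TUNED = 7816    # Ryzen 7 5800X, libgmp with Zen 3 tuning
ZEN3_UNTUNED = 7337  # Zen 3 with an older libgmp, no Zen 3 tuning


def percent_faster(score_a: float, score_b: float) -> float:
    """Return by how many percent score_a exceeds score_b (hypothetical helper)."""
    return (score_a / score_b - 1.0) * 100.0


# Zen 3 vs. M1, both with processor-specific tuning.
print(f"Zen 3 vs M1: {percent_faster(ZEN3_TUNED, M1_TUNED):.1f}% faster")        # 21.7%
# Gain on Zen 3 from the newer, tuned libgmp.
print(f"Zen 3 tuning gain: {percent_faster(ZEN3_TUNED, ZEN3_UNTUNED):.1f}%")     # 6.5%
```

The second number shows how much of the gap depends on tuning alone, which is why only the two tuned scores are compared with each other.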