By: hobold (hobold.delete@this.vectorizer.org), January 8, 2021 12:44 pm
Room: Moderated Discussions
Adrian (a.delete@this.acm.org) on January 2, 2021 2:45 am wrote:
[...]
> In computational benchmarks where the number and speed of the available execution resources matter
> most, unlike in GB5 or SPEC, where the higher *average* IPC of Apple shines, the advantage of Zen
> 3 over Apple M1 increases, being e.g. of over 14% @ 4.9 GHz for gmpbench (7337 vs. 6422).
That's an interesting contradiction. Apple M1 does have fewer and slower execution resources, but higher IPC.
Pure speculation: Apple M1 can sometimes (rarely) execute a pair of dependent, simple instructions within a single clock cycle. That would occasionally (rarely) gain one cycle over other processors, which need two consecutive cycles to resolve the dependency.
Naturally this would be a very narrow, very targeted optimization. But there just might be pairs that are frequent enough to be worth it. Maybe something like shift (by a constant) and add. More generally, any chain of two-operand instructions could qualify, as long as the whole chain does not read or write more registers than can be handled in a single cycle.
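To put a rough number on that guess, here is a toy latency model in Python. It is only a sketch: it assumes every simple op has a one-cycle latency, that the ops form one serial dependency chain, and that a fusible dependent pair completes in one cycle instead of two. The op names and the `FUSIBLE_PAIRS` set are made up for illustration and say nothing about what M1 actually does.

```python
# Toy model of a serial dependency chain (illustrative only, not a
# description of any real core).
# Assumptions: each simple op has 1-cycle latency, and a "fusible" pair
# of adjacent dependent ops finishes in 1 cycle total instead of 2.

# Hypothetical fusible pair, based on the shift-by-constant-then-add guess.
FUSIBLE_PAIRS = frozenset({("shl_imm", "add")})


def chain_latency(ops, fusible_pairs=frozenset()):
    """Cycles to run `ops`, where each op depends on the one before it."""
    cycles = 0
    i = 0
    while i < len(ops):
        if i + 1 < len(ops) and (ops[i], ops[i + 1]) in fusible_pairs:
            i += 2  # the dependent pair shares a single cycle
        else:
            i += 1  # ordinary op: one cycle on its own
        cycles += 1
    return cycles


chain = ["shl_imm", "add", "shl_imm", "add", "xor", "add"]
baseline = chain_latency(chain)               # no pair fusion
fused = chain_latency(chain, FUSIBLE_PAIRS)   # with the hypothetical fusion
print(f"baseline: {baseline} cycles, fused: {fused} cycles")
```

In this model the gain depends entirely on how often a fusible pair sits on the critical path, which is why the trick would only pay off for pairs that are frequent enough.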