By: Travis Downs (travis.downs.delete@this.gmail.com), March 7, 2021 10:53 pm
Room: Moderated Discussions
Dougall Johnson has measured and reverse-engineered timing details of both the big and little Apple M1 cores.
Interesting observations include that mov immediate (including the negated form, movn) can be eliminated prior to execution (how?), and that the 3c load latency applies only to the "load feeds load address" scenario; otherwise the latency is 4c. This is similar to Intel's behavior until Ice Lake, where the 4c latency applied only to load-feeds-load (plus additional restrictions on the addressing expression) and was 5c otherwise, and it indicates a "load result to AGU" fast path.
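For anyone who wants to poke at the load-feeds-load distinction themselves, here is a rough sketch of the usual pointer-chasing test in C. It is not Dougall's harness: the names make_chain, chase_direct, chase_via_alu, ns_per_load and opaque_zero are all made up for illustration, and it assumes the compiler keeps the XOR as a separate instruction between the two loads (check the disassembly). It reports nanoseconds per load using clock(), so converting to cycles requires knowing the core frequency, and the chain should be small enough to stay in L1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Read at run time so the compiler cannot fold the XOR below away. */
static volatile uintptr_t opaque_zero = 0;

/* Build one cycle through n slots in shuffled order; each slot holds
   the address of the next slot to visit. Returns NULL on failure. */
void **make_chain(size_t n) {
    if (n == 0) return NULL;
    void **slots = malloc(n * sizeof *slots);
    size_t *order = malloc(n * sizeof *order);
    if (!slots || !order) { free(slots); free(order); return NULL; }
    for (size_t i = 0; i < n; i++) order[i] = i;
    srand(1); /* fixed seed: same chain on every run */
    for (size_t i = n - 1; i > 0; i--) { /* Fisher-Yates shuffle */
        size_t j = (size_t)rand() % (i + 1);
        size_t t = order[i]; order[i] = order[j]; order[j] = t;
    }
    for (size_t i = 0; i < n; i++)
        slots[order[i]] = &slots[order[(i + 1) % n]];
    free(order);
    return slots;
}

/* Case 1: each loaded value is used directly as the next load address. */
void *chase_direct(void **start, size_t steps) {
    void **p = start;
    for (size_t i = 0; i < steps; i++)
        p = (void **)*(void *volatile *)p;
    return p;
}

/* Case 2: each loaded value passes through an ALU op (XOR with a
   run-time zero) before it becomes the next load address. */
void *chase_via_alu(void **start, size_t steps) {
    uintptr_t zero = opaque_zero;
    uintptr_t a = (uintptr_t)start;
    for (size_t i = 0; i < steps; i++) {
        a = (uintptr_t)*(void *volatile *)a;
        a ^= zero; /* value unchanged, but the dependency goes via the ALU */
    }
    return (void *)a;
}

typedef void *(*chase_fn)(void **, size_t);

/* Time one chase and return average nanoseconds per load (CPU time). */
double ns_per_load(chase_fn fn, void **start, size_t steps, void **end_out) {
    assert(steps > 0);
    clock_t t0 = clock();
    void *end = fn(start, steps);
    clock_t t1 = clock();
    if (end_out) *end_out = end;
    return (double)(t1 - t0) / CLOCKS_PER_SEC * 1e9 / (double)steps;
}
```

To use it, call make_chain(512) (4 KiB of pointers on a 64-bit target), then compare ns_per_load(chase_direct, chain, steps, NULL) against ns_per_load(chase_via_alu, chain, steps, NULL) with a large step count. If the fast path described above exists, the second number should exceed the first by more than the latency of the XOR alone, since the load itself also becomes one cycle slower.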