ICL memory renaming?

By: Travis Downs (travis.downs.delete@this.gmail.com), August 4, 2019 3:00 pm
Room: Moderated Discussions
ll (ll.delete.delete@this.this.gmail.com) on August 3, 2019 8:37 pm wrote:

> The latency for Skylake is strange.
> For mov r64, [m64] + mov [m64], r64, the uops should be:
> load r64, [m64]
> sta
> std
> sta and std could issue in the same cycle. The load has 4-cycle latency; if we also assume sta and
> std take four cycles, then after sta and std the load in the next iteration could get its data through
> store-to-load forwarding, so one iteration should take at most 8 cycles. But the result is 19 cycles.

Yes, I don't know where the 19 cycles come from, or even the 10-11 cycles seen in other dumps. Store-forwarding latency is generally between 3 and 6 cycles on modern Intel chips, and the straightforward loop should achieve that. Maybe the loaded value is somehow used in the store address calculation, which slows things down. That is, a loop like:

mov eax, [rdi + rbx]
mov [rdi + rbx], eax

is quite different than:

mov eax, [rdi + rbx]
mov [rdi + rax], eax

because in the second case the store address can't be calculated until eax is available, and eax itself comes from the store-load chain, which slows things down a lot.
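For comparison, here is a rough C sketch of the same two loops. It is only an illustration: the function names are hypothetical, and it assumes the slot at idx holds the value idx, so that the address-dependent store in the second loop lands on the same slot and the next load can still forward from it. volatile keeps the compiler from collapsing the load/store pair, but the generated code may still differ from the asm above (for example, uint32_t indexing scales the address by 4).

```c
#include <stddef.h>
#include <stdint.h>

/* Case 1: the store address (buf + idx) does not depend on the loaded value.
   Only the data travels through the store-to-load forwarding chain.
   Roughly: mov eax, [rdi + rbx] / mov [rdi + rbx], eax */
uint32_t chain_fixed_addr(volatile uint32_t *buf, size_t idx, long iters) {
    for (long i = 0; i < iters; i++) {
        uint32_t v = buf[idx]; /* load */
        buf[idx] = v;          /* store to an address known ahead of time */
    }
    return buf[idx];
}

/* Case 2: the store address is computed from the loaded value, so address
   generation for each store also sits on the load -> store -> load chain.
   Roughly: mov eax, [rdi + rbx] / mov [rdi + rax], eax
   Assumes buf[idx] == idx, so the store hits the same slot every time. */
uint32_t chain_dep_addr(volatile uint32_t *buf, size_t idx, long iters) {
    for (long i = 0; i < iters; i++) {
        uint32_t v = buf[idx]; /* load */
        buf[v] = v;            /* store address depends on the load result */
    }
    return buf[idx];
}
```

Timing each function over many iterations (with rdtsc or perf, for instance) would be one way to compare the two chains; no measured numbers are implied here.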