By: anon (spam.delete.delete@this.this.spam.com), September 21, 2018 8:08 am
Room: Moderated Discussions
Travis Downs (travis.downs.delete@this.gmail.com) on September 20, 2018 7:27 pm wrote:
> anon (spam.delete.delete@this.this.spam.com) on September 20, 2018 1:49 am wrote:
>
> > So the replay block happens with no displacement?
> > That would mean page crossings have to be highly correlated for it to be a win.
>
> It's a bit trickier to test because a bunch of loads with no displacement
> don't cross at all, but a repeated pattern like this:
>
>
> mov rax, [rax]
> mov rax, [rax + 16]
>
>
> where the +16 load crosses, runs at 7.5 cycles. This means the replay of the crossing block must cause
> the zero-disp load to run at 5 cycles, so I think this answers your question in the affirmative.
Yes, that's exactly what I wanted to know.
So the fast path is blocked even for a zero-displacement load, where a page crossing is impossible. Clearly the mechanism has some limitations that were deemed acceptable even though they definitely do not improve performance. Sadly, that makes it harder to guess why it behaves the way it does in other situations.
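
For anyone following along, here is a rough sketch of the arithmetic behind that conclusion. It assumes the 7.5 figure is the average latency per load over the two-load pattern; the function name and the derived cost of the crossing load are my own illustration, not measured values.

```python
def crossing_load_cost(avg_per_load, zero_disp_cost, loads_in_pattern=2):
    """Cycles left for the crossing load once the zero-disp load is accounted for.

    Assumes avg_per_load is the average latency per load across the
    whole repeating pattern (an assumption, not stated in the post).
    """
    total_per_iteration = avg_per_load * loads_in_pattern
    return total_per_iteration - zero_disp_cost


avg = 7.5  # average cycles per load, taken from the quoted post

# Compare the two candidate latencies for the zero-displacement load:
# 4 cycles (fast path taken) versus 5 cycles (fast path blocked).
for zero_disp in (4, 5):
    print(zero_disp, crossing_load_cost(avg, zero_disp))
```

With the zero-disp load at 5 cycles the crossing load comes out at 10 cycles; at 4 cycles it would have to be 11. So the 5-cycle conclusion only holds if the crossing load costs 10 cycles when measured on its own.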