By: Michael S (already5chosen.delete@this.yahoo.com), December 21, 2017 1:58 am
Room: Moderated Discussions
Etienne (etienne_lorrain.delete@this.yahoo.fr) on December 21, 2017 1:36 am wrote:
> Travis (travis.downs.delete@this.gmail.com) on December 20, 2017 1:44 pm wrote:
> > What feature of the L1L2 path on x86 could cause this?
>
> Maybe it could be that you are writing one single word into an L2 cacheline, so the rest of that
> cacheline has to be fetched from L3/main memory and that takes time / cannot be completely hidden?
Sure, but that is the same regardless of the store to an L1D-resident line in the middle.
And it only takes 3.5 clocks on Skylake (on Haswell/Broadwell and Sandy/Ivy Bridge it takes 6-6.5 clocks).