Why does writing to non-sequential lines in L2 perform so poorly?

By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), December 21, 2017 12:12 pm
Room: Moderated Discussions
Travis (travis.downs.delete@this.gmail.com) on December 20, 2017 5:54 pm wrote:
> Linus Torvalds (torvalds.delete@this.linux-foundation.org) on December 20, 2017 5:18 pm wrote:
> > Wild guess: it's about store ordering guarantees, and the L1 hit store being done concurrently
> > with (or perhaps even instead of) the store buffer for some efficiency reason.
>
> This part I don't understand - do you mean these stores skip the store buffer?

My guess is that on an L1 hit, the stores might update the L1 directly.

There's a couple of reasons why an L1 hit might be special for the store unit:

(a) a pure store-only workload is fairly common (memset, memcpy destination) and probably does not want to populate the L1 at all. So if you have the store buffer draining into L2, you'd get nicer cache behaviour for those stores. But that obviously means that you have to do something different when the line is already present in the L1.

(b) there's a nasty case for the "subsequent load hits _partially_ in the store buffer" load latency. Intel optimization manuals say not to do it, but it happens. When it happens, the value is read from the L1 D$ instead of being taken from the store buffer, and you pay a few cycles of extra latency. Updating the L1 directly on a store might help that case.
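To make the access pattern in (b) concrete, here is a minimal sketch in C: two narrow stores followed by one wider load that overlaps both of them. The names split_word and split_store_then_wide_load are made up for illustration. Whether a given core actually takes the slow path on this pattern is an assumption, not something this code demonstrates; it only shows the shape of the access.

```c
#include <stdint.h>

/* Hypothetical helper type: one 8-byte word viewed as two 4-byte halves. */
typedef union {
    uint32_t half[2];
    uint64_t whole;
} split_word;

/* Two 4-byte stores followed by one 8-byte load covering both.
 * No single store-buffer entry holds all 8 bytes, so the load only
 * partially overlaps each pending store. The volatile qualifier keeps
 * the compiler from folding the stores and the load together. */
uint64_t split_store_then_wide_load(volatile split_word *w,
                                    uint32_t lo, uint32_t hi)
{
    w->half[0] = lo;   /* narrow store to the first 4 bytes  */
    w->half[1] = hi;   /* narrow store to the second 4 bytes */
    return w->whole;   /* wide load spanning both stores     */
}
```

Calling this in a tight loop and comparing it against a variant that does a single 8-byte store before the load would be one way to look for the latency difference.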

NOTE! Neither of these reasons is necessarily a reason to skip the store buffer. The L1 update might be in addition to the store buffer, and in fact that's what I'd expect.

So my thinking is that the behavior you see might be because

(1) the store buffer drains purely to the L2, and the L2 is the real cache coherency boundary for external cores. The store ordering is easy to maintain because the stores really drain in order (although fetching the L2 lines can obviously be entirely OoO).

(2) but to keep the L1 up-to-date, the L1 is updated separately from (and concurrently with) the store buffer if the line exists in there.

(3) but because the L1 is visible to at least HT cores, store ordering is an issue, and the L1 update has to happen in order with any stores that missed in the L1, because otherwise at least an HT core could see writes in the wrong order.

Anyway. I may be completely off, I'm just throwing this out as a possible reason for the odd timings you see. It might be interesting to test with HT on vs HT off, because I think any "L1 access order visibility" really might be limited to only the HT case, because normally I thought that Intel limited snoop to L2 and out.
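For anybody who wants to try the HT on vs HT off comparison, a rough sketch of a store kernel in C follows. It is not the original test code. The function names, the 64-byte line size, and the idea of sizing the region to miss L1 but fit in L2 (say 128 KiB) are all assumptions to adjust for the machine being tested. With hot set to NULL you get stores to the region only; with hot pointing at one fixed line you get the interleaved case where every other store should hit L1.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

enum { LINE_BYTES = 64 };  /* assumed cache line size */

/* Store one byte to each line of `region` in turn, wrapping at the end.
 * If `hot` is non-NULL, follow every such store with a store to that
 * single fixed line. Returns the number of stores issued. */
size_t store_pattern(volatile uint8_t *region, size_t region_bytes,
                     volatile uint8_t *hot, size_t iterations)
{
    size_t stores = 0;
    size_t offset = 0;
    for (size_t i = 0; i < iterations; i++) {
        region[offset] = (uint8_t)i;      /* store to the large region */
        stores++;
        if (hot) {
            hot[0] = (uint8_t)i;          /* store to the fixed line */
            stores++;
        }
        offset += LINE_BYTES;
        if (offset >= region_bytes)
            offset = 0;
    }
    return stores;
}

/* Average CPU time per store in nanoseconds, measured with clock().
 * Coarse, but enough to compare two patterns over many iterations. */
double ns_per_store(volatile uint8_t *region, size_t region_bytes,
                    volatile uint8_t *hot, size_t iterations)
{
    clock_t start = clock();
    size_t stores = store_pattern(region, region_bytes, hot, iterations);
    clock_t end = clock();
    if (stores == 0)
        return 0.0;
    return 1e9 * (double)(end - start) / (double)CLOCKS_PER_SEC
               / (double)stores;
}
```

Run ns_per_store both ways with a large iteration count, once with hyperthreading enabled and once with it disabled, and compare. Pinning the process to a specific logical CPU is OS-specific (taskset on Linux, for example) and is left out here.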

I am neither a CPU designer nor do I have any insider knowledge into what Intel does. So my guess may be pure garbage. There is nothing but your odd timings behind it (and knowledge of the Intel memory ordering rules).

Linus