Why does writing to non-sequential lines in L2 perform so poorly?

By: Travis (travis.downs.delete@this.gmail.com), December 24, 2017 2:09 pm
Room: Moderated Discussions
Linus Torvalds (torvalds.delete@this.linux-foundation.org) on December 22, 2017 12:20 pm wrote:
> That should be fairly easy for Travis to check - replace the L1 store
> with a store to another L2 line, and see the timing behavior remains.

The timing is more or less the same when you have two interleaved reads that both hit in L2, but not in L1 (i.e., two reads that stride over a 64 KiB region with one read ~32 KiB ahead of the other). This is despite the fact that there are 2x as many L2 misses in this case.
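
To make the access pattern concrete, here is a rough C sketch of two interleaved read streams over a 64 KiB region, one running 32 KiB ahead of the other. It only illustrates the address sequence and is not the actual benchmark: the names `REGION`, `LINE` and `interleaved_reads` are made up for this example, and the 64-byte stride (one access per cache line) is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { REGION = 64 * 1024, LINE = 64 };  /* 64 KiB region; 64-byte lines assumed */

/* Two read streams walk the same region, one access per line. Stream B
 * runs REGION/2 (32 KiB) ahead of stream A and wraps around at the end.
 * Returns the number of reads issued; the byte sum goes to *sum so the
 * loads have an observable result. */
size_t interleaved_reads(const volatile uint8_t *buf, uint64_t *sum)
{
    assert(buf != NULL && sum != NULL);
    size_t reads = 0;
    uint64_t acc = 0;
    for (size_t off = 0; off < REGION; off += LINE) {
        acc += buf[off];                          /* stream A */
        acc += buf[(off + REGION / 2) % REGION];  /* stream B, 32 KiB ahead */
        reads += 2;
    }
    *sum = acc;
    return reads;
}
```

One pass touches each of the 1024 lines twice, once per stream. A real timing test would repeat the pass many times and would need to keep the compiler from reordering or vectorizing the loads.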

> I like this theory.
>

Sure, I can imagine that having L1 accesses breaks up the optimization where multiple L2 store hits can drain to the same line, but it's the timings that surprise me. A series of L2 store misses, all to different lines, takes about the same time as the same total number of stores split evenly between L2 misses and interleaved L1 hits, and that is the "best case". In fact, the L1/L2 interleaving seems slightly slower than pure L2 misses. And about half the time the L1/L2 interleaving goes into a "slow mode" where it slows down by 2x or more (about 7 cycles or more _per write_). Without interleaving this doesn't seem to occur.

I would have expected the rule to be something like "each L2 store miss takes 3 cycles, and each L1 hit takes 1 cycle, except that L2 store misses all to the same line can be handled more efficiently (e.g., multiple can commit in one cycle)". So interleaving some L1 stores might break up the chance for the latter optimization, but the penalty seems much larger than that.
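
The naive additive rule above can be written out as a tiny C model to show what it would predict. The 3-cycle and 1-cycle costs are the hypothetical figures from the rule itself, not measurements, and the function names are invented for this sketch.

```c
#include <assert.h>

/* Per-store costs from the hypothetical rule, not measured values. */
enum { L2_STORE_CYCLES = 3, L1_STORE_CYCLES = 1 };

/* Predicted cycles for n stores that all miss L1 and go to L2. */
unsigned long pure_l2_cycles(unsigned long n)
{
    return n * L2_STORE_CYCLES;
}

/* Predicted cycles for n stores, half going to L2 and half hitting L1. */
unsigned long interleaved_cycles(unsigned long n)
{
    assert(n % 2 == 0);  /* the model assumes an even split */
    return (n / 2) * L2_STORE_CYCLES + (n / 2) * L1_STORE_CYCLES;
}
```

For 1000 stores this model predicts 3000 cycles for the pure L2 case and 2000 for the interleaved case, so interleaving "should" be faster. The observed behavior is the opposite: roughly equal at best, and about 7 cycles or more per write in the slow mode.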

> Yes. Doing writes to the same cacheline (or whatever the store buffer entry granularity
> is - it's likely the same width as the cache access width, rather than the cacheline
> width) allows you to just merge them in the same store buffer entry with a byte mask.
> So you're right, the first case has a much bigger effective store buffer.


Based on this test and earlier tests, it seems like the cache access width is 64 bytes on modern Intel. The behavior is the same no matter where the DWORD or byte write falls, even at opposite ends of the cache line. That's consistent with earlier "misaligned store throughput" tests where you could achieve maximum throughput for any alignment that doesn't cross a cache line, with any size of store. Older Intel was different, and current AMD (Ryzen) seems to care about 16-byte crossings, at least for stores (32-byte crossings matter for loads, IIRC).
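
The different crossing granularities mentioned above (16, 32 and 64 bytes) can be checked with a small helper like the one below. This is just an illustrative sketch: `crosses_boundary` is a name made up here, and it only classifies addresses. It says nothing about what penalty, if any, a given CPU applies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if an access of `size` bytes starting at `addr` spans a
 * `boundary`-aligned boundary. `boundary` must be a power of two. */
bool crosses_boundary(uint64_t addr, uint64_t size, uint64_t boundary)
{
    assert(size > 0 && boundary > 0 && (boundary & (boundary - 1)) == 0);
    return (addr & (boundary - 1)) + size > boundary;
}
```

For example, a 4-byte store at offset 14 crosses a 16-byte boundary but not a 64-byte one, while a 4-byte store at offset 62 crosses both.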