microbenchmark results

By: dmcq (dmcq.delete@this.fano.co.uk), September 23, 2021 9:42 am
Room: Moderated Discussions
Jörn Engel (joern.delete@this.purestorage.com) on September 23, 2021 5:10 am wrote:
> Jörn Engel (joern.delete@this.purestorage.com) on September 19, 2021 8:46 pm wrote:
> >
> > Not sure. I'll have to play around with the code a bit.
>
> Looks like I have to eat my words. As usual, it was easier to write my own benchmark
> than modify yours. The results roughly match. But to mix things up I tried a copy
> using AVX2 (technically just AVX, I think). Here things get interesting.
>
> If I add an offset to both src and dst there is a 2x performance difference.
> Numbers are cycles for 100 loops, each copying 8kB. Turboboost seems to
> have kicked in, making the numbers look a bit better than they should.
>
> 0: 23418
> 1: 42909
>
> 31: 42798
> 32: 21552
> 33: 43128
>
> 63: 43146
> 64: 21552
> 65: 42657
>
> 95: 42666
> 96: 21507
> 97: 43239
>
> 127: 43233
> 128: 21483
> 129: 42795
>
> So what happens if we keep dst aligned and only shift the src?
>
> 0: 21642
> 1: 26088
>
> 31: 25268
> 32: 19208
> 33: 25194
>
> 63: 25338
> 64: 19474
> 65: 25440
>
> 95: 24962
> 96: 19206
> 97: 25120
>
> 127: 25104
> 128: 19214
> 129: 25054
>
> We get a small speedup for the aligned cases. Probably a red herring because I didn't fix
> the frequencies. And we get a large speedup for the unaligned cases. This CPU can do 2 reads
> and 1 write per cycle, so the unaligned reads have mostly been removed as a bottleneck.
>
> Ok, now let's keep src aligned and only shift dst.
>
> 0: 24153
> 1: 128886
>
> 31: 127698
> 32: 22206
> 33: 120669
>
> 63: 113868
> 64: 20596
> 65: 79906
>
> 95: 80180
> 96: 20392
> 97: 62200
>
> 127: 63056
> 128: 20636
> 129: 56148
>
> 159: 55502
> 160: 20600
> 161: 46962
>
> 191: 46856
> 192: 20394
> 193: 44454
>
> 223: 44404
> 224: 20284
> 225: 39622
>
> This is crazy. Performance for the unaligned cases is 5x higher instead of 2x. But then the
> unaligned performance appears to improve, with a noticeable step each time we do another aligned
> round. Towards the end I see the performance results I would have expected throughout.
>
> If I copy the entire benchmark loop a few times, I get the same results back to back.
> So this is not a warmup-problem, the offsets between src and dst appear to matter.
> So finally I shifted src by 256 bytes and now I get reasonable results again.
>
> 0: 20481
> 1: 39663
>
> 31: 38526
> 32: 19767
> 33: 38526
>
> 63: 38523
> 64: 19764
> 65: 38529
>
> 95: 38523
> 96: 19767
> 97: 38520
>
> 127: 38523
> 128: 19668
> 129: 38517
>
> Not sure how to explain the crazy numbers, but the CPU behaves as if src and dst were
> competing for the same cachelines. Modulo the offset the two were exactly 16k apart.
>
> If someone has a good explanation, I'd love to hear it.

I think L1 caches would be better if they used low-discrepancy quasirandom number sequences instead, or just modulo some random number, to avoid falling prey to that sort of effect too easily - evenif they didn't use every single line of some power of two! Need to use virtual adresses though to get that working well though.
< Previous Post in ThreadNext Post in Thread >
TopicPosted ByDate
Armv8.8-A and Armv9.3-Aanonymou52021/09/16 02:25 PM
  Armv8.8-A and Armv9.3-ADoug S2021/09/16 09:57 PM
    Armv8.8-A and Armv9.3-ABrett2021/09/16 10:32 PM
      Armv8.8-A and Armv9.3-Aanon2021/09/16 10:55 PM
        Armv8.8-A and Armv9.3-Anone2021/09/16 11:51 PM
      Armv8.8-A and Armv9.3-AJörn Engel2021/09/17 04:42 AM
        Armv8.8-A and Armv9.3-AMichael S2021/09/17 06:48 AM
          Armv8.8-A and Armv9.3-AJörn Engel2021/09/18 01:01 PM
            Armv8.8-A and Armv9.3-AMichael S2021/09/18 02:58 PM
              microbenchmark resultsMichael S2021/09/19 03:46 PM
                microbenchmark source codeMichael S2021/09/19 03:58 PM
                  microbenchmark source code-.-2021/09/20 03:49 PM
                    microbenchmark source codeMichael S2021/09/21 09:17 AM
                      microbenchmark source code-.-2021/09/21 03:33 PM
                        microbenchmark source codeMichael S2021/09/21 05:05 PM
                microbenchmark resultsAnon2021/09/19 04:32 PM
                  microbenchmark resultsJörn Engel2021/09/19 07:46 PM
                    microbenchmark resultsdmcq2021/09/20 01:19 AM
                      microbenchmark resultsMichael S2021/09/20 04:12 AM
                      microbenchmark results-.-2021/09/20 03:44 PM
                        microbenchmark resultsMichael S2021/09/21 09:23 AM
                          microbenchmark results-.-2021/09/21 03:35 PM
                            microbenchmark resultsAndrey2021/09/21 04:25 PM
                              I agree (NT)Michael S2021/09/21 05:07 PM
                              microbenchmark results-.-2021/09/22 04:56 PM
                                microbenchmark resultsMichael S2021/09/23 05:11 AM
                                  microbenchmark resultsdmcq2021/09/23 06:53 AM
                                  microbenchmark resultsAndrey2021/09/23 09:20 AM
                                microbenchmark resultsAndrey2021/09/23 09:11 AM
                                  microbenchmark results-.-2021/09/23 07:01 PM
                                    microbenchmark resultsSimon Farnsworth2021/09/24 01:47 AM
                                      microbenchmark results-.-2021/09/24 05:00 PM
                                    microbenchmark resultsAndrey2021/09/24 07:29 AM
                                      microbenchmark resultsdmcq2021/09/24 12:05 PM
                                        microbenchmark resultsDoug S2021/09/24 01:12 PM
                                          microbenchmark results---2021/09/24 06:06 PM
                                            microbenchmark resultsDoug S2021/09/24 10:46 PM
                                              microbenchmark results---2021/09/25 08:56 AM
                                                microbenchmark resultsJukka Larja2021/09/26 01:01 AM
                                                microbenchmark resultsDoug S2021/09/26 08:41 AM
                                                  microbenchmark resultsdmcq2021/09/26 12:37 PM
                                                    microbenchmark resultsDoug S2021/09/27 09:32 AM
                                                      microbenchmark resultsdmcq2021/09/28 06:56 AM
                                              microbenchmark resultsDummond D. Slow2021/09/25 11:49 AM
                                                microbenchmark resultsBrett2021/09/25 02:31 PM
                                              microbenchmark resultsdmcq2021/09/25 11:51 AM
                                                microbenchmark resultsDoug S2021/09/26 08:45 AM
                                            microbenchmark resultsRichard S2021/09/25 12:51 AM
                                              microbenchmark resultsDummond D. Slow2021/09/25 11:52 AM
                                                microbenchmark results---2021/09/25 02:04 PM
                                      SVE alignment with non power-of-2 widths-.-2021/09/24 05:10 PM
                                        SVE alignment with non power-of-2 widthsAndrey2021/09/25 03:46 AM
                                          SVE alignment with non power-of-2 widths-.-2021/09/25 04:35 PM
                                          SVE alignment with non power-of-2 widthsKevin G2021/09/27 08:46 AM
                                            SVE alignment with non power-of-2 widths-.-2021/09/27 08:06 PM
                                              SVE alignment with non power-of-2 widthsJukka Larja2021/09/28 05:37 AM
                                                SVE alignment with non power-of-2 widthsAndrey2021/09/28 11:12 AM
                                                  SVE alignment with non power-of-2 widthsdmcq2021/09/28 01:29 PM
                                                SVE alignment with non power-of-2 widths-.-2021/09/28 05:37 PM
                                                  SVE alignment with non power-of-2 widthsJukka Larja2021/09/29 05:50 AM
                    microbenchmark results---2021/09/20 06:11 AM
                    microbenchmark resultsJörn Engel2021/09/23 04:10 AM
                      microbenchmark resultsMichael S2021/09/23 04:55 AM
                        microbenchmark resultsJörn Engel2021/09/23 08:24 AM
                          microbenchmark resultsRoyi2021/09/26 03:25 PM
                      microbenchmark resultsdmcq2021/09/23 09:42 AM
                        microbenchmark results---2021/09/23 10:53 AM
                      microbenchmark resultsanon22021/09/23 01:40 PM
                microbenchmark results: Zen 3Adrian2021/09/22 12:57 AM
                  microbenchmark results: Zen 3Adrian2021/09/22 01:08 AM
                    microbenchmark results: Zen 3Michael S2021/09/22 04:48 AM
                      microbenchmark results: Zen 3Adrian2021/09/22 05:05 AM
        Armv8.8-A and Armv9.3-AKonrad Schwarz2021/09/28 04:45 AM
    Armv8.8-A and Armv9.3-ALinus Torvalds2021/09/17 07:59 AM
      Armv8.8-A and Armv9.3-ADoug S2021/09/17 10:35 AM
        Armv8.8-A and Armv9.3-Anksingh2021/09/17 11:23 AM
          Armv8.8-A and Armv9.3-ADoug S2021/09/17 01:35 PM
            Armv8.8-A and Armv9.3-AKonrad Schwarz2021/10/15 05:23 AM
              Armv8.8-A and Armv9.3-Arwessel2021/10/15 05:49 AM
        Armv8.8-A and Armv9.3-AAdrian2021/09/17 10:07 PM
          Armv8.8-A and Armv9.3-ADoug S2021/09/18 06:34 AM
            Armv8.8-A and Armv9.3-AAdrian2021/09/18 06:38 AM
      Armv8.8-A and Armv9.3-Ablaine2021/09/18 09:37 AM
      Armv8.8-A and Armv9.3-ABrett2021/09/19 12:06 PM
        Armv8.8-A and Armv9.3-Admcq2021/09/19 12:36 PM
        Armv8.8-A and Armv9.3-ADoug S2021/09/19 05:07 PM
          Armv8.8-A and Armv9.3-A - movesdmcq2021/09/28 08:54 AM
            Armv8.8-A and Armv9.3-A - movesDoug S2021/09/28 12:57 PM
              Armv8.8-A and Armv9.3-A - movesdmcq2021/09/28 01:21 PM
                Armv8.8-A and Armv9.3-A - movesNoSpammer2021/09/29 02:53 AM
                  Armv8.8-A and Armv9.3-A - movesrwessel2021/09/29 05:55 AM
                    Armv8.8-A and Armv9.3-A - movesdmcq2021/09/29 06:53 AM
                      Armv8.8-A and Armv9.3-A - movesrwessel2021/09/29 10:35 AM
                        Armv8.8-A and Armv9.3-A - movesdmcq2021/09/29 12:44 PM
                          Armv8.8-A and Armv9.3-A - movesrwessel2021/09/29 12:58 PM
                            Armv8.8-A and Armv9.3-A - movesdmcq2021/09/29 02:52 PM
                              Armv8.8-A and Armv9.3-A - movesrwessel2021/09/29 05:36 PM
                              Armv8.8-A and Armv9.3-A - movesAndrey2021/09/29 06:58 PM
                    Armv8.8-A and Armv9.3-A - movesDoug S2021/09/29 09:10 AM
                      Armv8.8-A and Armv9.3-A - movesrwessel2021/09/29 10:30 AM
                        Armv8.8-A and Armv9.3-A - movesDoug S2021/09/29 09:02 PM
                          Armv8.8-A and Armv9.3-A - movesrwessel2021/09/29 10:22 PM
                            Armv8.8-A and Armv9.3-A - movesMark Roulo2021/09/30 06:37 AM
                              Armv8.8-A and Armv9.3-A - movesrwessel2021/09/30 07:02 AM
                                Did they publish a full description? (NT)Michael S2021/09/30 07:12 AM
                                  Did they publish a full description?rwessel2021/09/30 08:18 AM
                                    Did they publish a full description?Michael S2021/09/30 09:24 AM
                                      Did they publish a full description?rwessel2021/09/30 09:42 AM
                                    Did they publish a full description?Adrian2021/09/30 11:22 PM
                                  Do we even okiw it's three instructions per move?Carson2021/09/30 09:28 PM
                                    Do we even okiw it's three instructions per move?Adrian2021/09/30 11:27 PM
                                    Do we even okiw it's three instructions per move?rwessel2021/10/01 03:19 AM
                            Armv8.8-A and Armv9.3-A - movesDoug S2021/09/30 08:48 AM
                              Armv8.8-A and Armv9.3-A - movesrwessel2021/09/30 09:39 AM
                                Armv8.8-A and Armv9.3-A - movesDoug S2021/09/30 01:56 PM
                                  Armv8.8-A and Armv9.3-A - movesrwessel2021/09/30 04:20 PM
                                    Armv8.8-A and Armv9.3-A - movesdmcq2021/10/01 03:38 AM
                                      Armv8.8-A and Armv9.3-A - movesMichael S2021/10/01 04:04 AM
                                        Armv8.8-A and Armv9.3-A - movesLinus Torvalds2021/10/01 10:01 AM
                                          memcpy - instruction cracking vs DMArpg2021/10/02 01:51 AM
                                            memcpy - instruction cracking vs DMAAdrian2021/10/02 02:45 AM
                                              memcpy - instruction cracking vs DMADoug S2021/10/02 08:47 AM
                                                memcpy - instruction cracking vs DMAAdrian2021/10/02 09:15 AM
                                                memcpy - instruction cracking vs DMArwessel2021/10/02 10:37 AM
                                                  memcpy - instruction cracking vs DMADoug S2021/10/02 05:49 PM
                                            memcpy - instruction cracking vs DMALinus Torvalds2021/10/02 09:43 AM
                                              memcpy - instruction cracking vs DMAdmcq2021/10/02 10:32 AM
                                              memcpy - instruction cracking vs DMABrett2021/10/02 10:45 AM
                                              memcpy - instruction cracking vs DMA---2021/10/02 02:03 PM
                                                memcpy - instruction cracking vs DMA---2021/10/02 02:12 PM
                                                  Moving copy to DRAM doesn't help for small copiesMark Roulo2021/10/02 02:59 PM
                                                    Moving copy to DRAM doesn't help for small copies---2021/10/02 06:32 PM
                                                      Moving copy to DRAM doesn't help for small copiesMichael S2021/10/03 12:40 AM
                                                        Moving copy to DRAM doesn't help for small copiesDoug S2021/10/03 09:09 AM
                                                          Moving copy to DRAM doesn't help for small copiesrwessel2021/10/03 09:51 AM
                                                          Moving copy to DRAM doesn't help for small copiesLinus Torvalds2021/10/03 10:09 AM
                                                            How about environments such as Java?Mark Roulo2021/10/03 11:41 AM
                                                              How about environments such as Java?rwessel2021/10/03 11:49 AM
                                                                How about environments such as Java?Mark Roulo2021/10/03 12:22 PM
                                                              How about environments such as Java?anon22021/10/03 06:58 PM
                                                                How about environments such as Java?Etienne Lorrain2021/10/04 04:08 AM
                                                                  Apart from "It depends" there is no short answer. (NT)Michael S2021/10/04 04:30 AM
                                                                  How about environments such as Java?Andrey2021/10/04 05:04 AM
                                                                  How about environments such as Java?anon22021/10/04 05:32 AM
                                                                How about environments such as Java?Mark Roulo2021/10/04 06:31 AM
                                                                How about environments such as Java?---2021/10/04 08:41 AM
                                                                  How about environments such as Java?Doug S2021/10/04 09:23 AM
                                                                    How about environments such as Java?Andrey2021/10/04 11:14 AM
                                                                      How about environments such as Java?Doug S2021/10/04 12:20 PM
                                                                  How about environments such as Java?anon22021/10/04 01:23 PM
                                                                  How about environments such as Java?rwessel2021/10/04 03:54 PM
                                                            Moving copy to DRAM doesn't help for small copiesJörn Engel2021/10/04 04:52 AM
                                                            Early software zeroing !=== early hardware zeroingPaul A. Clayton2021/10/05 10:19 AM
                                                              Early software zeroing !=== early hardware zeroingDoug S2021/10/05 11:21 AM
                                                memcpy - instruction cracking vs DMABrendan2021/10/02 03:53 PM
                                                  memcpy - instruction cracking vs DMALinus Torvalds2021/10/03 09:48 AM
                                                    memcpy - instruction cracking vs DMAdmcq2021/10/03 12:54 PM
                                              memcpy - instruction cracking vs DMAYuhong Bao2021/10/03 12:30 AM
                                                memcpy - instruction cracking vs DMADavid Hess2021/10/05 04:19 PM
                                                  memcpy - instruction cracking vs DMAAdrian2021/10/05 10:28 PM
                                                    memcpy - instruction cracking vs DMAEtienne Lorrain2021/10/06 01:24 AM
                                                    memcpy - instruction cracking vs DMArwessel2021/10/06 02:38 AM
                                                      memcpy - instruction cracking vs DMAAdrian2021/10/06 03:04 AM
                                                        memcpy - instruction cracking vs DMArwessel2021/10/06 04:59 AM
                                                    memcpy - instruction cracking vs DMA---2021/10/06 08:07 AM
                                                      memcpy - instruction cracking vs DMAAndrey2021/10/06 01:59 PM
                                                    memcpy - instruction cracking vs DMAgallier22021/10/06 10:06 PM
                                                      memcpy - instruction cracking vs DMAAdrian2021/10/06 10:59 PM
                                              memcpy - instruction cracking vs DMAMichael S2021/10/03 12:51 AM
                                                memcpy - instruction cracking vs DMArwessel2021/10/03 04:06 AM
                                                  memcpy - instruction cracking vs DMAMichael S2021/10/03 04:24 AM
                                                    memcpy - instruction cracking vs DMAMatt Sayler2021/10/03 07:02 AM
                                                    memcpy - instruction cracking vs DMADoug S2021/10/03 09:14 AM
                                      Armv8.8-A and Armv9.3-A - movesrwessel2021/10/01 04:10 AM
                                        Armv8.8-A and Armv9.3-A - movesEtienne Lorrain2021/10/01 06:55 AM
                                          Armv8.8-A and Armv9.3-A - movesrwessel2021/10/01 07:14 AM
                                            Armv8.8-A and Armv9.3-A - movesDoug S2021/10/01 10:17 AM
                                              Armv8.8-A and Armv9.3-A - movesrwessel2021/10/02 03:57 AM
  Armv8.8-A and Armv9.3-Anone2021/10/13 05:06 AM
    Armv8.8-A and Armv9.3-AAdrian2021/10/13 05:22 AM
      Armv8.8-A and Armv9.3-ADoug S2021/10/13 08:01 AM
        Armv8.8-A and Armv9.3-Admcq2021/10/13 09:17 AM
          Armv8.8-A and Armv9.3-Anone2021/10/13 09:26 PM
            Armv8.8-A and Armv9.3-Admcq2021/10/14 07:22 AM
    Armv8.8-A and Armv9.3-Arwessel2021/10/14 08:01 AM
      Armv8.8-A and Armv9.3-AAnon2021/10/14 10:08 AM
        Armv8.8-A and Armv9.3-AMichael S2021/10/14 12:25 PM
      Armv8.8-A and Armv9.3-ADoug S2021/10/14 10:18 AM
        Armv8.8-A and Armv9.3-Arwessel2021/10/14 06:07 PM
          Armv8.8-A and Armv9.3-ADoug S2021/10/14 09:23 PM
            Armv8.8-A and Armv9.3-Admcq2021/10/15 12:41 AM
              Armv8.8-A and Armv9.3-AGabriele Svelto2021/10/15 04:07 AM
            Armv8.8-A and Armv9.3-Arwessel2021/10/15 03:49 AM
              Armv8.8-A and Armv9.3-ADoug S2021/10/15 09:44 AM
                Armv8.8-A and Armv9.3-Ame2021/10/15 05:34 PM
                  Armv8.8-A and Armv9.3-ADoug S2021/10/16 08:47 AM
                    Armv8.8-A and Armv9.3-Ame2021/10/17 04:19 AM
                      Armv8.8-A and Armv9.3-ADoug S2021/10/17 09:17 AM
                        Armv8.8-A and Armv9.3-Ame2021/10/17 11:31 AM
                          Armv8.8-A and Armv9.3-ADoug S2021/10/17 12:33 PM
                            Armv8.8-A and Armv9.3-AzArchJon2021/10/18 09:35 AM
                              Armv8.8-A and Armv9.3-ADoug S2021/10/18 01:35 PM
Reply to this Topic
Name:
Email:
Topic:
Body: No Text
How do you spell avocado?