microbenchmark results

By: Jörn Engel (joern.delete@this.purestorage.com), September 23, 2021 5:10 am
Room: Moderated Discussions
Jörn Engel (joern.delete@this.purestorage.com) on September 19, 2021 8:46 pm wrote:
>
> Not sure. I'll have to play around with the code a bit.

Looks like I have to eat my words. As usual, it was easier to write my own benchmark than modify yours. The results roughly match. But to mix things up I tried a copy using AVX2 (technically just AVX, I think). Here things get interesting.

If I add an offset to both src and dst there is a 2x performance difference. Numbers are cycles for 100 loops, each copying 8kB. Turboboost seems to have kicked in, making the numbers look a bit better than they should.

0: 23418
1: 42909

31: 42798
32: 21552
33: 43128

63: 43146
64: 21552
65: 42657

95: 42666
96: 21507
97: 43239

127: 43233
128: 21483
129: 42795

So what happens if we keep dst aligned and only shift the src?

0: 21642
1: 26088

31: 25268
32: 19208
33: 25194

63: 25338
64: 19474
65: 25440

95: 24962
96: 19206
97: 25120

127: 25104
128: 19214
129: 25054

We get a small speedup for the aligned cases. Probably a red herring because I didn't fix the frequencies. And we get a large speedup for the unaligned cases. This CPU can do 2 reads and 1 write per cycle, so the unaligned reads have mostly been removed as a bottleneck.

Ok, now let's keep src aligned and only shift dst.

0: 24153
1: 128886

31: 127698
32: 22206
33: 120669

63: 113868
64: 20596
65: 79906

95: 80180
96: 20392
97: 62200

127: 63056
128: 20636
129: 56148

159: 55502
160: 20600
161: 46962

191: 46856
192: 20394
193: 44454

223: 44404
224: 20284
225: 39622

This is crazy. Performance for the unaligned cases is 5x higher instead of 2x. But then the unaligned performance appears to improve, with a noticeable step each time we do another aligned round. Towards the end I see the performance results I would have expected throughout.

If I copy the entire benchmark loop a few times, I get the same results back to back. So this is not a warmup-problem, the offsets between src and dst appear to matter. So finally I shifted src by 256 bytes and now I get reasonable results again.

0: 20481
1: 39663

31: 38526
32: 19767
33: 38526

63: 38523
64: 19764
65: 38529

95: 38523
96: 19767
97: 38520

127: 38523
128: 19668
129: 38517

Not sure how to explain the crazy numbers, but the CPU behaves as if src and dst were competing for the same cachelines. Modulo the offset the two were exactly 16k apart.

If someone has a good explanation, I'd love to hear it.
< Previous Post in ThreadNext Post in Thread >
TopicPosted ByDate
Armv8.8-A and Armv9.3-Aanonymou52021/09/16 03:25 PM
  Armv8.8-A and Armv9.3-ADoug S2021/09/16 10:57 PM
    Armv8.8-A and Armv9.3-ABrett2021/09/16 11:32 PM
      Armv8.8-A and Armv9.3-Aanon2021/09/16 11:55 PM
        Armv8.8-A and Armv9.3-Anone2021/09/17 12:51 AM
      Armv8.8-A and Armv9.3-AJörn Engel2021/09/17 05:42 AM
        Armv8.8-A and Armv9.3-AMichael S2021/09/17 07:48 AM
          Armv8.8-A and Armv9.3-AJörn Engel2021/09/18 02:01 PM
            Armv8.8-A and Armv9.3-AMichael S2021/09/18 03:58 PM
              microbenchmark resultsMichael S2021/09/19 04:46 PM
                microbenchmark source codeMichael S2021/09/19 04:58 PM
                  microbenchmark source code-.-2021/09/20 04:49 PM
                    microbenchmark source codeMichael S2021/09/21 10:17 AM
                      microbenchmark source code-.-2021/09/21 04:33 PM
                        microbenchmark source codeMichael S2021/09/21 06:05 PM
                microbenchmark resultsAnon2021/09/19 05:32 PM
                  microbenchmark resultsJörn Engel2021/09/19 08:46 PM
                    microbenchmark resultsdmcq2021/09/20 02:19 AM
                      microbenchmark resultsMichael S2021/09/20 05:12 AM
                      microbenchmark results-.-2021/09/20 04:44 PM
                        microbenchmark resultsMichael S2021/09/21 10:23 AM
                          microbenchmark results-.-2021/09/21 04:35 PM
                            microbenchmark resultsAndrey2021/09/21 05:25 PM
                              I agree (NT)Michael S2021/09/21 06:07 PM
                              microbenchmark results-.-2021/09/22 05:56 PM
                                microbenchmark resultsMichael S2021/09/23 06:11 AM
                                  microbenchmark resultsdmcq2021/09/23 07:53 AM
                                  microbenchmark resultsAndrey2021/09/23 10:20 AM
                                microbenchmark resultsAndrey2021/09/23 10:11 AM
                                  microbenchmark results-.-2021/09/23 08:01 PM
                                    microbenchmark resultsSimon Farnsworth2021/09/24 02:47 AM
                                      microbenchmark results-.-2021/09/24 06:00 PM
                                    microbenchmark resultsAndrey2021/09/24 08:29 AM
                                      microbenchmark resultsdmcq2021/09/24 01:05 PM
                                        microbenchmark resultsDoug S2021/09/24 02:12 PM
                                          microbenchmark results---2021/09/24 07:06 PM
                                            microbenchmark resultsDoug S2021/09/24 11:46 PM
                                              microbenchmark results---2021/09/25 09:56 AM
                                                microbenchmark resultsJukka Larja2021/09/26 02:01 AM
                                                microbenchmark resultsDoug S2021/09/26 09:41 AM
                                                  microbenchmark resultsdmcq2021/09/26 01:37 PM
                                                    microbenchmark resultsDoug S2021/09/27 10:32 AM
                                                      microbenchmark resultsdmcq2021/09/28 07:56 AM
                                              microbenchmark resultsDummond D. Slow2021/09/25 12:49 PM
                                                microbenchmark resultsBrett2021/09/25 03:31 PM
                                              microbenchmark resultsdmcq2021/09/25 12:51 PM
                                                microbenchmark resultsDoug S2021/09/26 09:45 AM
                                            microbenchmark resultsRichard S2021/09/25 01:51 AM
                                              microbenchmark resultsDummond D. Slow2021/09/25 12:52 PM
                                                microbenchmark results---2021/09/25 03:04 PM
                                      SVE alignment with non power-of-2 widths-.-2021/09/24 06:10 PM
                                        SVE alignment with non power-of-2 widthsAndrey2021/09/25 04:46 AM
                                          SVE alignment with non power-of-2 widths-.-2021/09/25 05:35 PM
                                          SVE alignment with non power-of-2 widthsKevin G2021/09/27 09:46 AM
                                            SVE alignment with non power-of-2 widths-.-2021/09/27 09:06 PM
                                              SVE alignment with non power-of-2 widthsJukka Larja2021/09/28 06:37 AM
                                                SVE alignment with non power-of-2 widthsAndrey2021/09/28 12:12 PM
                                                  SVE alignment with non power-of-2 widthsdmcq2021/09/28 02:29 PM
                                                SVE alignment with non power-of-2 widths-.-2021/09/28 06:37 PM
                                                  SVE alignment with non power-of-2 widthsJukka Larja2021/09/29 06:50 AM
                    microbenchmark results---2021/09/20 07:11 AM
                    microbenchmark resultsJörn Engel2021/09/23 05:10 AM
                      microbenchmark resultsMichael S2021/09/23 05:55 AM
                        microbenchmark resultsJörn Engel2021/09/23 09:24 AM
                          microbenchmark resultsRoyi2021/09/26 04:25 PM
                      microbenchmark resultsdmcq2021/09/23 10:42 AM
                        microbenchmark results---2021/09/23 11:53 AM
                      microbenchmark resultsanon22021/09/23 02:40 PM
                microbenchmark results: Zen 3Adrian2021/09/22 01:57 AM
                  microbenchmark results: Zen 3Adrian2021/09/22 02:08 AM
                    microbenchmark results: Zen 3Michael S2021/09/22 05:48 AM
                      microbenchmark results: Zen 3Adrian2021/09/22 06:05 AM
        Armv8.8-A and Armv9.3-AKonrad Schwarz2021/09/28 05:45 AM
    Armv8.8-A and Armv9.3-ALinus Torvalds2021/09/17 08:59 AM
      Armv8.8-A and Armv9.3-ADoug S2021/09/17 11:35 AM
        Armv8.8-A and Armv9.3-Anksingh2021/09/17 12:23 PM
          Armv8.8-A and Armv9.3-ADoug S2021/09/17 02:35 PM
            Armv8.8-A and Armv9.3-AKonrad Schwarz2021/10/15 06:23 AM
              Armv8.8-A and Armv9.3-Arwessel2021/10/15 06:49 AM
        Armv8.8-A and Armv9.3-AAdrian2021/09/17 11:07 PM
          Armv8.8-A and Armv9.3-ADoug S2021/09/18 07:34 AM
            Armv8.8-A and Armv9.3-AAdrian2021/09/18 07:38 AM
      Armv8.8-A and Armv9.3-Ablaine2021/09/18 10:37 AM
      Armv8.8-A and Armv9.3-ABrett2021/09/19 01:06 PM
        Armv8.8-A and Armv9.3-Admcq2021/09/19 01:36 PM
        Armv8.8-A and Armv9.3-ADoug S2021/09/19 06:07 PM
          Armv8.8-A and Armv9.3-A - movesdmcq2021/09/28 09:54 AM
            Armv8.8-A and Armv9.3-A - movesDoug S2021/09/28 01:57 PM
              Armv8.8-A and Armv9.3-A - movesdmcq2021/09/28 02:21 PM
                Armv8.8-A and Armv9.3-A - movesNoSpammer2021/09/29 03:53 AM
                  Armv8.8-A and Armv9.3-A - movesrwessel2021/09/29 06:55 AM
                    Armv8.8-A and Armv9.3-A - movesdmcq2021/09/29 07:53 AM
                      Armv8.8-A and Armv9.3-A - movesrwessel2021/09/29 11:35 AM
                        Armv8.8-A and Armv9.3-A - movesdmcq2021/09/29 01:44 PM
                          Armv8.8-A and Armv9.3-A - movesrwessel2021/09/29 01:58 PM
                            Armv8.8-A and Armv9.3-A - movesdmcq2021/09/29 03:52 PM
                              Armv8.8-A and Armv9.3-A - movesrwessel2021/09/29 06:36 PM
                              Armv8.8-A and Armv9.3-A - movesAndrey2021/09/29 07:58 PM
                    Armv8.8-A and Armv9.3-A - movesDoug S2021/09/29 10:10 AM
                      Armv8.8-A and Armv9.3-A - movesrwessel2021/09/29 11:30 AM
                        Armv8.8-A and Armv9.3-A - movesDoug S2021/09/29 10:02 PM
                          Armv8.8-A and Armv9.3-A - movesrwessel2021/09/29 11:22 PM
                            Armv8.8-A and Armv9.3-A - movesMark Roulo2021/09/30 07:37 AM
                              Armv8.8-A and Armv9.3-A - movesrwessel2021/09/30 08:02 AM
                                Did they publish a full description? (NT)Michael S2021/09/30 08:12 AM
                                  Did they publish a full description?rwessel2021/09/30 09:18 AM
                                    Did they publish a full description?Michael S2021/09/30 10:24 AM
                                      Did they publish a full description?rwessel2021/09/30 10:42 AM
                                    Did they publish a full description?Adrian2021/10/01 12:22 AM
                                  Do we even okiw it's three instructions per move?Carson2021/09/30 10:28 PM
                                    Do we even okiw it's three instructions per move?Adrian2021/10/01 12:27 AM
                                    Do we even okiw it's three instructions per move?rwessel2021/10/01 04:19 AM
                            Armv8.8-A and Armv9.3-A - movesDoug S2021/09/30 09:48 AM
                              Armv8.8-A and Armv9.3-A - movesrwessel2021/09/30 10:39 AM
                                Armv8.8-A and Armv9.3-A - movesDoug S2021/09/30 02:56 PM
                                  Armv8.8-A and Armv9.3-A - movesrwessel2021/09/30 05:20 PM
                                    Armv8.8-A and Armv9.3-A - movesdmcq2021/10/01 04:38 AM
                                      Armv8.8-A and Armv9.3-A - movesMichael S2021/10/01 05:04 AM
                                        Armv8.8-A and Armv9.3-A - movesLinus Torvalds2021/10/01 11:01 AM
                                          memcpy - instruction cracking vs DMArpg2021/10/02 02:51 AM
                                            memcpy - instruction cracking vs DMAAdrian2021/10/02 03:45 AM
                                              memcpy - instruction cracking vs DMADoug S2021/10/02 09:47 AM
                                                memcpy - instruction cracking vs DMAAdrian2021/10/02 10:15 AM
                                                memcpy - instruction cracking vs DMArwessel2021/10/02 11:37 AM
                                                  memcpy - instruction cracking vs DMADoug S2021/10/02 06:49 PM
                                            memcpy - instruction cracking vs DMALinus Torvalds2021/10/02 10:43 AM
                                              memcpy - instruction cracking vs DMAdmcq2021/10/02 11:32 AM
                                              memcpy - instruction cracking vs DMABrett2021/10/02 11:45 AM
                                              memcpy - instruction cracking vs DMA---2021/10/02 03:03 PM
                                                memcpy - instruction cracking vs DMA---2021/10/02 03:12 PM
                                                  Moving copy to DRAM doesn't help for small copiesMark Roulo2021/10/02 03:59 PM
                                                    Moving copy to DRAM doesn't help for small copies---2021/10/02 07:32 PM
                                                      Moving copy to DRAM doesn't help for small copiesMichael S2021/10/03 01:40 AM
                                                        Moving copy to DRAM doesn't help for small copiesDoug S2021/10/03 10:09 AM
                                                          Moving copy to DRAM doesn't help for small copiesrwessel2021/10/03 10:51 AM
                                                          Moving copy to DRAM doesn't help for small copiesLinus Torvalds2021/10/03 11:09 AM
                                                            How about environments such as Java?Mark Roulo2021/10/03 12:41 PM
                                                              How about environments such as Java?rwessel2021/10/03 12:49 PM
                                                                How about environments such as Java?Mark Roulo2021/10/03 01:22 PM
                                                              How about environments such as Java?anon22021/10/03 07:58 PM
                                                                How about environments such as Java?Etienne Lorrain2021/10/04 05:08 AM
                                                                  Apart from "It depends" there is no short answer. (NT)Michael S2021/10/04 05:30 AM
                                                                  How about environments such as Java?Andrey2021/10/04 06:04 AM
                                                                  How about environments such as Java?anon22021/10/04 06:32 AM
                                                                How about environments such as Java?Mark Roulo2021/10/04 07:31 AM
                                                                How about environments such as Java?---2021/10/04 09:41 AM
                                                                  How about environments such as Java?Doug S2021/10/04 10:23 AM
                                                                    How about environments such as Java?Andrey2021/10/04 12:14 PM
                                                                      How about environments such as Java?Doug S2021/10/04 01:20 PM
                                                                  How about environments such as Java?anon22021/10/04 02:23 PM
                                                                  How about environments such as Java?rwessel2021/10/04 04:54 PM
                                                            Moving copy to DRAM doesn't help for small copiesJörn Engel2021/10/04 05:52 AM
                                                            Early software zeroing !=== early hardware zeroingPaul A. Clayton2021/10/05 11:19 AM
                                                              Early software zeroing !=== early hardware zeroingDoug S2021/10/05 12:21 PM
                                                memcpy - instruction cracking vs DMABrendan2021/10/02 04:53 PM
                                                  memcpy - instruction cracking vs DMALinus Torvalds2021/10/03 10:48 AM
                                                    memcpy - instruction cracking vs DMAdmcq2021/10/03 01:54 PM
                                              memcpy - instruction cracking vs DMAYuhong Bao2021/10/03 01:30 AM
                                                memcpy - instruction cracking vs DMADavid Hess2021/10/05 05:19 PM
                                                  memcpy - instruction cracking vs DMAAdrian2021/10/05 11:28 PM
                                                    memcpy - instruction cracking vs DMAEtienne Lorrain2021/10/06 02:24 AM
                                                    memcpy - instruction cracking vs DMArwessel2021/10/06 03:38 AM
                                                      memcpy - instruction cracking vs DMAAdrian2021/10/06 04:04 AM
                                                        memcpy - instruction cracking vs DMArwessel2021/10/06 05:59 AM
                                                    memcpy - instruction cracking vs DMA---2021/10/06 09:07 AM
                                                      memcpy - instruction cracking vs DMAAndrey2021/10/06 02:59 PM
                                                    memcpy - instruction cracking vs DMAgallier22021/10/06 11:06 PM
                                                      memcpy - instruction cracking vs DMAAdrian2021/10/06 11:59 PM
                                              memcpy - instruction cracking vs DMAMichael S2021/10/03 01:51 AM
                                                memcpy - instruction cracking vs DMArwessel2021/10/03 05:06 AM
                                                  memcpy - instruction cracking vs DMAMichael S2021/10/03 05:24 AM
                                                    memcpy - instruction cracking vs DMAMatt Sayler2021/10/03 08:02 AM
                                                    memcpy - instruction cracking vs DMADoug S2021/10/03 10:14 AM
                                      Armv8.8-A and Armv9.3-A - movesrwessel2021/10/01 05:10 AM
                                        Armv8.8-A and Armv9.3-A - movesEtienne Lorrain2021/10/01 07:55 AM
                                          Armv8.8-A and Armv9.3-A - movesrwessel2021/10/01 08:14 AM
                                            Armv8.8-A and Armv9.3-A - movesDoug S2021/10/01 11:17 AM
                                              Armv8.8-A and Armv9.3-A - movesrwessel2021/10/02 04:57 AM
  Armv8.8-A and Armv9.3-Anone2021/10/13 06:06 AM
    Armv8.8-A and Armv9.3-AAdrian2021/10/13 06:22 AM
      Armv8.8-A and Armv9.3-ADoug S2021/10/13 09:01 AM
        Armv8.8-A and Armv9.3-Admcq2021/10/13 10:17 AM
          Armv8.8-A and Armv9.3-Anone2021/10/13 10:26 PM
            Armv8.8-A and Armv9.3-Admcq2021/10/14 08:22 AM
    Armv8.8-A and Armv9.3-Arwessel2021/10/14 09:01 AM
      Armv8.8-A and Armv9.3-AAnon2021/10/14 11:08 AM
        Armv8.8-A and Armv9.3-AMichael S2021/10/14 01:25 PM
      Armv8.8-A and Armv9.3-ADoug S2021/10/14 11:18 AM
        Armv8.8-A and Armv9.3-Arwessel2021/10/14 07:07 PM
          Armv8.8-A and Armv9.3-ADoug S2021/10/14 10:23 PM
            Armv8.8-A and Armv9.3-Admcq2021/10/15 01:41 AM
              Armv8.8-A and Armv9.3-AGabriele Svelto2021/10/15 05:07 AM
            Armv8.8-A and Armv9.3-Arwessel2021/10/15 04:49 AM
              Armv8.8-A and Armv9.3-ADoug S2021/10/15 10:44 AM
                Armv8.8-A and Armv9.3-Ame2021/10/15 06:34 PM
                  Armv8.8-A and Armv9.3-ADoug S2021/10/16 09:47 AM
                    Armv8.8-A and Armv9.3-Ame2021/10/17 05:19 AM
                      Armv8.8-A and Armv9.3-ADoug S2021/10/17 10:17 AM
                        Armv8.8-A and Armv9.3-Ame2021/10/17 12:31 PM
                          Armv8.8-A and Armv9.3-ADoug S2021/10/17 01:33 PM
                            Armv8.8-A and Armv9.3-AzArchJon2021/10/18 10:35 AM
                              Armv8.8-A and Armv9.3-ADoug S2021/10/18 02:35 PM
Reply to this Topic
Name:
Email:
Topic:
Body: No Text
How do you spell tangerine? 🍊