microbenchmark results

By: Anon (no.delete@this.spam.com), September 19, 2021 5:32 pm
Room: Moderated Discussions
Michael S (already5chosen.delete@this.yahoo.com) on September 19, 2021 4:46 pm wrote:
> Michael S (already5chosen.delete@this.yahoo.com) on September 18, 2021 3:58 pm wrote:
> > Jörn Engel (joern.delete@this.purestorage.com) on September 18, 2021 2:01 pm wrote:
> > > Michael S (already5chosen.delete@this.yahoo.com) on September 17, 2021 7:48 am wrote:
> > > > Jörn Engel (joern.delete@this.purestorage.com) on September 17, 2021 5:42 am wrote:
> > > > >
> > > > > It won't. Unaligned access is a solved problem on any CPU
> > > > > that cares about performance. On Intel the difference
> > > > > between vmovdqu and vmovdqa on aligned data is zero - both
> > > > > instructions are equally fast. vmovdqu on unaligned
> > > > > data is maybe 10% slower than on aligned data, not a big deal either.
> > > >
> > > > The 256-bit form is only 10% slower for an L1D hit? Maybe, taken individually. But in a tight
> > > > loop, like in memcpy or the integer variant of Stream Add, I'd expect it to be ~1.5x slower.
> > >
> > > Care to test your expectation? I tend to trust empirical results more than human expectations.
> > >
> > > Independent reproduction of my results:
> > > https://lemire.me/blog/2012/05/31/data-alignment-for-speed-myth-or-reality/
> >
> > For 32-bit and 64-bit data elements I'd also expect a small penalty. On today's CPUs,
> > for 32-bit elements I'd expect *less* than 10%. Not so for 256-bit elements.
> >
> > As for doing my own microbenchmark, maybe tomorrow.
> > It's night here now.
> >
>
> So, I measured the time, in microseconds, of summing 8,000,000 L1D-resident 64-bit numbers
> (16,000 B buffer, summation repeated 4,000 times) at different alignments and using different
> access/arithmetic widths. CPU - Skylake Client (Xeon E-2176G) downclocked to 4.25 GHz.
>
> Here are results:
> 8-byte (64b) accesses:
> 0 1064
> 1 1170
> 2 1170
> 3 1170
> 4 1170
> 5 1170
> 6 1170
> 7 1170
> 8 1071
> 9 1171
> 10 1171
> 11 1171
> 12 1171
> 13 1171
> 14 1171
> 15 1171
> 16 1067
> 17 1170
> 18 1170
> 19 1170
> 20 1170
> 21 1170
> 22 1170
> 23 1170
> 24 1065
> 25 1170
> 26 1170
> 27 1170
> 28 1170
> 29 1170
> 30 1170
> 31 1170
>
> 16-byte (128b) accesses:
> 0 483
> 1 701
> 2 701
> 3 701
> 4 701
> 5 701
> 6 701
> 7 701
> 8 701
> 9 701
> 10 701
> 11 702
> 12 701
> 13 701
> 14 701
> 15 701
> 16 483
> 17 702
> 18 702
> 19 702
> 20 702
> 21 702
> 22 701
> 23 702
> 24 701
> 25 702
> 26 702
> 27 702
> 28 702
> 29 701
> 30 702
> 31 701
>
>
> 32-byte (256b) accesses:
> 0 256
> 1 468
> 2 468
> 3 468
> 4 468
> 5 468
> 6 468
> 7 468
> 8 468
> 9 468
> 10 468
> 11 468
> 12 468
> 13 468
> 14 468
> 15 468
> 16 468
> 17 468
> 18 468
> 19 468
> 20 468
> 21 468
> 22 468
> 23 468
> 24 468
> 25 468
> 26 468
> 27 468
> 28 468
> 29 468
> 30 468
> 31 468
>
> Misalignment penalty (of streaming add):
> 8-byte - 1.10x
> 16-byte - 1.45x
> 32-byte - 1.83x
>
> So, on this particular CPU the penalty is even bigger than what I expected.
> Quite possibly, on SKX with 512b accesses the penalty would be
> over 2x. Unfortunately, right now I have no access to SKX.
>
>

I think you should point out whether or not each access crosses a cache line.
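
To make that concrete, here is a rough sketch (mine, not taken from the benchmark source) that counts how many accesses of a sequential stream would straddle a cache line at a given starting offset. It assumes 64-byte lines and a buffer whose base address is line-aligned; neither is stated in the quoted post, and `crossing_fraction` is just a name I made up for illustration.

```python
CACHE_LINE = 64  # bytes; assumed line size, not stated in the quoted post


def crossing_fraction(width: int, offset: int, n_accesses: int = 2000) -> float:
    """Fraction of sequential `width`-byte accesses that straddle a cache line.

    Access i covers bytes [offset + i*width, offset + (i+1)*width), with
    byte 0 assumed to sit at the start of a cache line.
    """
    crossings = 0
    for i in range(n_accesses):
        first = offset + i * width   # first byte touched by access i
        last = first + width - 1     # last byte touched by access i
        if first // CACHE_LINE != last // CACHE_LINE:
            crossings += 1
    return crossings / n_accesses


if __name__ == "__main__":
    # Print the crossing fraction for every offset tested in the quoted tables.
    for width in (8, 16, 32):
        for offset in range(32):
            print(width, offset, crossing_fraction(width, offset))
```

Under those assumptions, any offset that is not a multiple of the access width puts 1 in 8 of the 8-byte accesses, 1 in 4 of the 16-byte accesses, and 1 in 2 of the 32-byte accesses across a line boundary, while offsets that are a multiple of the width give no crossings at all. Whether that accounts for the measured penalties is something only the real hardware counters can confirm.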