SVE alignment with non power-of-2 widths

By: dmcq (dmcq.delete@this.fano.co.uk), September 28, 2021 2:29 pm
Room: Moderated Discussions
Andrey (andrey.semashev.delete@this.gmail.com) on September 28, 2021 12:12 pm wrote:
> Jukka Larja (roskakori2006.delete@this.gmail.com) on September 28, 2021 6:37 am wrote:
> > -.- (blarg.delete@this.mailinator.com) on September 27, 2021 9:06 pm wrote:
> > > Kevin G (kevin.delete@this.cubitdesigns.com) on September 27, 2021 9:46 am wrote:
> > > > Three and six element vectors are relatively common for 3D work. Early in the history of SIMD when it made
> > > > sense to have a CPU code path for this, the code would simply
> > > > round out to the next largest power of 2 vector
> > > > size and live with the 25% inefficiency. (Three 32 floats would run on a 128 bit SIMD unit etc.) Nowadays
> > > > such bulk work is done on GPUs where vector elements are decomposed and that 25% inefficiency is recovered.
> > >
> > > I don't know the specifics of your example, but it sounds symptomatic of poor code design
> > > or memory layout. It sounds a lot like someone who took a scalar AoS design, then threw
> > > the x, y and z coordinates horizontally into a single vector, and patted themselves on the
> > > back for adopting SIMD. In reality, they probably should re-layout their data structures
> > > to use SoA, where a vector width being a multiple of 3/6 provides no intrinsic benefit.
> >
> > I've never actually seen this done in any game, but I've seen plenty of libs offering such
> > easy to use option to get some extra performance from 4-wide SIMD. I've created such lib
> > myself and got decent performance improvement for one use case (something like twice the
> > performance) and a regression for another (probably due to 25 % cache waste).
> >
> > The thing is, going from AoS to SoA is often completely unrealistic. For myself, it's often about having
> > one or couple thingies that require some vector math to update. It's a chain of Vec3 operations, with
> > plenty of ifs sprinkled around, repeated couple of times. Not dozens or hundreds of times, as would
> > be needed to make SoA model make sense (also, this case is obviously no good for GPGPU).
>
> As I understand, the uses of SIMD where the amount of data is relatively small is not the target usage for
> SVE, which is geared towards processing large amounts of data in a loop. And with large amounts of data,
> you do want to convert to SoA, even if on the fly during the loop, because data density is key here.

I think it is more for intermediate sizes, and the long lengths will be tackled by the streaming form that they want to introduce with the scalable matrix extension.
< Previous Post in ThreadNext Post in Thread >
TopicPosted ByDate
Armv8.8-A and Armv9.3-Aanonymou52021/09/16 03:25 PM
  Armv8.8-A and Armv9.3-ADoug S2021/09/16 10:57 PM
    Armv8.8-A and Armv9.3-ABrett2021/09/16 11:32 PM
      Armv8.8-A and Armv9.3-Aanon2021/09/16 11:55 PM
        Armv8.8-A and Armv9.3-Anone2021/09/17 12:51 AM
      Armv8.8-A and Armv9.3-AJörn Engel2021/09/17 05:42 AM
        Armv8.8-A and Armv9.3-AMichael S2021/09/17 07:48 AM
          Armv8.8-A and Armv9.3-AJörn Engel2021/09/18 02:01 PM
            Armv8.8-A and Armv9.3-AMichael S2021/09/18 03:58 PM
              microbenchmark resultsMichael S2021/09/19 04:46 PM
                microbenchmark source codeMichael S2021/09/19 04:58 PM
                  microbenchmark source code-.-2021/09/20 04:49 PM
                    microbenchmark source codeMichael S2021/09/21 10:17 AM
                      microbenchmark source code-.-2021/09/21 04:33 PM
                        microbenchmark source codeMichael S2021/09/21 06:05 PM
                microbenchmark resultsAnon2021/09/19 05:32 PM
                  microbenchmark resultsJörn Engel2021/09/19 08:46 PM
                    microbenchmark resultsdmcq2021/09/20 02:19 AM
                      microbenchmark resultsMichael S2021/09/20 05:12 AM
                      microbenchmark results-.-2021/09/20 04:44 PM
                        microbenchmark resultsMichael S2021/09/21 10:23 AM
                          microbenchmark results-.-2021/09/21 04:35 PM
                            microbenchmark resultsAndrey2021/09/21 05:25 PM
                              I agree (NT)Michael S2021/09/21 06:07 PM
                              microbenchmark results-.-2021/09/22 05:56 PM
                                microbenchmark resultsMichael S2021/09/23 06:11 AM
                                  microbenchmark resultsdmcq2021/09/23 07:53 AM
                                  microbenchmark resultsAndrey2021/09/23 10:20 AM
                                microbenchmark resultsAndrey2021/09/23 10:11 AM
                                  microbenchmark results-.-2021/09/23 08:01 PM
                                    microbenchmark resultsSimon Farnsworth2021/09/24 02:47 AM
                                      microbenchmark results-.-2021/09/24 06:00 PM
                                    microbenchmark resultsAndrey2021/09/24 08:29 AM
                                      microbenchmark resultsdmcq2021/09/24 01:05 PM
                                        microbenchmark resultsDoug S2021/09/24 02:12 PM
                                          microbenchmark results---2021/09/24 07:06 PM
                                            microbenchmark resultsDoug S2021/09/24 11:46 PM
                                              microbenchmark results---2021/09/25 09:56 AM
                                                microbenchmark resultsJukka Larja2021/09/26 02:01 AM
                                                microbenchmark resultsDoug S2021/09/26 09:41 AM
                                                  microbenchmark resultsdmcq2021/09/26 01:37 PM
                                                    microbenchmark resultsDoug S2021/09/27 10:32 AM
                                                      microbenchmark resultsdmcq2021/09/28 07:56 AM
                                              microbenchmark resultsDummond D. Slow2021/09/25 12:49 PM
                                                microbenchmark resultsBrett2021/09/25 03:31 PM
                                              microbenchmark resultsdmcq2021/09/25 12:51 PM
                                                microbenchmark resultsDoug S2021/09/26 09:45 AM
                                            microbenchmark resultsRichard S2021/09/25 01:51 AM
                                              microbenchmark resultsDummond D. Slow2021/09/25 12:52 PM
                                                microbenchmark results---2021/09/25 03:04 PM
                                      SVE alignment with non power-of-2 widths-.-2021/09/24 06:10 PM
                                        SVE alignment with non power-of-2 widthsAndrey2021/09/25 04:46 AM
                                          SVE alignment with non power-of-2 widths-.-2021/09/25 05:35 PM
                                          SVE alignment with non power-of-2 widthsKevin G2021/09/27 09:46 AM
                                            SVE alignment with non power-of-2 widths-.-2021/09/27 09:06 PM
                                              SVE alignment with non power-of-2 widthsJukka Larja2021/09/28 06:37 AM
                                                SVE alignment with non power-of-2 widthsAndrey2021/09/28 12:12 PM
                                                  SVE alignment with non power-of-2 widthsdmcq2021/09/28 02:29 PM
                                                SVE alignment with non power-of-2 widths-.-2021/09/28 06:37 PM
                                                  SVE alignment with non power-of-2 widthsJukka Larja2021/09/29 06:50 AM
                    microbenchmark results---2021/09/20 07:11 AM
                    microbenchmark resultsJörn Engel2021/09/23 05:10 AM
                      microbenchmark resultsMichael S2021/09/23 05:55 AM
                        microbenchmark resultsJörn Engel2021/09/23 09:24 AM
                          microbenchmark resultsRoyi2021/09/26 04:25 PM
                      microbenchmark resultsdmcq2021/09/23 10:42 AM
                        microbenchmark results---2021/09/23 11:53 AM
                      microbenchmark resultsanon22021/09/23 02:40 PM
                microbenchmark results: Zen 3Adrian2021/09/22 01:57 AM
                  microbenchmark results: Zen 3Adrian2021/09/22 02:08 AM
                    microbenchmark results: Zen 3Michael S2021/09/22 05:48 AM
                      microbenchmark results: Zen 3Adrian2021/09/22 06:05 AM
        Armv8.8-A and Armv9.3-AKonrad Schwarz2021/09/28 05:45 AM
    Armv8.8-A and Armv9.3-ALinus Torvalds2021/09/17 08:59 AM
      Armv8.8-A and Armv9.3-ADoug S2021/09/17 11:35 AM
        Armv8.8-A and Armv9.3-Anksingh2021/09/17 12:23 PM
          Armv8.8-A and Armv9.3-ADoug S2021/09/17 02:35 PM
            Armv8.8-A and Armv9.3-AKonrad Schwarz2021/10/15 06:23 AM
              Armv8.8-A and Armv9.3-Arwessel2021/10/15 06:49 AM
        Armv8.8-A and Armv9.3-AAdrian2021/09/17 11:07 PM
          Armv8.8-A and Armv9.3-ADoug S2021/09/18 07:34 AM
            Armv8.8-A and Armv9.3-AAdrian2021/09/18 07:38 AM
      Armv8.8-A and Armv9.3-Ablaine2021/09/18 10:37 AM
      Armv8.8-A and Armv9.3-ABrett2021/09/19 01:06 PM
        Armv8.8-A and Armv9.3-Admcq2021/09/19 01:36 PM
        Armv8.8-A and Armv9.3-ADoug S2021/09/19 06:07 PM
          Armv8.8-A and Armv9.3-A - movesdmcq2021/09/28 09:54 AM
            Armv8.8-A and Armv9.3-A - movesDoug S2021/09/28 01:57 PM
              Armv8.8-A and Armv9.3-A - movesdmcq2021/09/28 02:21 PM
                Armv8.8-A and Armv9.3-A - movesNoSpammer2021/09/29 03:53 AM
                  Armv8.8-A and Armv9.3-A - movesrwessel2021/09/29 06:55 AM
                    Armv8.8-A and Armv9.3-A - movesdmcq2021/09/29 07:53 AM
                      Armv8.8-A and Armv9.3-A - movesrwessel2021/09/29 11:35 AM
                        Armv8.8-A and Armv9.3-A - movesdmcq2021/09/29 01:44 PM
                          Armv8.8-A and Armv9.3-A - movesrwessel2021/09/29 01:58 PM
                            Armv8.8-A and Armv9.3-A - movesdmcq2021/09/29 03:52 PM
                              Armv8.8-A and Armv9.3-A - movesrwessel2021/09/29 06:36 PM
                              Armv8.8-A and Armv9.3-A - movesAndrey2021/09/29 07:58 PM
                    Armv8.8-A and Armv9.3-A - movesDoug S2021/09/29 10:10 AM
                      Armv8.8-A and Armv9.3-A - movesrwessel2021/09/29 11:30 AM
                        Armv8.8-A and Armv9.3-A - movesDoug S2021/09/29 10:02 PM
                          Armv8.8-A and Armv9.3-A - movesrwessel2021/09/29 11:22 PM
                            Armv8.8-A and Armv9.3-A - movesMark Roulo2021/09/30 07:37 AM
                              Armv8.8-A and Armv9.3-A - movesrwessel2021/09/30 08:02 AM
                                Did they publish a full description? (NT)Michael S2021/09/30 08:12 AM
                                  Did they publish a full description?rwessel2021/09/30 09:18 AM
                                    Did they publish a full description?Michael S2021/09/30 10:24 AM
                                      Did they publish a full description?rwessel2021/09/30 10:42 AM
                                    Did they publish a full description?Adrian2021/10/01 12:22 AM
                                  Do we even okiw it's three instructions per move?Carson2021/09/30 10:28 PM
                                    Do we even okiw it's three instructions per move?Adrian2021/10/01 12:27 AM
                                    Do we even okiw it's three instructions per move?rwessel2021/10/01 04:19 AM
                            Armv8.8-A and Armv9.3-A - movesDoug S2021/09/30 09:48 AM
                              Armv8.8-A and Armv9.3-A - movesrwessel2021/09/30 10:39 AM
                                Armv8.8-A and Armv9.3-A - movesDoug S2021/09/30 02:56 PM
                                  Armv8.8-A and Armv9.3-A - movesrwessel2021/09/30 05:20 PM
                                    Armv8.8-A and Armv9.3-A - movesdmcq2021/10/01 04:38 AM
                                      Armv8.8-A and Armv9.3-A - movesMichael S2021/10/01 05:04 AM
                                        Armv8.8-A and Armv9.3-A - movesLinus Torvalds2021/10/01 11:01 AM
                                          memcpy - instruction cracking vs DMArpg2021/10/02 02:51 AM
                                            memcpy - instruction cracking vs DMAAdrian2021/10/02 03:45 AM
                                              memcpy - instruction cracking vs DMADoug S2021/10/02 09:47 AM
                                                memcpy - instruction cracking vs DMAAdrian2021/10/02 10:15 AM
                                                memcpy - instruction cracking vs DMArwessel2021/10/02 11:37 AM
                                                  memcpy - instruction cracking vs DMADoug S2021/10/02 06:49 PM
                                            memcpy - instruction cracking vs DMALinus Torvalds2021/10/02 10:43 AM
                                              memcpy - instruction cracking vs DMAdmcq2021/10/02 11:32 AM
                                              memcpy - instruction cracking vs DMABrett2021/10/02 11:45 AM
                                              memcpy - instruction cracking vs DMA---2021/10/02 03:03 PM
                                                memcpy - instruction cracking vs DMA---2021/10/02 03:12 PM
                                                  Moving copy to DRAM doesn't help for small copiesMark Roulo2021/10/02 03:59 PM
                                                    Moving copy to DRAM doesn't help for small copies---2021/10/02 07:32 PM
                                                      Moving copy to DRAM doesn't help for small copiesMichael S2021/10/03 01:40 AM
                                                        Moving copy to DRAM doesn't help for small copiesDoug S2021/10/03 10:09 AM
                                                          Moving copy to DRAM doesn't help for small copiesrwessel2021/10/03 10:51 AM
                                                          Moving copy to DRAM doesn't help for small copiesLinus Torvalds2021/10/03 11:09 AM
                                                            How about environments such as Java?Mark Roulo2021/10/03 12:41 PM
                                                              How about environments such as Java?rwessel2021/10/03 12:49 PM
                                                                How about environments such as Java?Mark Roulo2021/10/03 01:22 PM
                                                              How about environments such as Java?anon22021/10/03 07:58 PM
                                                                How about environments such as Java?Etienne Lorrain2021/10/04 05:08 AM
                                                                  Apart from "It depends" there is no short answer. (NT)Michael S2021/10/04 05:30 AM
                                                                  How about environments such as Java?Andrey2021/10/04 06:04 AM
                                                                  How about environments such as Java?anon22021/10/04 06:32 AM
                                                                How about environments such as Java?Mark Roulo2021/10/04 07:31 AM
                                                                How about environments such as Java?---2021/10/04 09:41 AM
                                                                  How about environments such as Java?Doug S2021/10/04 10:23 AM
                                                                    How about environments such as Java?Andrey2021/10/04 12:14 PM
                                                                      How about environments such as Java?Doug S2021/10/04 01:20 PM
                                                                  How about environments such as Java?anon22021/10/04 02:23 PM
                                                                  How about environments such as Java?rwessel2021/10/04 04:54 PM
                                                            Moving copy to DRAM doesn't help for small copiesJörn Engel2021/10/04 05:52 AM
                                                            Early software zeroing !=== early hardware zeroingPaul A. Clayton2021/10/05 11:19 AM
                                                              Early software zeroing !=== early hardware zeroingDoug S2021/10/05 12:21 PM
                                                memcpy - instruction cracking vs DMABrendan2021/10/02 04:53 PM
                                                  memcpy - instruction cracking vs DMALinus Torvalds2021/10/03 10:48 AM
                                                    memcpy - instruction cracking vs DMAdmcq2021/10/03 01:54 PM
                                              memcpy - instruction cracking vs DMAYuhong Bao2021/10/03 01:30 AM
                                                memcpy - instruction cracking vs DMADavid Hess2021/10/05 05:19 PM
                                                  memcpy - instruction cracking vs DMAAdrian2021/10/05 11:28 PM
                                                    memcpy - instruction cracking vs DMAEtienne Lorrain2021/10/06 02:24 AM
                                                    memcpy - instruction cracking vs DMArwessel2021/10/06 03:38 AM
                                                      memcpy - instruction cracking vs DMAAdrian2021/10/06 04:04 AM
                                                        memcpy - instruction cracking vs DMArwessel2021/10/06 05:59 AM
                                                    memcpy - instruction cracking vs DMA---2021/10/06 09:07 AM
                                                      memcpy - instruction cracking vs DMAAndrey2021/10/06 02:59 PM
                                                    memcpy - instruction cracking vs DMAgallier22021/10/06 11:06 PM
                                                      memcpy - instruction cracking vs DMAAdrian2021/10/06 11:59 PM
                                              memcpy - instruction cracking vs DMAMichael S2021/10/03 01:51 AM
                                                memcpy - instruction cracking vs DMArwessel2021/10/03 05:06 AM
                                                  memcpy - instruction cracking vs DMAMichael S2021/10/03 05:24 AM
                                                    memcpy - instruction cracking vs DMAMatt Sayler2021/10/03 08:02 AM
                                                    memcpy - instruction cracking vs DMADoug S2021/10/03 10:14 AM
                                      Armv8.8-A and Armv9.3-A - movesrwessel2021/10/01 05:10 AM
                                        Armv8.8-A and Armv9.3-A - movesEtienne Lorrain2021/10/01 07:55 AM
                                          Armv8.8-A and Armv9.3-A - movesrwessel2021/10/01 08:14 AM
                                            Armv8.8-A and Armv9.3-A - movesDoug S2021/10/01 11:17 AM
                                              Armv8.8-A and Armv9.3-A - movesrwessel2021/10/02 04:57 AM
  Armv8.8-A and Armv9.3-Anone2021/10/13 06:06 AM
    Armv8.8-A and Armv9.3-AAdrian2021/10/13 06:22 AM
      Armv8.8-A and Armv9.3-ADoug S2021/10/13 09:01 AM
        Armv8.8-A and Armv9.3-Admcq2021/10/13 10:17 AM
          Armv8.8-A and Armv9.3-Anone2021/10/13 10:26 PM
            Armv8.8-A and Armv9.3-Admcq2021/10/14 08:22 AM
    Armv8.8-A and Armv9.3-Arwessel2021/10/14 09:01 AM
      Armv8.8-A and Armv9.3-AAnon2021/10/14 11:08 AM
        Armv8.8-A and Armv9.3-AMichael S2021/10/14 01:25 PM
      Armv8.8-A and Armv9.3-ADoug S2021/10/14 11:18 AM
        Armv8.8-A and Armv9.3-Arwessel2021/10/14 07:07 PM
          Armv8.8-A and Armv9.3-ADoug S2021/10/14 10:23 PM
            Armv8.8-A and Armv9.3-Admcq2021/10/15 01:41 AM
              Armv8.8-A and Armv9.3-AGabriele Svelto2021/10/15 05:07 AM
            Armv8.8-A and Armv9.3-Arwessel2021/10/15 04:49 AM
              Armv8.8-A and Armv9.3-ADoug S2021/10/15 10:44 AM
                Armv8.8-A and Armv9.3-Ame2021/10/15 06:34 PM
                  Armv8.8-A and Armv9.3-ADoug S2021/10/16 09:47 AM
                    Armv8.8-A and Armv9.3-Ame2021/10/17 05:19 AM
                      Armv8.8-A and Armv9.3-ADoug S2021/10/17 10:17 AM
                        Armv8.8-A and Armv9.3-Ame2021/10/17 12:31 PM
                          Armv8.8-A and Armv9.3-ADoug S2021/10/17 01:33 PM
                            Armv8.8-A and Armv9.3-AzArchJon2021/10/18 10:35 AM
                              Armv8.8-A and Armv9.3-ADoug S2021/10/18 02:35 PM
Reply to this Topic
Name:
Email:
Topic:
Body: No Text
How do you spell tangerine? 🍊