SIMD syntax

By: Jukka Larja (roskakori2006.delete@this.gmail.com), November 1, 2020 10:11 am
Room: Moderated Discussions
hobold (hobold.delete@this.vectorizer.org) on November 1, 2020 7:22 am wrote:
> Jukka Larja (roskakori2006.delete@this.gmail.com) on October 31, 2020 8:14 am wrote:
> [...]
> > I'm not sure what std::simd would include, but at least on a quick glance I don't see GCC's extensions
> > would matter at all for us. It's not like writing a wrapper of our own (for (S)SSE(2/3) and Neon)
> > was all that difficult or took a lot of time.
>
> Template metaprogramming is not rocket surgery. It's not trivial either, though. And
> you have to keep validating that new compiler versions don't break performance.
>
> I was thinking of the contrast illustrated, for example, here between a plain
> formula on the one hand, and a mess of nested intrinsics on the other hand:
>
> https://nullprogram.com/blog/2015/07/10/
>
> The GCC extension allows writing a plain formula, too, when targeting a SIMD backend.

The Mandelbrot example is such a toy that I'm not even sure what it is supposed to illustrate compared to our game code. I don't see such nice loops in our code.
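
For reference, here is a minimal sketch of what the "plain formula" style looks like with the GCC/Clang `vector_size` extension: one Mandelbrot step z = z*z + c applied to four points at once. The names `f32x4`, `Complex4`, `mandel_step` and `mandel_iterate` are made up for this illustration and come from neither the linked blog post nor any library. It assumes a GCC or Clang build.

```cpp
#include <cassert>

// GCC/Clang vector extension: four floats handled as one value.
typedef float f32x4 __attribute__((vector_size(16)));

// Four complex numbers, stored as one vector of real parts
// and one vector of imaginary parts (hypothetical helper type).
struct Complex4 {
    f32x4 re;
    f32x4 im;
};

// One step z = z*z + c for four points at once.
// The arithmetic operators work element by element on f32x4.
inline Complex4 mandel_step(const Complex4& z, const Complex4& c) {
    const f32x4 cross = z.re * z.im;
    Complex4 next;
    next.re = z.re * z.re - z.im * z.im + c.re;
    next.im = cross + cross + c.im;  // 2 * re * im + c.im
    return next;
}

// Apply the step `steps` times, starting from z = 0.
inline Complex4 mandel_iterate(const Complex4& c, int steps) {
    assert(steps >= 0);
    const f32x4 zero = {0.0f, 0.0f, 0.0f, 0.0f};
    Complex4 z = {zero, zero};
    for (int i = 0; i < steps; ++i) {
        z = mandel_step(z, c);
    }
    return z;
}
```

This only reads nicely because every lane does exactly the same work. It shows nothing about code that is mostly branches on individual objects.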

> > The problem is that we don't have much code that
> > could make use of such abstraction (actually, we mostly use four wide float vectors as direct
> > replacement of three wide float vectors. We can't even make use of the last item).
> >
> I think the terminology "vector" for SIMD parallelism is misleading people at large
> into thinking in terms of vector math. There is some overlap, but as you noticed,
> the mathy vectors aren't generally too useful for extracting SIMD parallelism.
>
> Usually the problem needs to be "rotated by 90 degrees", i.e. in this case you'd have a SIMD
> vector with elements x1, x2, x3, x4, another SIMD vector with elements y1, y2, y3, y4, and
> so on. Effectively working with four mathy 3-vectors inside three SIMD 4-vectors.

The problem is that this particular algorithm iterates over Vec3s one at a time and uses the result of the previous iteration in the next step. It's not obvious how it could be written in structure-of-arrays format. (Actually, it's not obvious at all what the algorithm does. The person who wrote it left us about five years ago. I've fixed a few bugs in it, but those were so limited in scope that I didn't need to understand the whole thing.)
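
For readers who haven't seen the "rotated" layout, a minimal sketch in portable C++: four Vec3s stored as three 4-wide lane arrays, with a dot product computed for all four pairs at once. The names `Vec3x4`, `to_soa` and `dot4` are hypothetical, and whether the compiler turns the loop into SIMD instructions depends on compiler and flags.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

struct Vec3 {
    float x, y, z;
};

// Up to four Vec3s stored "rotated": one 4-wide array per component.
struct Vec3x4 {
    std::array<float, 4> x;
    std::array<float, 4> y;
    std::array<float, 4> z;
};

// Gather `count` Vec3s (at most four) into the rotated layout.
// Unused lanes are left at zero.
inline Vec3x4 to_soa(const Vec3* v, std::size_t count) {
    assert(v != nullptr && count <= 4);
    Vec3x4 out{};
    for (std::size_t i = 0; i < count; ++i) {
        out.x[i] = v[i].x;
        out.y[i] = v[i].y;
        out.z[i] = v[i].z;
    }
    return out;
}

// Four dot products at once: lane i holds dot(a_i, b_i).
// Every lane runs the same formula, which is what a vectorizer wants.
inline std::array<float, 4> dot4(const Vec3x4& a, const Vec3x4& b) {
    std::array<float, 4> r{};
    for (std::size_t i = 0; i < 4; ++i) {
        r[i] = a.x[i] * b.x[i] + a.y[i] * b.y[i] + a.z[i] * b.z[i];
    }
    return r;
}
```

Note that this only pays off when the four items are independent. If item i+1 needs the result of item i, as in the algorithm described above, the lanes cannot be filled this way.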

It could just be that I'm not experienced enough to see something that is obvious to those who commonly write SIMD code. You say that I can "write the plain formula" with those GCC extensions, but I don't really see any formula I could write. Most of the code is a couple of calculations on individual Vec3s or floats, followed by a branch. There aren't hundreds of items that all need the same work done on them. Instead there are a few, or maybe a dozen, items, each needing a slightly different variation of the work (actually, most will usually need no work at all and will be dropped from the "active object list", so to speak).

> > I'm sure there are some places where rethinking data structures or algorithms could
> > allow making use of SIMD, but if we were to go through the trouble, we'd consider
> > using GPU first. After that consideration, there's practically nothing left.
> >
> Plus, GPU programming interfaces don't mislead you as much, as they
> don't really make their SIMD width visible to the programmer.

Yeah, but not knowing about the width usually leads to very bad code (at least that's my understanding based on what I hear at work).

> BTW, I found it useful to think about re-arranging data structures not in terms of vectors,
> but in terms of cache friendliness. Locality, yes, but also things like: invariant data
> should not be interleaved with changing data (to save write back bandwidth).
>
> That kind of data centric programming enables optimizations which also help ordinary
> scalar code. And they are a large step towards SIMD, where the programming model forces
> locality by treating a number of subsequent data items as one indivisible unit.
>
> Doing optimizations for caches "naturally" tends to prefer structure of arrays format.

Our graphics programmer loves to talk about this too, but outside graphics it hardly ever matters. Being data oriented is great if one has enough data. It's much less fun when you have a couple of thingies that need to do A and B, except that next week they also need to do C if a third type of object is present. Being able to make changes fast is much preferred to being ten times faster.

A typical performance problem for us is that the player triggers something in the game and then a dozen subsystems need to get going during a single frame. The typical solution is not to optimize all those subsystems so that they can be launched at the same time, but to launch them over a longer time span, and maybe do most of the work in worker threads instead of having everyone do their thing back-to-back in the main thread.
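
As a rough illustration of the "launch over a longer time span" part, here is a minimal sketch of a queue that starts at most a fixed number of subsystems per frame. `StaggeredStartQueue`, `enqueue` and `run_frame` are hypothetical names, not from our code base or any engine, and the sketch leaves out the worker-thread side entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Holds subsystem start-up callbacks and releases only a few per frame,
// so one player action does not start a dozen things in the same frame.
class StaggeredStartQueue {
public:
    // Queue a start-up callback to be run on some later frame.
    void enqueue(std::function<void()> start) {
        assert(static_cast<bool>(start));  // an empty std::function cannot be called
        pending_.push_back(std::move(start));
    }

    // Call once per frame. Runs at most `max_starts` callbacks in FIFO
    // order and returns how many were actually run.
    std::size_t run_frame(std::size_t max_starts) {
        std::size_t started = 0;
        while (started < max_starts && !pending_.empty()) {
            std::function<void()> start = std::move(pending_.front());
            pending_.pop_front();
            start();
            ++started;
        }
        return started;
    }

    std::size_t pending() const { return pending_.size(); }

private:
    std::deque<std::function<void()>> pending_;
};
```

With a budget of two starts per frame, five queued subsystems get going over three frames instead of all at once. A real version would more likely budget by time than by count.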

You probably think that game code is something special. For 95-99% of it, it's not. When I look at the code written for backing up files, the code that runs our build system, or the C++/Angular code running the build system monitoring, I find it just as good a candidate for SIMD optimization as most of our game code. There's somewhat less floating point, but mostly it's pretty similar.

Sure, there's the whole rendering thing, but that's just a small part of the whole. It's also done mostly on GPU.

Physics does make some use of SIMD, but I don't really know for what. You can take a look at PhysX on GitHub, if you wish. For us, the bottleneck with physics is usually in spawning new objects, or in running various hacks on individual objects to make the physics behave, not in running the simulation itself. I don't think those parts of the PhysX code can benefit much from SIMD.

-JLarja