Help a CPU guy understand GPU ALUs (scalar vs vector)

By: Rudolf K. (iundeaandlaenaeikn.delete@this.gmail.com), September 5, 2020 2:48 am
Room: Moderated Discussions
I was trying to learn more about the organization of modern GPUs and I am getting somewhat confused by marketing terminology. I hope that you knowledgeable people can clear things up for me. I am a reasonably experienced programmer with what I'd consider a solid background in SIMD programming, having coded low-level algorithms using MMX, SSE, AVX and ARM NEON. I have also done some GPGPU programming (mostly OpenCL and Apple Metal). So it would really help me if someone could contrast the CPU SIMD programming model with whatever GPUs do. I have read various articles (some of them excellent), but open questions remain.

What I find particularly confusing is the distinction between "scalar" (Nvidia) and "vector" (AMD RDNA) shading units. If I understand it correctly (and please correct me if I am wrong), there is nothing "scalar" about Nvidia's architecture. It seems to use 16-wide SIMD ALUs (basically 512-bit vector units), and when they talk about "scalar execution" they really mean that each vector lane contains data from a different work item. So they process 16 FP32 work items in parallel, just like plain old SIMD on the CPU. I imagine the CUDA instructions being something quite similar to AVX512. Nvidia calls this "SIMT", but to me it looks like plain old SIMD with flexible scatter/gather, where "thread id" is simply an integer used to index the data. And again, if I understand it correctly, the notion of "threads" in GPU programming is simply an elaborate (and confusing) way of stating that one packs multiple work items into the lanes of a single SIMD vector. So basically something like the RTX 2060 (1920 shaders) is actually a 30-processor cluster, where each processor contains four 512-bit SIMD ALUs (30 x 4 x 16 = 1920). Each ALU runs the same program for up to 16 FP32 work items at the same time, using scatter/gather to collect the data from memory into SIMD vectors (for graphical tasks, these work items could be pixels; for compute they could be whatever you want, as long as you can give each one an integer ID).
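To make the question concrete, here is a minimal sketch of the model described above: one instruction stream, 16 lanes, one work item per lane, with a per-lane mask handling both the tail of the data and a branch. It is a toy in plain Python, not a description of how any real GPU or driver is implemented. The width of 16 is taken from the paragraph above, and the names `gather`, `run_wave` and `run_kernel` are made up for illustration.

```python
# Toy model of "SIMT on top of SIMD": one instruction stream, WIDTH lanes,
# one work item per lane. WIDTH = 16 follows the assumption made in the post.
WIDTH = 16


def gather(memory, indices, mask):
    """Per-lane indexed load; inactive lanes get a placeholder 0.0."""
    return [memory[i] if m else 0.0 for i, m in zip(indices, mask)]


def run_wave(memory, first_id, n_items):
    # "Thread IDs" are just consecutive integers, one per lane.
    ids = [first_id + lane for lane in range(WIDTH)]
    # Lanes that fall past the end of the data are masked off.
    active = [i < n_items for i in ids]
    x = gather(memory, ids, active)

    # Per-work-item branch:  if x < 0: y = -x   else: y = 2 * x
    # The vector unit steps through BOTH sides, each under its own lane mask.
    then_mask = [a and (v < 0) for a, v in zip(active, x)]
    else_mask = [a and not t for a, t in zip(active, then_mask)]

    y = [0.0] * WIDTH
    y = [-v if m else old for v, m, old in zip(x, then_mask, y)]
    y = [2 * v if m else old for v, m, old in zip(x, else_mask, y)]
    return ids, active, y


def run_kernel(memory):
    out = [0.0] * len(memory)
    for first_id in range(0, len(memory), WIDTH):
        ids, active, y = run_wave(memory, first_id, len(memory))
        # Masked scatter: only active lanes write their result back.
        for i, m, v in zip(ids, active, y):
            if m:
                out[i] = v
    return out


data = [float(i - 10) for i in range(20)]  # 20 work items: -10.0 .. 9.0
result = run_kernel(data)

# The shader-count arithmetic from the post: 4 ALUs x 16 lanes per processor.
processors = 1920 // (4 * WIDTH)
```

With 20 work items the second pass has only 4 active lanes out of 16, which is the same thing a CPU programmer would do with a masked tail loop.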

Now, a lot of tech articles stress the fact that Nvidia is "scalar" and AMD is "vector", but when I look at analyses of RDNA, I really don't see a fundamental difference. It looks to me like the execution model is identical, just that RDNA uses 32-wide ALUs instead of Nvidia's 16-wide. I also looked at Intel's GPU whitepapers, and it seems that their ALUs are again plain old SIMD ALUs that can operate on various data types and widths (again, just like any other CPU SIMD ISA). So they can be programmed flexibly, giving the driver implementors some freedom in how to organize the data for graphical tasks.
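A small sketch of what "identical execution model, different width" would mean if my reading is right: the per-work-item program is written once, and the lane count only changes how many work items are covered per issued vector instruction. The widths 16 and 32 are the figures assumed above, and `scalar_kernel` and `run` are hypothetical names, not anything from a vendor toolchain.

```python
def scalar_kernel(x):
    # The program as the programmer writes it: one work item at a time.
    return x * x + 1.0


def run(data, width):
    """Apply scalar_kernel to every item, `width` lanes per vector issue."""
    out = []
    issues = 0
    for base in range(0, len(data), width):
        lanes = data[base:base + width]
        # One vector instruction's worth of work: every lane does the same op.
        out.extend(scalar_kernel(v) for v in lanes)
        issues += 1
    return out, issues


data = [float(i) for i in range(64)]
out16, issues16 = run(data, 16)  # 16-wide ALU, as assumed for Nvidia above
out32, issues32 = run(data, 32)  # 32-wide ALU, as assumed for RDNA above
```

Both runs produce the same results; only the number of vector issues differs (4 versus 2 for 64 work items).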

I suppose my question is as follows: is my understanding (as described above) correct? Am I missing something? Are "scalar" and "vector" "shaders" the same in modern Nvidia and AMD GPUs?