Apple GPU reverse engineering

By: K.K., February 18, 2021 12:40 am
Room: Moderated Discussions
Pocak on February 17, 2021 6:54 am wrote:
> From the doc:
> > General purpose registers each store one 32-bit value per thread.
> They mean a total of 1024 bits per register.
> The document doesn't say how many physical registers there are, only that one SIMD-group
> may use at most 128. There could be more registers per SIMD unit. It also doesn't say
> the registers are evenly partitioned — if it's like other GPUs, there could be SIMD-groups
> with different register needs running on the same SIMD unit simultaneously.

Thanks for clearing this up! I am a total novice when it comes to GPU internals, so it was probably my preconceived notion of CPU registers that misled me here. If I understand correctly, then, GPU registers are stored in shared on-chip memory, and the main thing that makes them "registers" is the fact that they can be efficiently addressed by instructions? Now the idea of cache hints also makes sense to me...

Apple states that Metal shaders can reserve up to 32KB of shared threadgroup memory. If one assumes that registers are stored in the same memory buffer and that Apple has to set aside room for them, that would put the total available shared memory per GPU cluster somewhere between 48KB and 64KB...
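
For what it's worth, here is a rough sketch of how the numbers quoted above combine. The 32-bit values, the 1024 bits per register, the 128-register limit, and the 32KB of threadgroup memory come from the discussion. The idea that registers and threadgroup memory share one pool, and that one or two SIMD-groups at the full register limit are resident, is purely an assumption for illustration, and `shared_pool_kb` is a made-up helper name. It is one way to land in the 48KB to 64KB range, not a statement about how the hardware is actually laid out.

```python
# Back-of-the-envelope register arithmetic. Anything marked "assumed"
# is a guess for illustration, not something Apple documents.

BITS_PER_VALUE = 32            # from the doc: one 32-bit value per thread
BITS_PER_REGISTER = 1024       # total bits per register, as quoted above
MAX_REGS_PER_SIMDGROUP = 128   # from the doc: per-SIMD-group limit
THREADGROUP_MEM_KB = 32        # Apple's stated Metal threadgroup memory limit

# 1024 bits / 32 bits per thread implies 32 threads per SIMD-group.
threads_per_simdgroup = BITS_PER_REGISTER // BITS_PER_VALUE

# One register spans all threads, so it occupies 1024 / 8 = 128 bytes.
bytes_per_register = BITS_PER_REGISTER // 8

# A SIMD-group using all 128 registers needs 128 * 128 bytes = 16 KB.
regs_kb_per_simdgroup = MAX_REGS_PER_SIMDGROUP * bytes_per_register // 1024


def shared_pool_kb(resident_simdgroups):
    """Assumed model: registers and threadgroup memory share one pool."""
    return THREADGROUP_MEM_KB + resident_simdgroups * regs_kb_per_simdgroup


print(threads_per_simdgroup)                  # 32
print(regs_kb_per_simdgroup)                  # 16
print(shared_pool_kb(1), shared_pool_kb(2))   # 48 64
```

Under these assumptions, one SIMD-group at the full register limit gives 48KB and two give 64KB. As Pocak notes, the real number of physical registers and how they are partitioned is not documented, so the true figure could be quite different.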