Apple GPU reverse engineering

By: K.K., February 16, 2021 11:48 pm
Room: Moderated Discussions
Jeff S. on February 16, 2021 11:19 am wrote:
> The oddest thing I caught when I skimmed this was that the exec mask was claimed to be stored in a vector
> register, not scalar/uniform. This would definitely be an odd wrinkle if correct.

As I understand it, the execution mask is simply stored in a 32-bit register. "Vector/scalar" is probably just being used loosely by the author here.
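A minimal sketch of what "the execution mask is a 32-bit register" means in practice, assuming a 32-lane SIMD unit with one mask bit per lane (bit i set means lane i is active). The branch splitting shown is the generic way SIMD hardware uses such a mask, not a claim about how Apple's GPU actually manages divergence; `split_on_branch` and `active_lanes` are hypothetical helper names.

```python
# Sketch of a per-SIMD-group execution mask held in one 32-bit value.
# Assumption: 32 lanes, bit i == 1 means lane i is active.
# Helper names are hypothetical, for illustration only.

LANES = 32
FULL_MASK = (1 << LANES) - 1  # all 32 lanes active: 0xFFFFFFFF


def split_on_branch(exec_mask: int, cond_mask: int) -> tuple[int, int]:
    """Split the active lanes into the 'taken' and 'not taken' sides of a branch.

    cond_mask holds the per-lane branch condition (bit i set if lane i's
    condition is true); only currently active lanes end up on either side.
    """
    taken = exec_mask & cond_mask
    not_taken = exec_mask & ~cond_mask & FULL_MASK
    return taken, not_taken


def active_lanes(exec_mask: int) -> list[int]:
    """Return the indices of lanes whose bit is set in the mask."""
    return [i for i in range(LANES) if (exec_mask >> i) & 1]


if __name__ == "__main__":
    # Even-numbered lanes take the branch, odd-numbered lanes do not.
    even_lanes = sum(1 << i for i in range(0, LANES, 2))
    taken, not_taken = split_on_branch(FULL_MASK, even_lanes)
    print(hex(taken), hex(not_taken))  # 0x55555555 0xaaaaaaaa
    # Together the two sides cover exactly the lanes that were active before.
    assert taken | not_taken == FULL_MASK
```

Because the whole mask fits in one 32-bit value, a single register per SIMD group is enough to hold it, whatever the author meant by "vector".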

What I find more confusing is the description of the register file; I can't quite wrap my head around it. They say there are 128 registers per SIMD unit, evenly partitioned between threads (SIMD lanes). With 32 lanes, that would mean each thread has access to at most 4 registers (128 / 32), and two of those are reserved for special purposes anyway (the stack depth counter and the link register). Also, why would the system have a per-thread link register at all? Don't all the threads share the same IP? Is it to support function calls in divergent execution flows?
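To make the objection concrete, here is the arithmetic under the reading being questioned. The 128-register and two-reserved-register figures are the ones quoted in the post; the 32-lane width is inferred from the 32-bit execution mask. `registers_per_thread` is a hypothetical helper.

```python
# Back-of-the-envelope check of the register-file reading questioned above,
# taking the quoted figures at face value.

REGISTERS_PER_SIMD = 128
LANES_PER_SIMD = 32       # inferred: one bit per lane in a 32-bit exec mask
RESERVED_PER_THREAD = 2   # stack depth counter + link register


def registers_per_thread(total: int, lanes: int) -> int:
    """Registers each lane gets if the file is split evenly across lanes."""
    if total % lanes:
        raise ValueError("register file does not divide evenly across lanes")
    return total // lanes


per_thread = registers_per_thread(REGISTERS_PER_SIMD, LANES_PER_SIMD)
general_purpose = per_thread - RESERVED_PER_THREAD
print(per_thread, general_purpose)  # 4 2
```

Leaving only two general-purpose registers per thread is what makes this reading of the description so hard to believe.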
Topic | Posted By | Date
Apple GPU reverse engineering | David Kanter | 2021/02/14 10:50 AM
  Apple GPU reverse engineering | Chester | 2021/02/14 02:26 PM
    Apple GPU reverse engineering | Jeff S. | 2021/02/16 11:19 AM
      Apple GPU reverse engineering | K.K. | 2021/02/16 11:48 PM
        Apple GPU reverse engineering | Pocak | 2021/02/17 06:54 AM
          Apple GPU reverse engineering | K.K. | 2021/02/18 12:40 AM
            Apple GPU reverse engineering | Anon | 2021/02/18 03:23 AM