> If I understand it correctly then, GPU registers are stored
> in the shared on-chip memory, and the main thing that makes them "register" is the fact they can be efficiently
> addressed by instructions? Now the idea of cache hints also make sense to me...

That's possible, but usually it isn't the case. Registers usually have very limited cross-lane access. On top of that, each lane's registers are often stored in a physically separate register file (from the programmer's POV it's still one big register file). Shared memory, on the other hand, is a place where each lane can access any part of it.
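To make the difference concrete, here's a small CPU-side toy model in C++. It's only a sketch of the access rules, not how any particular GPU is built: `Warp`, `shuffle`, `SharedMemory`, and the tiny sizes are all made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical sizes, kept tiny for readability (real warps are wider).
constexpr std::size_t kLanes = 4;
constexpr std::size_t kRegsPerLane = 2;

// Toy model: every lane owns its own register bank.
// An instruction names a register index; the lane is implicit.
struct Warp {
    std::array<std::array<int, kRegsPerLane>, kLanes> regs{};
};

// Limited cross-lane access, loosely modelled on a warp shuffle:
// all lanes read the SAME register index r, each from the lane
// `delta` positions ahead of it (wrapping around). A lane cannot
// pick an arbitrary register of an arbitrary lane this way.
std::array<int, kLanes> shuffle(const Warp& w, std::size_t r, std::size_t delta) {
    assert(r < kRegsPerLane);
    std::array<int, kLanes> out{};
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        out[lane] = w.regs[(lane + delta) % kLanes][r];
    }
    return out;
}

// Shared memory: one flat array; any lane may load any address.
struct SharedMemory {
    std::array<int, kLanes * kRegsPerLane> words{};

    int load(std::size_t addr) const {
        assert(addr < words.size());
        return words[addr];
    }
};
```

In this model the register side only supports "same register index, some other lane", while `SharedMemory::load` takes a plain address that any lane can compute at runtime.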
