AVX-512 mask registers

By: Travis Downs (travis.downs.delete@this.gmail.com), December 5, 2019 2:03 pm
Room: Moderated Discussions
I dug a bit into the hardware size of the AVX-512 mask (k) registers and put my findings in a blog post.

There are other interesting things that I didn't have the time to cover. For example, apart from the bitwise ops, every other kreg-to-kreg op has 4 cycles of latency yet is only a single uop (always on p5). I'm curious whether these are implemented in a slow, dedicated ALU that takes 4 cycles, or whether they actually execute on the p5 SIMD execution unit, with most of the latency coming from moving the operands there and back.
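
One way to probe that latency split is a dependent-chain microbenchmark. The sketch below is a hypothetical harness, not the code behind the blog post. It assumes GCC or Clang on x86-64 Linux (for the `target` attribute, `__builtin_cpu_supports`, and AT&T inline asm with a `k1` clobber), uses `kandw` as the bitwise case and `kshiftlw` as a stand-in for "all other kreg-to-kreg ops", and times with `clock_gettime`, so it reports a ratio rather than cycles. If the bitwise ops are single-cycle and the others take 4 cycles, the ratio should come out near 4 on the hardware in question.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define OPS_PER_ITER 8                 /* k-register ops per loop iteration */
#define REP8(s) s s s s s s s s        /* unroll an asm string 8 times */

/* Monotonic wall-clock time in nanoseconds (not core cycles). */
static double now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Bitwise kreg-to-kreg chain: each kandw reads the k1 written by the previous
   one, so the loop runs at kandw latency. Needs iters >= 1. */
__attribute__((target("avx512f")))
static void kand_chain(uint64_t iters) {
    __asm__ volatile(
        "kxorw %%k1, %%k1, %%k1\n\t"
        "1:\n\t"
        REP8("kandw %%k1, %%k1, %%k1\n\t")
        "dec %0\n\t"
        "jnz 1b\n\t"
        : "+r"(iters)
        :
        : "k1", "cc");
}

/* Non-bitwise kreg-to-kreg chain using kshiftlw (assumed representative of
   the 4-cycle group). Needs iters >= 1. */
__attribute__((target("avx512f")))
static void kshift_chain(uint64_t iters) {
    __asm__ volatile(
        "kxorw %%k1, %%k1, %%k1\n\t"
        "1:\n\t"
        REP8("kshiftlw $1, %%k1, %%k1\n\t")
        "dec %0\n\t"
        "jnz 1b\n\t"
        : "+r"(iters)
        :
        : "k1", "cc");
}

/* Average wall-clock nanoseconds per k-register op for one chain. */
static double ns_per_op(void (*chain)(uint64_t), uint64_t iters) {
    assert(iters > 0);                 /* the asm loop would wrap at 0 */
    chain(iters / 10 + 1);             /* short warm-up run */
    double t0 = now_ns();
    chain(iters);
    double t1 = now_ns();
    return (t1 - t0) / ((double)iters * OPS_PER_ITER);
}

/* Ratio of kshiftlw to kandw time per op, or -1.0 if AVX-512F is unavailable
   or iters is 0. Only the ratio is meaningful, since ns depend on clock speed. */
double kshift_to_kand_ratio(uint64_t iters) {
    __builtin_cpu_init();
    if (iters == 0 || !__builtin_cpu_supports("avx512f"))
        return -1.0;
    double and_ns = ns_per_op(kand_chain, iters);
    double shift_ns = ns_per_op(kshift_chain, iters);
    return shift_ns / and_ns;
}
```

Calling `kshift_to_kand_ratio(1000000)` runs each chain for 8 million ops. The chains are written in inline asm rather than with mask intrinsics because compilers are free to lower `__mmask16` arithmetic to ordinary general-purpose instructions, which would time the wrong units.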

p5 is the only port that handles cross-domain writes to the k registers, and operations such as SIMD compares into a k register also run on p5 and have 4 cycles of latency. Food for thought.
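
For reference, here is what a compare into a k register produces: one bit per 32-bit lane, computed on the vector side and written to a mask. This is an illustrative sketch under stated assumptions (GCC or Clang on x86-64, runtime dispatch via `__builtin_cpu_supports`); the function names are hypothetical. `_mm512_cmpgt_epi32_mask` typically compiles to a `vpcmpgtd` or `vpcmpd` with a k-register destination, i.e. the kind of compare-into-k operation mentioned above. It shows semantics only, not timing.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>

/* Scalar model of a 16-lane signed compare into a mask:
   bit i is set iff a[i] > b[i]. */
uint16_t cmpgt_mask_scalar(const int32_t a[16], const int32_t b[16]) {
    uint16_t m = 0;
    for (int i = 0; i < 16; i++)
        if (a[i] > b[i])
            m |= (uint16_t)(1u << i);
    return m;
}

/* Same compare on the vector unit; the result is written to a k register. */
__attribute__((target("avx512f")))
static uint16_t cmpgt_mask_avx512(const int32_t a[16], const int32_t b[16]) {
    __m512i va = _mm512_loadu_si512(a);
    __m512i vb = _mm512_loadu_si512(b);
    __mmask16 k = _mm512_cmpgt_epi32_mask(va, vb);  /* vectors in, mask out */
    return (uint16_t)k;                /* returned in a GPR, e.g. via kmovw */
}

/* Uses AVX-512F when present and checks it against the scalar model. */
uint16_t cmpgt_mask(const int32_t a[16], const int32_t b[16]) {
    uint16_t ref = cmpgt_mask_scalar(a, b);
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f")) {
        uint16_t hw = cmpgt_mask_avx512(a, b);
        assert(hw == ref);             /* hardware should match the model */
        return hw;
    }
    return ref;
}
```

With `a[i] = i` and every `b[i] = 7`, lanes 8 through 15 compare greater, so the mask is `0xFF00`.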

You can find a small amount of additional discussion on Hacker News.
Topic                           Posted By      Date
AVX-512 mask registers          Travis Downs   2019/12/05 02:03 PM
  AVX-512 mask registers        -.-            2019/12/06 09:25 PM
    AVX-512 mask registers      Anon           2019/12/07 02:29 PM
      AVX-512 mask registers    anonymou5      2019/12/07 06:57 PM
