Zen 4, AVX-512 support, 2 cycle execution time

By: Adrian (a.delete@this.acm.org), August 30, 2022 12:45 am
Room: Moderated Discussions
anonymous2 (anonymous2.delete@this.example.com) on August 29, 2022 5:08 pm wrote:
> AVX-512 (ISA details murky) on Zen 4 but 2 cycles vs 1 on Intel so only 256b internally.
>
> Small win for those who want the ISA, but from a performance perspective limited value?
>


We do not yet know whether AVX-512 instructions on Zen 4 have any 2-cycle execution time.

What we know is that Zen 4 has the same execution resources as Zen 3, and that most of the changes were made in the front end. Some unspecified changes were also made to load/store.

For most wide operations, Zen 3 has either four or two 256-bit pipelines, depending on the operation.

It is possible to implement a 512-bit operation by using the same pipeline for 2 cycles. In that case, a given pipeline can start a 512-bit operation only every other cycle, while it can start a 256-bit operation every cycle.

It is also possible to implement a 512-bit operation by using two 256-bit pipelines simultaneously. In that case, only 1 or 2 512-bit instructions can be started per cycle, instead of the 2 or 4 per cycle that are possible for 256-bit instructions.
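
To make the comparison between the two variants concrete, here is a minimal sketch of the arithmetic in Python. The function names and dictionary keys are made up for illustration, and the model only counts peak issue slots under the assumptions stated in this post (two or four 256-bit pipelines); it says nothing about how Zen 4 actually works.

```python
from fractions import Fraction


def double_pumped(pipes_256: int) -> dict:
    """Hypothetical variant 1: each 512-bit op occupies one 256-bit pipe for 2 cycles."""
    return {
        "pipes_for_512b": pipes_256,           # every pipe accepts 512-bit ops
        "cycles_between_starts_per_pipe": 2,   # but only every other cycle
        "ops_512b_per_cycle": Fraction(pipes_256, 2),
    }


def paired(pipes_256: int) -> dict:
    """Hypothetical variant 2: two 256-bit pipes work together on one 512-bit op."""
    return {
        "pipes_for_512b": pipes_256 // 2,      # half as many, but 512 bits wide
        "cycles_between_starts_per_pipe": 1,   # a new op can start every cycle
        "ops_512b_per_cycle": Fraction(pipes_256 // 2, 1),
    }


# Pipe counts taken from the post: two for FMA or MUL, four for simple operations.
for label, n in (("FMA/MUL", 2), ("simple ops", 4)):
    a, b = double_pumped(n), paired(n)
    print(label, "double-pumped:", a["ops_512b_per_cycle"],
          "paired:", b["ops_512b_per_cycle"])
```

In this simple model both variants reach the same peak number of 512-bit operations per cycle. They differ in how the hardware would be described: as many pipelines that each accept a 512-bit instruction every other cycle, or as half as many pipelines that each accept one every cycle.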


The latter variant seems the more likely implementation. It is also how Intel does it.

So I do not believe that, for 512-bit instructions, Zen 4 offers two (for FMA or MUL) or four pipelines with 2-cycle throughput. I believe that it offers one (for FMA or MUL) or two (for simple AVX-512 instructions) 512-bit pipelines with 1-cycle throughput, like Intel's non-server CPUs.


