Zen 4 LD/ST enhancements that contribute to the IPC improvement have nothing to do with AVX-512

By: Heikki Kultala (heikki.kult.ala.delete@this.gmail.com), August 31, 2022 7:38 am
Room: Moderated Discussions
Adrian (a.delete@this.acm.org) on August 31, 2022 1:25 am wrote:
> anonymous2 (anonymous2.delete@this.example.com) on August 29, 2022 5:08 pm wrote:
> > AVX-512 (ISA details murky) on Zen 4 but 2 cycles vs 1 on Intel so only 256b internally.
> >
> > Small win for those who want the ISA, but from a performance perspective limited value?
> >
>
> In the list provided by AMD for contributors to enhanced performance at the same clock frequency,
> the second place after the new front-end was occupied by load/store enhancements.

... and that has NOTHING to do with AVX-512.

AMD reported an average 13% IPC improvement, and most of the software on the list used to calculate that IPC improvement did not use AVX-512 at all.

> I interpret this AMD claim that in Zen 4 the load and store bandwidth between registers
> and the L1 data cache memory has been doubled in comparison with Zen 3.

That is not an interpretation. That is speculation that has NOTHING to do with the original text on which you claim to base it. It's a VERY BAD misinterpretation.

> Most Intel CPUs that support AVX-512 can initiate in each clock cycle two 512-bit
> register-register operations, two 512-bit loads and one 512-bit store.
>
> Zen 3 can initiate in each cycle four 256-bit register-register operations, two 256-bit
> loads and one 256-bit store. By pairing 256-bit pipelines, Zen 4 would have been able
> to initiate in each cycle two 512-bit register-register operations and one 512-bit load,
> but one 512-bit store could have been initiated only every other cycle.
>
> That would have matched Intel in register-register operations, but would have been worse for load and store.
>
> So I assume that Zen 4 has been improved to be able to do two 512-bit loads and one 512-bit
> store per cycle.

This does not make sense at all.

Widening the memory data paths would be very expensive, and would do absolutely NOTHING to improve the performance of workloads that do not use AVX-512. And almost all of the software that AMD used to calculate this 13% IPC improvement does NOT use AVX-512.

The load/store improvement that gives part of that 13% has to be something totally unrelated to AVX-512.
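
To put numbers on this, here is a rough sketch in Python using only the per-cycle figures quoted above (they are claims from this thread, not measurements, and not anything AMD has confirmed). The function names and the scalar case with 64-bit loads held at 2 per cycle are my own assumptions for illustration. It shows that a load narrower than the data path moves the same number of bytes per cycle whether the path is 256 or 512 bits wide.

```python
# Per-cycle L1D figures as quoted in this thread (claims, not measurements).
# Each entry: (loads per cycle, stores per cycle, access width in bits).
QUOTED = {
    "Intel AVX-512": (2, 1, 512),
    "Zen 3": (2, 1, 256),
    "256-bit pipes paired for 512-bit": (1, 0.5, 512),
}


def bytes_per_cycle(loads, stores, width_bits):
    """Peak (load, store) bytes per cycle when every access is full width."""
    width_bytes = width_bits // 8
    return loads * width_bytes, stores * width_bytes


def useful_load_bytes(loads, access_bits, path_bits):
    """Bytes actually loaded per cycle when each access is access_bits wide.

    A load cannot use more of the data path than its own width, so a wider
    path adds nothing once path_bits >= access_bits.
    """
    return loads * min(access_bits, path_bits) // 8


if __name__ == "__main__":
    for name, (loads, stores, width) in QUOTED.items():
        ld, st = bytes_per_cycle(loads, stores, width)
        print(f"{name}: {ld} B/cycle load, {st} B/cycle store")

    # Hypothetical scalar workload: 64-bit loads, load count held fixed at 2.
    for path in (256, 512):
        print(f"{path}-bit path, 64-bit loads: {useful_load_bytes(2, 64, path)} B/cycle")
```

With the load count held fixed, the scalar case comes out identical for both path widths, which is why a wider path alone cannot explain an IPC gain on software that does not use AVX-512.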

Also, AFAIK there are no 512-bit registers on Zen 4.

> Thus the throughput of Zen 4 for AVX-512 should match very closely that
> of the Intel CPUs which lack the second FMA unit, at the same clock frequency.

No. Makes no sense.

And then you post a zillion lines of speculation based on totally false premises.

Forget it.