Zen 4 LD/ST enhancements that contribute to the IPC improvement have nothing to do with AVX-512

By: Adrian (a.delete@this.acm.org), August 31, 2022 10:10 am
Room: Moderated Discussions
Heikki Kultala (heikki.kult.ala.delete@this.gmail.com) on August 31, 2022 7:38 am wrote:
> Adrian (a.delete@this.acm.org) on August 31, 2022 1:25 am wrote:
> > In the list provided by AMD for contributors to enhanced performance at the same clock frequency,
> > the second place after the new front-end was occupied by load/store enhancements.
>
> ... and which has NOTHING to do with AVX-512.

There is no information that allows concluding either that this has something to do with AVX-512 or that it has nothing to do with it.

You may guess that it has nothing to do with AVX-512, but you should present arguments to support your guess.

I have presented solid arguments that this change was driven by the addition of AVX-512 support. Keeping the Zen 3 L1 cache bandwidth would have resulted in a design unbalanced for AVX-512, with significantly lower performance than all the Intel CPUs that support AVX-512.

It would have been stupid for the AMD designers to implement AVX-512 support in that way, i.e. by making certain that Zen 4 would be inferior to the competition.
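A rough way to see the imbalance is to count the cycles that L1 bandwidth alone imposes on a simple streaming kernel, c[i] = a[i] + b[i], with 512-bit vectors (two loads and one store per vector). This is only a sketch under simplifying assumptions: all data resident in L1D, loads and stores proceeding in parallel on separate paths, and nothing else (FP units, front-end, address generation) limiting the loop. The Zen 3 widths are the two 256-bit loads plus one 256-bit store per cycle; the comparison uses the two 512-bit loads plus one 512-bit store per cycle of Intel's AVX-512 cores.

```python
# Rough L1-bandwidth bound for c[i] = a[i] + b[i] with 512-bit (64-byte)
# vectors: 2 loads + 1 store per vector. Assumes data is in L1D and that
# loads and stores use separate paths that work in parallel.
def cycles_per_vector(load_bytes_per_cycle, store_bytes_per_cycle,
                      vec_bytes=64, loads=2, stores=1):
    """Cycles per vector iteration imposed by L1 load/store bandwidth alone."""
    load_cycles = loads * vec_bytes / load_bytes_per_cycle
    store_cycles = stores * vec_bytes / store_bytes_per_cycle
    # The slower of the two paths sets the pace.
    return max(load_cycles, store_cycles)

# Zen 3 widths: 2 x 256-bit loads (64 B) and 1 x 256-bit store (32 B) per cycle.
zen3 = cycles_per_vector(64, 32)
# Intel AVX-512 widths: 2 x 512-bit loads (128 B) and 1 x 512-bit store (64 B).
intel = cycles_per_vector(128, 64)
print(zen3, intel)  # 2.0 1.0
```

Under these assumptions, the Zen 3 paths would need twice as many cycles per 512-bit vector as the Intel paths.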



>
> AMD reported an average 13% IPC improvement, and most of the software on the
> list used to calculate the IPC improvement did not use AVX-512 at all.

Increasing the LD/ST bandwidth to two 512-bit loads per cycle plus one 512-bit store per cycle is certain to also allow increased throughput for the 256-bit loads and stores.

Therefore it is likely that Zen 4 can do up to three 256-bit loads per cycle, versus two in Zen 3, and up to two 256-bit stores per cycle, versus one in Zen 3.

Such behavior would be very similar to that of Golden Cove in Alder Lake. The associated increase in AVX load/store bandwidth would therefore easily explain the IPC gains in the table.
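The reasoning can be written as a small model: the number of vector accesses per cycle is the smaller of the number of pipes and the number of vectors that fit in the data path. This is a hypothetical sketch, not an AMD figure; it assumes Zen 3's 3 load pipes and 2 store pipes are kept and that the speculated Zen 4 paths carry 2 x 512-bit loads and 1 x 512-bit store per cycle, with a wide path able to carry several narrower accesses.

```python
# Hypothetical model: vector accesses per cycle are capped both by the
# number of load (or store) pipes and by the width of the L1D data path.
def accesses_per_cycle(pipes, path_bits, vec_bits):
    return min(pipes, path_bits // vec_bits)

# Speculated Zen 4 (assumption, not an AMD figure): Zen 3's 3 load pipes and
# 2 store pipes, with paths widened to 2 x 512-bit loads and 1 x 512-bit store.
LOAD_PIPES, STORE_PIPES = 3, 2
LOAD_PATH_BITS, STORE_PATH_BITS = 2 * 512, 1 * 512

for vec_bits in (256, 512):
    loads = accesses_per_cycle(LOAD_PIPES, LOAD_PATH_BITS, vec_bits)
    stores = accesses_per_cycle(STORE_PIPES, STORE_PATH_BITS, vec_bits)
    print(f"{vec_bits}-bit: loads/cycle={loads}, stores/cycle={stores}")
# 256-bit: loads/cycle=3, stores/cycle=2
# 512-bit: loads/cycle=2, stores/cycle=1
```

For 256-bit code the pipe count becomes the limit on loads, which is where the "three 256-bit loads" estimate comes from.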




>
> > I interpret this AMD claim as meaning that in Zen 4 the load and store bandwidth between registers
> > and the L1 data cache has been doubled in comparison with Zen 3.
>
> That is not an interpretation. That is stupid speculation that has NOTHING to do with
> the original text you claim to base it on. It's a VERY BAD misinterpretation.
>

I have said from the beginning that this is speculation, because AMD has not yet provided any information about the Zen 4 microarchitecture.

It is OK for you to disagree, but please present some arguments for your opinion, because I do not see any.



>
> Widening the memory data paths would be very expensive, and would do absolutely NOTHING
> to improve the performance of workloads that do not use AVX-512. And almost all of the
> software that AMD used to calculate this 13% IPC improvement does NOT use AVX-512.
>
> The load/store improvement that gives part of that 13% has to be something totally unrelated to AVX-512.

As I said above, in the Intel CPUs the increased LD/ST bandwidth for AVX-512 also provides increased bandwidth for AVX. It should be expected that AMD does the same thing.

This is especially to be expected because Zen 3 cannot use all of its load/store execution units for 256-bit accesses, due to insufficient bandwidth to the L1 data cache.

Zen 3 can already do 3 loads per cycle, but only two of them can be 256-bit loads, because the link to the cache is too narrow. The same applies to stores: Zen 3 can already do 2 stores per cycle, but only one of them can be a 256-bit store, due to the narrow link.

So they did not need to change anything in the load/store units; they only needed to double the width of the connection to the L1 data cache, both to improve the AVX LD/ST bandwidth and to provide enough LD/ST bandwidth for AVX-512.
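What doubling only the link width would buy for 256-bit (AVX2) code can be estimated in bytes per cycle. This is a sketch under the assumptions stated above: Zen 3's 3 load pipes and 2 store pipes stay unchanged, each pipe moves at most one 32-byte vector per cycle, and the link caps the total (2 x 32 B for loads and 1 x 32 B for stores in Zen 3, twice that after the hypothetical widening).

```python
# Peak L1D bytes per cycle for 256-bit code when only the link width changes.
# Assumes 3 load pipes and 2 store pipes, each moving at most one vector.
VEC_BYTES = 32  # one 256-bit vector

def peak_bytes(pipes, link_bytes):
    # The pipes can move pipes * VEC_BYTES; the link caps the total.
    return min(pipes * VEC_BYTES, link_bytes)

zen3_load = peak_bytes(3, 64)    # link: 2 x 32 B
zen3_store = peak_bytes(2, 32)   # link: 1 x 32 B
wide_load = peak_bytes(3, 128)   # link doubled: 2 x 64 B
wide_store = peak_bytes(2, 64)   # link doubled: 1 x 64 B
print(zen3_load, zen3_store, wide_load, wide_store)  # 64 32 96 64
```

In this model, 256-bit loads gain 50% and 256-bit stores double, while the remaining load-link headroom (128 B versus 96 B) is what 512-bit loads would use.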



>
> Also, AFAIK there are no 512-bit registers on Zen4.

Zen 4 supports AVX-512, therefore it *MUST* have 32 512-bit registers. There is no doubt about that.
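For scale, the architectural vector register state that AVX-512 requires, compared with AVX2 in 64-bit mode, can be computed directly. This sketch ignores the AVX-512 opmask registers and only sizes the state that software sees.

```python
# Architectural vector register state visible to software (64-bit mode).
AVX2_REGS, AVX2_BITS = 16, 256      # YMM0-YMM15
AVX512_REGS, AVX512_BITS = 32, 512  # ZMM0-ZMM31

ymm_bytes = AVX2_REGS * AVX2_BITS // 8
zmm_bytes = AVX512_REGS * AVX512_BITS // 8
print(ymm_bytes, zmm_bytes, zmm_bytes // ymm_bytes)  # 512 2048 4
```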
