By: Heikki Kultala (heikki.k.ultala.delete@this.gmail.com), September 1, 2022 8:31 am
Room: Moderated Discussions
Ivan (xxx.delete@this.xxx.xxx) on September 1, 2022 2:21 am wrote:
> Adrian (a.delete@this.acm.org) on August 31, 2022 10:30 am wrote:
> > If in Zen 4 they have doubled the width of the connection to the L1 data cache, in order to match
> > the AVX-512 LD/ST throughput of all Intel CPUs, then that automatically enables also the increase
> > of the AVX throughput to three 256-bit LD/ST per cycle, and up to 2 of them can be stores.
> >
> > The associated increased AVX throughput explains the IPC increase in the legacy benchmarks.
> >
> >
> > Until AMD presents the Zen 4 microarchitecture, we cannot know for sure, but designing Zen
> > 4 to be inferior to the competition is not something that I can believe to have happened.
> >
> > The improved IPC from the presentation explained by improvement of loads and stores cannot mean
> > anything else but a wider connection to the L1 data cache, which was a bottleneck in Zen 3.
> > It cannot mean more LD/ST port as those already existing in Zen 3 cannot be fully used.
> >
> > To achieve an improvement in AVX over Zen 3, the cache link for loads must be
> > increased from 512 bit per cycle to 768 bit per cycle, while the cache link
> > for stores must be increased from 256 bit per cycle to 512 bit per cycle.
> >
> > Once the cache link is widened that much, it would be extremely stupid to not widen
> > a little more the link for loads, up to 1024 bit per cycle, to match the performance
> > of the Intel CPUs and to provide balanced LD/ST bandwidth for AVX-512.
> >
>
> https://twitter.com/yuuki_ans/status/1549256374936170497
>
> If this AMD Genoa benchmark leak is legit, then Zen4's L1D port width has been doubled.
Compared to Zen 3, everything else has gone up more in these results than L1D bandwidth, and the DRAM results here are simply impossible for a 1-socket system.
These results show no doubling of L1D bandwidth, exactly the opposite.
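
For anyone who wants to run the same sanity check on a leaked result, here is a rough sketch of the arithmetic. Everything in it is an assumption on my part: the function names are made up, GB is taken as 10^9 bytes, and the numbers in the example are placeholders, not values from the linked leak. The idea is to normalize each reported bandwidth to bytes per cycle per core, then compare the generation-over-generation gain of L1D against the gains of the other levels.

```python
from typing import Iterable


def bytes_per_cycle_per_core(bandwidth_gb_s: float, cores: int, clock_ghz: float) -> float:
    """Normalize an aggregate bandwidth to bytes per cycle per core.

    Assumes 1 GB = 1e9 bytes, so GB/s divided by GHz is bytes per cycle.
    """
    return bandwidth_gb_s / (cores * clock_ghz)


def relative_gain(old: float, new: float) -> float:
    """Ratio of the new figure to the old one (2.0 would be a doubling)."""
    return new / old


def l1d_lags_other_levels(l1d_gain: float, other_gains: Iterable[float]) -> bool:
    """True if L1D improved less than every other level being compared."""
    return l1d_gain < min(other_gains)


if __name__ == "__main__":
    # Placeholder inputs for illustration only, NOT taken from the leak.
    old_l1d = bytes_per_cycle_per_core(2048.0, cores=32, clock_ghz=2.0)
    new_l1d = bytes_per_cycle_per_core(4096.0, cores=32, clock_ghz=2.0)
    print("L1D gain:", relative_gain(old_l1d, new_l1d))
```

A doubled L1D port width should show up as a per-core, per-cycle gain near 2.0 for L1D. If the L1D gain is smaller than the L2, L3 and DRAM gains, the result points the other way.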