By: Wilco (wilco.dijkstra.delete@this.ntlworld.com), July 21, 2022 10:02 am
Anon (no.delete@this.spam.com) on July 21, 2022 6:20 am wrote:
> Wilco (wilco.dijkstra.delete@this.ntlworld.com) on July 21, 2022 6:07 am wrote:
> > We're talking about the situation where the dataset is so large that it doesn't
> > fit in the caches. So there will be many L3 requests either way. The key question
> > is how many can be filtered out so you don't have to wait on DRAM latency.
> Unless the dataset is between 1.125MB and 1.4MB and accessed in a cyclic
> way, that small L3 cache size difference won't make a difference.

Nope - that's not at all how cache hierarchies work. Traditionally one wants a 4-8x increase in cache size for each level to get a worthwhile performance gain from the extra level. Everybody does this for L1/L2 since it's cheap. AMD spends a huge amount of area to get a perfect 8:1 L3/L2 ratio. Intel has a 1.2:1 ratio in recent servers. Yitian does 1:1 and gets amazing results.

However, Altra Max has a 1:8 ratio... Despite this, Altra Max beats EPYC by huge margins on half of the SPECINT benchmarks, so the issue is just that memory-intensive benchmarks require a big L3.
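
A toy model shows why cache size matters outside the narrow "cyclic access" window. The sketch below is only an illustration under stated assumptions: a fully associative LRU cache, a dataset larger than the cache, and uniformly random accesses standing in for an unpredictable workload. None of this models a real L3 or any SPEC benchmark, and the function names are made up for the example.

```python
import random
from collections import OrderedDict


def lru_hit_rate(accesses, capacity):
    """Fraction of accesses that hit in a fully associative LRU cache
    holding `capacity` lines (hypothetical toy model, not real hardware)."""
    cache = OrderedDict()
    hits = 0
    for line in accesses:
        if line in cache:
            hits += 1
            cache.move_to_end(line)        # mark as most recently used
        else:
            cache[line] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(accesses)


def cyclic(n_lines, n_accesses):
    """Walk the dataset front to back, over and over."""
    return [i % n_lines for i in range(n_accesses)]


def uniform_random(n_lines, n_accesses, seed=0):
    """Assumed stand-in for an unpredictable access pattern."""
    rng = random.Random(seed)
    return [rng.randrange(n_lines) for _ in range(n_accesses)]


if __name__ == "__main__":
    dataset = 4096      # lines; larger than every cache size tried below
    n = 200_000
    for capacity in (512, 1024, 2048):
        c = lru_hit_rate(cyclic(dataset, n), capacity)
        r = lru_hit_rate(uniform_random(dataset, n), capacity)
        print(f"capacity={capacity:5d}  cyclic={c:.3f}  random={r:.3f}")
```

In this model the cyclic pattern never hits while the dataset is larger than the cache, so only a dataset that just barely fits would benefit from a bit more capacity. With random accesses the hit rate comes out roughly proportional to capacity divided by dataset size, so every increase in cache size filters out more requests before they reach DRAM.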

> > And we know from the bandwidth test results that has little effect at the DRAM controller side
> > - Altra Max shows 40-50% higher memory bandwidth than EPYC across the full range of cores.
> >
> > So don't act as if this could make a huge difference.
> The bandwidth test is a triad test, which is a very predictable pattern, that's not the case with SpecInt.

Bingo! So you actually agree that the SPEC benchmarks use far more random accesses? Why even mention bandwidth then?

