By: Jörn Engel (joern.delete@this.purestorage.com), November 19, 2020 4:28 am
Room: Moderated Discussions
anonymou5 (no.delete@this.spam.com) on November 18, 2020 4:37 pm wrote:
> Jörn Engel (joern.delete@this.purestorage.com) on November 18, 2020 4:04 pm wrote:
> > Maynard Handley (name99.delete@this.name99.org) on November 18, 2020 1:12 pm wrote:
> > >
> > > If state (appropriately configured...) is so cheap then
> > > (to give an obvious example) why don't AMD and Intel
> > > copy Apple's monster sized caches?
> >
> > Because their pagesize is 4k, not 16k.
>
> ICL L1d size is...?
Demonstrating that adding ways is expensive? Intel had stuck with 8-way since Westmere or so; with Icelake they went to 12-way. They could have gone to 16-way, but decided against it.
Apple apparently also decided that 8-way is a good choice. But their larger pagesize means their cache can be 4x larger than the equivalent Intel cache. It looks like a new architecture designed today should pick a 64k pagesize, resulting in an 8-way L1 cache of 512k, the size of Icelake's L2. You would also get more TLB coverage.
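The constraint behind these numbers is the usual VIPT rule: if the L1 index bits must all fall inside the page offset to avoid aliasing, then L1 size is at most pagesize times associativity. A quick sketch of that arithmetic (the configurations are my own illustration, not measured data):

```python
# VIPT constraint: a virtually-indexed, physically-tagged L1 avoids
# aliasing when every index bit lies within the page offset, i.e.
#   cache_size <= page_size * ways

def max_vipt_l1_kib(page_size_kib: int, ways: int) -> int:
    """Largest alias-free VIPT L1 size, in KiB."""
    return page_size_kib * ways

configs = [
    ("x86, 4k pages, 8-way (pre-Icelake)", 4, 8),    # -> 32 KiB
    ("x86, 4k pages, 12-way (Icelake)",    4, 12),   # -> 48 KiB
    ("Apple, 16k pages, 8-way",            16, 8),   # -> 128 KiB
    ("hypothetical 64k pages, 8-way",      64, 8),   # -> 512 KiB
]

for name, page, ways in configs:
    print(f"{name}: {max_vipt_l1_kib(page, ways)} KiB")
```

Same 8 ways throughout; only the pagesize changes, and the L1 ceiling scales with it.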
The main drawback is fragmentation. But we have 1000x more memory today than 20 years ago, so even with a 16x larger pagesize we'd still have about 64x more pages. And a lot of memory goes to large consumers where fragmentation is not an issue. I obviously haven't done the experiment, but I don't expect fragmentation with 64k pages to be a big deal.