By: h (a.delete@this.p.sl), November 4, 2014 1:14 pm
Room: Moderated Discussions
Paul A. Clayton (paaronclayton.delete@this.gmail.com) on November 4, 2014 9:16 am wrote:
> Heikki Kultala (hkultala.delete@this.iki.fi) on November 4, 2014 5:01 am wrote:
> [snip]
> > 2) Their cache hierarchy sucks. L1I has aliasing problems, L1D's are (too) small write-through
> > caches flooding too big, too slow L2 with lots of writes, and the L2 latency is just too long.
>
> While I agree, I seem to recall reading that the large L2 was meant to avoid the need for L3
> in lower-end chips. A 256 KiB L2 backing a 16 KiB L1 (Itanium 2 used those sizes) would be too
> small for last level cache even in a lower-end implementation but perhaps large enough that
> an appropriate L3 design (that would support an optional on-chip L4) would not be simple.
>
> (I suspect that an extra cycle (or two?) of L2 latency was added for one "core" since
> the L1 data cache interface for other was on the opposite side of the "module" from
> the L2 interface and the L2 latency was uniform for both "cores". More automated layout
> and desire for rectangular tiles might have contributed to this problem.)
>
Is the L2 size tightly coupled to the module design? I mean, couldn't they just shrink the cache in the Zambezi design?