By: Heikki Kultala (hkultala.delete@this.iki.fi), November 16, 2014 11:12 pm
Room: Moderated Discussions
h (a.delete@this.p.sl) on November 4, 2014 1:14 pm wrote:
> Paul A. Clayton (paaronclayton.delete@this.gmail.com) on November 4, 2014 9:16 am wrote:
> > Heikki Kultala (hkultala.delete@this.iki.fi) on November 4, 2014 5:01 am wrote:
> > [snip]
> > > 2) Their cache hierarchy sucks. L1I has aliasing problems, L1D's are (too) small write-through
> > > caches flooding too big, too slow L2 with lots of writes, and the L2 latency is just too long.
> >
> > While I agree, I seem to recall reading that the large L2 was meant to avoid the need for L3
> > in lower-end chips. A 256 KiB L2 backing a 16 KiB L1 (Itanium 2 used those sizes) would be too
> > small for last level cache even in a lower-end implementation but perhaps large enough that
> > an appropriate L3 design (that would support an optional on-chip L4) would not be simple.
> >
> > (I suspect that an extra cycle (or two?) of L2 latency was added for one "core" since
> > the L1 data cache interface for other was on the opposite side of the "module" from
> > the L2 interface and the L2 latency was uniform for both "cores". More automated layout
> > and desire for rectangular tiles might have contributed to this problem.)
> >
>
> Is L2 size tightly-coupled with module design? I mean, they couldn't just shrink cache in Zambezi design?
They released a tech doc which showed an option for a 1 MiB L2 cache, but chips with that cache never materialized. Its latency was 2 cycles shorter than that of the 2 MiB L2 version which did materialize.