By: David Kanter (dkanter.delete@this.realworldtech.com), September 18, 2022 12:29 pm
Room: Moderated Discussions
Paul A. Clayton (paaronclayton.delete@this.gmail.com) on September 17, 2022 10:59 am wrote:
> David Kanter (dkanter.delete@this.realworldtech.com) on September 16, 2022 9:44 pm wrote:
> [snip]
> > Most L1Ds are writeback. They also receive many partial writes (e.g., byte writes, 2B, 4B,
> > etc.). ECC forces every write to trigger a full cache line read and write.
>
> For L1 caches using ECC, the granularity is probably less than a cache line. The SRAM subarray word size
> is one reasonable ECC word size, but if 64-bit writes are common using a 64-bit ECC word might avoid
> enough reads to justify the area/power overhead. (Side thought: byte granular write enable is likely
> not to be supported with ECC-protected L1 caches, which might save a tiny bit of area and power.)
That's true, but sub-line ECC has a much higher area overhead, since every smaller ECC word carries its own set of check bits. So maybe I'd state it as 'power or area, take your pick'.
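To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. It assumes a plain Hamming SEC-DED code and a 64-byte line. Both are my assumptions for illustration, not a description of any particular design, and the function names are made up. Under those assumptions a 64-bit ECC word costs 8 check bits (12.5% overhead), while protecting the whole 512-bit line as one word costs 11 check bits (about 2.1%).

```python
LINE_BITS = 512  # assumed 64-byte cache line


def secded_check_bits(data_bits: int) -> int:
    """Check bits for a Hamming SEC-DED code over data_bits of data."""
    r = 1
    # Single-error correction needs 2^r >= data_bits + r + 1.
    while (1 << r) < data_bits + r + 1:
        r += 1
    # One extra overall parity bit adds double-error detection.
    return r + 1


def line_overhead(ecc_word_bits: int) -> float:
    """Fraction of extra storage per line for a given ECC word size."""
    words_per_line = LINE_BITS // ecc_word_bits
    return words_per_line * secded_check_bits(ecc_word_bits) / LINE_BITS


def needs_read_modify_write(offset: int, size: int, ecc_word_bytes: int) -> bool:
    """True if a write of size bytes at byte offset partly covers an ECC word.

    A partly covered word must be read first so its check bits can be
    recomputed over the merged data.
    """
    return offset % ecc_word_bytes != 0 or size % ecc_word_bytes != 0


if __name__ == "__main__":
    for word_bits in (8, 32, 64, 512):
        word_bytes = word_bits // 8
        print(
            f"{word_bits:>3}-bit ECC word: "
            f"{secded_check_bits(word_bits):>2} check bits, "
            f"{line_overhead(word_bits):.1%} area overhead, "
            f"aligned 8B store needs RMW: "
            f"{needs_read_modify_write(0, 8, word_bytes)}"
        )
```

With a 64-bit ECC word an aligned 8B store avoids the read, but a 4B or byte store still needs it. Going finer removes those reads at a steep cost in check bits.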
> > In the L2, most
> > writes are at full line granularity. In the L1, most writes are less than a full line.
> >
> > Also, ECC read+calc+write adds latency (as others noted above).
>
> The latency does not seem that important. One would use a buffer anyway to
> provide recently written data, though greater latency would imply a larger buffer.
> Also a buffer alongside L1 could impact access latency of L1 contents.
On older AMD parts, I know ECC on the L1 was a challenge for timing and required some careful work. That was probably 45nm or older, so I'm not sure about 7nm and beyond.
David