By: Paul A. Clayton (paaronclayton.delete@this.gmail.com), September 17, 2022 9:59 am
Room: Moderated Discussions
David Kanter (dkanter.delete@this.realworldtech.com) on September 16, 2022 9:44 pm wrote:
[snip]
> Most L1Ds are writeback. They also receive many partial writes (e.g., byte writes, 2B, 4B,
> etc.). ECC forces every write to trigger a full cache line read and write.
For L1 caches using ECC, the granularity is probably less than a cache line. The SRAM subarray word size is one reasonable ECC word size, but if 64-bit writes are common, a 64-bit ECC word might avoid enough reads to justify the area/power overhead. (Side thought: byte-granular write enables are unlikely to be supported in ECC-protected L1 caches, which might save a tiny bit of area and power.)
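To put rough numbers on that granularity tradeoff, here is a minimal Python sketch. It assumes a textbook SECDED code (Hamming single-error correction plus one overall parity bit), which a real L1 design may or may not use, and the function names are made up for illustration. It shows the check-bit overhead for several ECC word sizes and whether a given store only partially covers an ECC word, which is when a read before the write is needed.

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits for a Hamming SEC code plus one overall parity bit (SECDED).

    Assumption: textbook construction, where r is the smallest value
    satisfying 2**r >= data_bits + r + 1, plus one extra bit for
    double-error detection.
    """
    r = 1
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r + 1


def needs_read_modify_write(offset: int, size: int, ecc_word_bytes: int) -> bool:
    """True if a store of `size` bytes at byte `offset` only partially covers
    at least one ECC word, so the old data must be read to recompute ECC."""
    return offset % ecc_word_bytes != 0 or size % ecc_word_bytes != 0


if __name__ == "__main__":
    for data_bits in (8, 32, 64, 128, 512):
        check = secded_check_bits(data_bits)
        overhead = 100 * check / data_bits
        print(f"{data_bits:4d} data bits: {check:2d} check bits, {overhead:.1f}% overhead")
```

Under these assumptions a 64-bit ECC word costs 8 check bits (12.5%), while a 512-bit (full 64-byte line) word costs 11 check bits (about 2.1%). With 64-bit ECC words, an aligned 8-byte store needs no read, while a 4-byte store still does.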
> In the L2, most
> writes are at full line granularity. In the L1, most writes are less than a full line.
>
> Also, ECC read+calc+write adds latency (as others noted above).
The latency does not seem that important. One would use a buffer anyway to provide recently written data, though greater latency would imply a larger buffer. Also, a buffer alongside L1 could affect the access latency of L1 contents.
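The link between latency and buffer size can be sketched with Little's law: average occupancy is roughly the arrival rate times the time each entry stays in the buffer. The Python snippet below is only a back-of-envelope model; the store rates and latencies are invented for illustration, and a real design would also need headroom for bursts.

```python
import math


def buffer_entries(stores_per_cycle: float, rmw_latency_cycles: int) -> int:
    """Average number of buffer entries occupied (Little's law), rounded up.

    Assumption: each store holds an entry for the full read+calc+write
    latency; burstiness is ignored.
    """
    return math.ceil(stores_per_cycle * rmw_latency_cycles)


if __name__ == "__main__":
    # Hypothetical values: one store every other cycle, varying RMW latency.
    for latency in (2, 4, 8):
        print(f"latency {latency} cycles: about {buffer_entries(0.5, latency)} entries")
```

Doubling the read+calc+write latency roughly doubles the entries needed at a fixed store rate, which is the sense in which greater latency implies a larger buffer.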
Using L2 for the read needed to calculate ECC for sub-word writes effectively provides another set of access banks. L2 SRAM cells also tend to be more density-optimized, so the cost of adding ECC bits would be reduced.
I suspect the tradeoffs are more complex than implied by my comments.