By: anon2 (anon.delete@this.anon.com), September 16, 2022 9:04 am
Room: Moderated Discussions
anon.1 (abc.delete@this.def.com) on September 16, 2022 6:51 am wrote:
> anon2 (anon.delete@this.anon.com) on September 15, 2022 7:04 pm wrote:
> > Everybody knows the data integrity problems with parity-protected write-back arrays. ECC has also
> > seemed to be a difficult problem for the L1 data cache, one that nobody has solved very well.
> >
> > The options seem to be:
> > - A write-back L1D with parity, accepting the lack of correction.
> > - A write-through L1D backed by an ECC-protected L2.
> > - An expensive L1 ECC scheme.
> >
> > A very long time ago I recall some CPUs had a BIOS selection between write-back and write-through
> > L1; possibly integrity was the reason. More recently Intel used a "DCU 16kB mode" option in
> > its Xeons. This changed the data cache unit from 32kB 8-way set associative to two mirrored 16kB 4-way
> > halves, with correction achieved by using parity to find the correct copy. This seems to have gone away
> > in favor of an allegedly more robust L1D SRAM cell, and they have no ECC on the write-back L1.
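
To spell out the mirrored scheme as I understand it: every store is written to both 16kB halves, each half carries only parity, and a parity mismatch in one half is repaired from the other. A minimal sketch; the byte-granularity even parity and the at-most-one-corrupted-copy assumption are mine, not from Intel documentation:

def parity_bit(byte):
    # Even parity over the 8 data bits.
    p = 0
    for i in range(8):
        p ^= (byte >> i) & 1
    return p

class MirroredDCU:
    """Two mirrored halves, each protected only by per-byte parity."""
    def __init__(self):
        self.half_a = {}   # address -> (byte, parity)
        self.half_b = {}

    def store(self, addr, byte):
        # Every store is written to both halves.
        self.half_a[addr] = (byte, parity_bit(byte))
        self.half_b[addr] = (byte, parity_bit(byte))

    def load(self, addr):
        # Parity only detects; the mirror supplies the correction,
        # assuming at most one copy is corrupted.
        byte, p = self.half_a[addr]
        if parity_bit(byte) == p:
            return byte
        byte, p = self.half_b[addr]
        if parity_bit(byte) != p:
            raise RuntimeError("both copies corrupted: uncorrectable")
        return byte

The cost, of course, is that half the data array is spent on the mirror, hence 16kB usable out of 32kB.
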
> >
> > I have no issue with this. Reliability is limited by the chance of more bitflips than can be corrected;
> > if a single bitflip is already very unlikely, then reliability can be fine. I'm no array designer, but it
> > does seem like at some point, at the very high end of reliability, having ECC would be better
> > than further increasing per-bit reliability. But perhaps for the Xeon reliability goal that is enough.
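
To put rough numbers on the above: the per-bit flip probability here is made up (not a real FIT rate), and I'm assuming SECDED protecting each 64-bit word, so treat this only as an illustration of the scaling:

from math import comb

p_bit = 1e-15               # assumed per-bit flip probability over some interval (made up)
cache_bits = 32 * 1024 * 8  # 32kB L1D
word_bits = 64              # assume one SECDED codeword per 64-bit data word
words = cache_bits // word_bits

# Parity-only write-back: a single flip in dirty data already means data loss.
p_parity_loss = cache_bits * p_bit

# SECDED: data is lost only if two or more flips land in the same protected word.
p_ecc_loss = words * comb(word_bits, 2) * p_bit ** 2

print(f"parity only: ~{p_parity_loss:.1e}")  # ~2.6e-10
print(f"SECDED:      ~{p_ecc_loss:.1e}")     # ~8.3e-24

With these invented numbers correction buys roughly thirteen orders of magnitude, which is the sense in which, at the very high end, ECC should eventually beat hardening the cell further.
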
> >
> > If it's good enough for Xeon, it seems likely all other "normal" CPU designs have gone this way too.
> > Exceptions would be certain high-reliability or rad-hard embedded parts, mainframes, and the like.
> >
> > What is expensive about L1 ECC that is less costly in L2? Keep in mind you need write-through,
> > so L2 has to receive all the stores. Stores could be buffered and merged along the way to the L2,
> > but surely they could also be buffered and merged along the way to L1 in a write-back design.
> > L1 may have a lot more misses / refills than L2, but if ECC calculation is the expensive part,
> > then the ECC bits should be shipped to L1 along with the refill from L2. I wonder what the really
> > costly part is? Or is the answer that the benefit of a write-back L1 is just not very large?
> > (But that would prompt the question of why others do not use a write-through design, if it does
> > not hurt performance much.)
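
By "buffered and merged" I mean something like the following coalescing buffer; the line size, capacity, and flush policy are arbitrary choices for the sketch, and the downstream write could equally be the L2 (write-through) or an ECC-protected L1 data array (write-back):

LINE_BYTES = 64  # assumed cache line size

class MergingStoreBuffer:
    """Coalesces stores to the same line so the next level sees one
    wider write instead of many narrow ones."""
    def __init__(self, max_lines=8):
        self.max_lines = max_lines
        self.lines = {}  # line address -> {offset within line: byte}

    def store(self, addr, data):
        # Assumes the store does not cross a line boundary.
        line = addr & ~(LINE_BYTES - 1)
        entry = self.lines.setdefault(line, {})
        for i, b in enumerate(data):
            entry[addr + i - line] = b  # later stores overwrite earlier ones
        if len(self.lines) > self.max_lines:
            self.drain_oldest()

    def drain_oldest(self):
        # One merged write downstream per line, however many stores hit it.
        line = next(iter(self.lines))   # oldest insertion first
        merged = self.lines.pop(line)
        write_downstream(line, merged)

def write_downstream(line, merged):
    # Stand-in for the L2 write port (or the L1 data array plus ECC update).
    print(f"line {line:#x}: {len(merged)} bytes in one merged write")

# Example: two adjacent 4-byte stores become a single 8-byte downstream write.
buf = MergingStoreBuffer()
buf.store(0x1000, b"\x01\x02\x03\x04")
buf.store(0x1004, b"\x05\x06\x07\x08")
buf.drain_oldest()   # prints: line 0x1000: 8 bytes in one merged write
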
>
> AMD claims to have ECC on the L1D cache. I didn't look into Xeon, assuming
> you did check their documentation before stating what you did.
I don't actually know what the recent couple of generations of Xeons do.
>
> "The AMD Family 17h processor contains a 32-Kbyte, 8-way set associative L1 data cache.
> This is a write-back cache that supports two 128-bit loads and one 128-bit store per cycle.
> In addition, the L1 cache is protected from bit errors through the use of ECC. There is
> a hardware prefetcher that brings data into the L1 data cache to avoid misses. "
>
> https://developer.amd.com/wordpress/media/2013/12/55723_SOG_Fam_17h_Processors_3.00.pdf
>
Interesting find. I wonder what strategy is used; there wasn't much else in there.