By: Michael S (already5chosen.delete@this.yahoo.com), September 17, 2022 5:12 pm
Room: Moderated Discussions
anon2 (anon.delete@this.anon.com) on September 15, 2022 7:04 pm wrote:
> Everybody knows the data integrity problems with parity-protected write-back arrays. ECC has also
> seemed to be a difficult problem for the L1 data cache, one that nobody seems to have solved very well.
>
> The options seem to be:
> - A write-back L1D with parity and accept lack of correction.
> - A write-through L1D with ECC L2.
> - Expensive L1 ECC scheme.
>
> A very long time ago I recall some CPUs had a BIOS selection between write-back and write-through
> L1; possibly integrity was the reason. More recently Intel offered a "DCU 16kB mode" option in
> its Xeons. This changed the data cache unit from 32kB 8-way associative to mirrored 16kB 4-way
> halves, with correction achieved by using parity to find the intact copy. This seems to have gone away
> in favor of an allegedly more robust L1D SRAM cell, and they have no ECC on the write-back L1.
>
I think the Pentium III Xeon had ECC in its L1D cache.
So did all Merom and Penryn generation chips, not just Xeons, but desktop and mobile as well.
The Pentium 4 (all generations) had a write-through L1D, so there was no need for ECC.
Intel's recent "robust L1D cells + no ECC" strategy started with Nehalem.
As to AMD, I think they had an L1D with ECC starting with K8 at the very least. Maybe K7 too.
The exception is the BD (Bulldozer) derivatives, which, like the P4, had a write-through L1D.
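To make the "DCU 16kB mode" trick quoted above concrete, here is a minimal sketch in Python (the names are mine, not Intel's, and this is only the idea, not the actual circuit): two mirrored copies, each protected by nothing more than a parity bit, together give single-bit correction, because the failing parity check identifies the bad copy.

```python
def parity(byte):
    # even-parity bit over the 8 data bits
    return bin(byte & 0xFF).count("1") & 1

def read_mirrored(copy_a, par_a, copy_b, par_b):
    """Read one byte stored as two mirrored copies, each with a parity bit.

    Parity alone only *detects* a single-bit flip, but with two copies the
    failing parity check also tells you which copy is bad, so the intact
    copy can be returned -- correction without a full ECC code.
    """
    if parity(copy_a) == par_a:
        return copy_a
    if parity(copy_b) == par_b:
        return copy_b
    # both copies fail parity: uncorrectable, machine-check territory
    raise RuntimeError("uncorrectable error: both copies corrupted")
```

The cost is the obvious one: half the capacity (32kB of SRAM behaving as 16kB), which is presumably why the mode was off by default.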
> I have no issue with this. Reliability is limited by the chance of more bitflips than are correctable;
> if a single bitflip has a very small chance, then reliability can be fine. I'm no array designer, but it
> does seem that at some point, at the very high end of reliability, having ECC would be better
> than increasing bit reliability. But perhaps for the Xeon reliability goal that is enough.
>
> If it's good enough for Xeon, it seems likely all other "normal" CPU designs have gone this way too.
> Exception would be certain highly reliable or rad hard embedded, and mainframes and the like.
>
> What is expensive about L1 ECC that is less costly in L2? Keep in mind you need write-through, so L2 has to
> receive all the stores. Stores could be buffered and merged on the way to the L2, but surely they could also
> be buffered and merged on the way to L1 in a write-back design. L1 may have a lot more misses / refills than
> L2, but if the ECC calculation is the expensive part, then ECC bits should be shipped to L1. I wonder what the
> really costly part is? Or is the answer that the benefit of a write-back L1 is just not very large? (But that would
> prompt the question of why others do not do a write-through design, if it does not hurt performance much.)
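For what it's worth, the ECC calculation itself is just XOR trees, which doesn't look like the expensive part to me. Here is a toy SECDED (single-error-correct, double-error-detect) Hamming code over one byte, sketched in Python; real caches use wider codes, e.g. 8 check bits per 64 data bits, and all names here are illustrative:

```python
DATA_POS = [3, 5, 6, 7, 9, 10, 11, 12]  # non-power-of-two positions hold data

def secded_encode(byte):
    """Encode one byte as a Hamming(12,8) + overall-parity SECDED codeword.

    Illustrative only: the hardware equivalent is a handful of XOR trees.
    """
    word = 0
    for i, pos in enumerate(DATA_POS):
        word |= ((byte >> i) & 1) << pos
    # check bit at position 2**k covers every position whose index has bit k set
    for k in range(4):
        p = 0
        for pos in DATA_POS:
            if pos & (1 << k):
                p ^= (word >> pos) & 1
        word |= p << (1 << k)
    word |= bin(word).count("1") & 1    # overall parity in bit 0 -> SECDED
    return word

def secded_decode(word):
    """Return (byte, status), status being 'ok', 'corrected' or 'double'."""
    syndrome = 0
    for k in range(4):
        p = 0
        for pos in range(1, 13):
            if pos & (1 << k):
                p ^= (word >> pos) & 1
        syndrome |= p << k
    overall = bin(word).count("1") & 1
    if syndrome and overall:        # single-bit error: syndrome names the bit
        word ^= 1 << syndrome
        status = "corrected"
    elif syndrome:                  # syndrome set but overall parity clean:
        return None, "double"       # two flips, uncorrectable
    else:                           # (syndrome 0 with overall 1 would mean the
        status = "ok"               # parity bit itself flipped; data is fine)
    byte = 0
    for i, pos in enumerate(DATA_POS):
        byte |= ((word >> pos) & 1) << i
    return byte, status
```

The check-bit computation is a few gate delays of XOR. The cost people usually point to instead is the read-modify-write on sub-word stores: with ECC over a wide word, a byte store has to read the old data to recompute the check bits, which is awkward in a latency-critical L1 pipeline.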