By: rwessel (rwessel.delete@this.yahoo.com), January 11, 2021 7:24 am
Room: Moderated Discussions
David Hess (davidwhess.delete@this.gmail.com) on January 11, 2021 4:26 am wrote:
> rwessel (rwessel.delete@this.yahoo.com) on January 7, 2021 6:47 pm wrote:
> >
> > A problem with your approach is the non-linear addressing of physical memory (although it's not that bad,
> > a shift and add in the addressing path). The overhead is a bigger issue. Doing nine physical memory words
> > for eight addressable words doesn't reduce the overhead, although a more complex scheme might. Crossing
> > DRAM array boundaries is going to be a hit as well, a non-trivial number of 64-byte cache line reads are
> > going to now cross some sort of boundary that will end up needing two separate read operations.
> >
> > A bigger problem is that you now can't even do the ECC check before reading all nine words.
> > So any optimized cache line loads (critical word first, etc.), become impossible.
> >
> > Of course those are all performance issues, and may be a suitable tradeoff in some cases.
>
> Some processors already do that to support better than SEC/DED with a 64/72 bit word. I no longer
> remember the exact details but AMD's Phenom processors can use multiple adjacent 64/72 bit words
> from a single memory channel to support 128/144 (and/or 256/288?) bit ECC correction.
As I mentioned, the number of bits needed to do SECDED grows logarithmically with the size of the word protected. If you have 16 ECC bits on a 128-bit word, you have considerably more bits than you need for just SECDED (seven extra, in fact). That's still not enough to do double error correction with triple error detection (although it might be enough for double error correction alone, if you were willing to give up the triple error detection). But at 256+32, you can have triple error correction and quintuple error detection.
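To make the logarithmic growth concrete, here's a quick back-of-the-envelope calculation (my own illustration, not anything from a specific controller): for k data bits, Hamming SEC needs the smallest r with 2^r >= k + r + 1, and SECDED adds one more overall parity bit.

def secded_bits(k: int) -> int:
    # Smallest r with 2**r >= k + r + 1 gives single-error correction (SEC);
    # one more overall parity bit upgrades it to SECDED.
    r = 1
    while 2 ** r < k + r + 1:
        r += 1
    return r + 1

for k in (64, 128, 256):
    print(f"{k:4d} data bits -> {secded_bits(k)} SECDED check bits")
# 64 -> 8, 128 -> 9, 256 -> 10

That's 8 check bits for 64 data bits, 9 for 128, and 10 for 256, which is where the "seven extra" above comes from.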
Alternatively, you can abandon general bit error correction, and take a more RAID-like approach, but limit corrections to the contents of a single (and entire) RAID block (basically what chipkill does).
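As a rough sketch of the erasure-style idea (my own illustration, not the actual chipkill math, which uses symbol-oriented Reed-Solomon-style codes that can also locate the bad device): if you already know which device has failed, a single parity device lets you rebuild that device's entire contribution, much like RAID rebuilds a failed disk.

def make_parity(device_words):
    # XOR together one word from each data device, RAID-5 style.
    parity = 0
    for word in device_words:
        parity ^= word
    return parity

def rebuild(device_words, parity, failed):
    # XOR of the parity word and all surviving devices recovers the lost word.
    recovered = parity
    for i, word in enumerate(device_words):
        if i != failed:
            recovered ^= word
    return recovered

devices = [0x1234, 0xBEEF, 0x0F0F, 0xA5A5]  # per-device slices of one line
p = make_parity(devices)
assert rebuild(devices, p, failed=2) == devices[2]

The limitation, of course, is that this simple form only handles an erasure (a known-bad device), whereas real chipkill-class codes also identify which device's symbols are wrong.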