By: Adrian (a.delete@this.acm.org), January 11, 2021 5:25 am
Room: Moderated Discussions
David Hess (davidwhess.delete@this.gmail.com) on January 11, 2021 4:26 am wrote:
> rwessel (rwessel.delete@this.yahoo.com) on January 7, 2021 6:47 pm wrote:
> >
> > A problem with your approach is the non-linear addressing of physical memory (although it's not that bad,
> > a shift and add in the addressing path). The overhead is a bigger issue. Doing nine physical memory words
> > for eight addressable words doesn't reduce the overhead, although a more complex scheme might. Crossing
> > DRAM boundaries array is going to be a hit as well, a non-trivial number of 64-byte cache line reads are
> > going to now cross some sort of boundary that will end up needing two separate read operations.
> >
> > A bigger problem is that you now can't even do the ECC check before reading all nine words.
> > So any optimized cache line loads (critical word first, etc.), become impossible.
> >
> > Of course those are all performance issues, and may be a suitable tradeoff in some cases.
>
> Some processors already do that to support better than SEC/DED with a 64/72 bit word. I no longer
> remember the exact details but AMD's Phenom processors can use multiple adjacent 64/72 bit words
> from a single memory channel to support 128/144 (and/or 256/288?) bit ECC correction.
>
AMD and Intel support several variants of Chip-Kill ECC, which indeed use a 128/144 code instead of the 64/72 SECDED code. This works when you populate each 128-bit dual memory channel with pairs of ECC DIMMs.
While SECDED corrects any single-bit error, Chip-Kill treats the 128 bits as either 32 4-bit symbols or 16 8-bit symbols, and the code used is able to correct any error (i.e. any combination of bit errors) that is confined to a single symbol. Many kinds of errors that affect multiple symbols are detected.
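
To make the symbol view concrete, here is a rough Python sketch that only counts how many symbols an error pattern touches. It is not a decoder and says nothing about the actual codes AMD or Intel use; the function names, the bit-to-symbol layout (symbol i covers bits i*symbol_bits and up) and the example error masks are all assumptions for illustration, and only the 128 data bits are modelled, not the check bits.

```python
def symbols_touched(error_mask: int, symbol_bits: int, word_bits: int = 128) -> int:
    """Count the symbols of a word that contain at least one flipped bit.

    error_mask is the XOR of the stored and the received word.
    The contiguous bit-to-symbol layout is an assumption of this sketch.
    """
    if word_bits % symbol_bits != 0:
        raise ValueError("word_bits must be a multiple of symbol_bits")
    sym_mask = (1 << symbol_bits) - 1
    count = 0
    for i in range(word_bits // symbol_bits):
        if (error_mask >> (i * symbol_bits)) & sym_mask:
            count += 1
    return count


def is_single_symbol_error(error_mask: int, symbol_bits: int) -> bool:
    """True if the error stays inside one symbol, i.e. the case a
    single-symbol-correcting code is guaranteed to correct."""
    return symbols_touched(error_mask, symbol_bits) == 1


# Hypothetical error patterns:
chip_failure = 0xF << 8             # four flipped bits, all in one 4-bit symbol
adjacent_bits = (1 << 3) | (1 << 4)  # two flipped bits straddling a 4-bit boundary

print(symbols_touched(chip_failure, 4))   # 4-bit symbols
print(symbols_touched(adjacent_bits, 4))  # 4-bit symbols
print(symbols_touched(adjacent_bits, 8))  # 8-bit symbols
```

The first pattern is four bit errors but a single symbol, so it falls in the correctable class, while SECDED would at best detect it. The second pattern shows that whether a two-bit error is a single-symbol error depends on the symbol size and on where the symbol boundaries fall.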