By: Konrad Schwarz (no.spam.delete@this.no.spam), January 7, 2021 5:27 am
Room: Moderated Discussions
Etienne Lorrain (etienne_lorrain.delete@this.yahoo.fr) on January 7, 2021 3:22 am wrote:
> > Isn't ECC overhead 1/8th the cost of the data? Like if you
> > have 8 DRAM chips, the ECC chip would be the ninth one?
> >
>
> For parity you add one bit per ECC line, whatever the line size. But
> then it doesn't correct, and 2 bits flipped are not detected.
> For SECDED (Single Error Correct, Double Error Detect), you have a logical circuit which gives you the bit
> number which flipped, so for 32 bits ECC line you need enough bits to tell which bit has flipped, plus one
> value which tells "out-of-range, more than single error", so you need to count from 0 to 32 - so 6 bits.
> In reality you need to count from 0 to 32+6=38, still 6 bits, to also protect the ECC information.
> > My comment was about how the logical circuit would treat more than two bit flipped, mathematical
> > proof that 2 bits flipped is never treated a single error detected and corrected.
To add more detail on the Hamming code, the basis for traditional SECDED:
a number of "parity" (or check) bits are added to pinpoint the location of
the erroneous, correctable bit.
These parity bits use base-2 encoding to indicate the erroneous bit. For this scheme
to work, the n'th parity bit is defined as the XOR over those bits that have a "1"
in bit n of the binary representation of their position. When checking, if the n'th
parity is wrong, it indicates that one of the bits that have bit n set in their
position number has flipped; since there is a parity bit for each n, the failing
parities, read as a binary number, give the position of the flipped bit.
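
As a rough illustration of the parity rule just described, here is a minimal Python sketch. The names `parity`, `syndrome`, `good` and `bad` are made up for this example, and the 7-bit word is an assumed toy case with check bits sitting at positions 1, 2 and 4; index 0 of the list is simply unused here.

```python
def parity(word, n):
    """XOR of all bits whose 1-based position has bit n set."""
    p = 0
    for pos in range(1, len(word)):
        if pos & (1 << n):
            p ^= word[pos]
    return p

def syndrome(word, num_checks):
    """Read the failing parities as a binary number."""
    return sum(parity(word, n) << n for n in range(num_checks))

# Toy 7-bit codeword, positions 1..7 (index 0 is a placeholder).
good = [0, 0, 1, 1, 0, 0, 1, 1]
bad = list(good)
bad[6] ^= 1                  # flip the bit at position 6

print(syndrome(good, 3))     # 0: every parity checks out
print(syndrome(bad, 3))      # 6: parities 1 and 2 fail, binary 110
```

A syndrome of zero means no parity failed; any other value is the position of the single flipped bit.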
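The actual algorithm interleaves
the check bits with the data bits, so each position 2^n is occupied
by a check bit; bits are numbered starting with one. To ensure double error
detection, an additional bit numbered 0 is added, which is the parity (or XOR)
over all the other bits. A single flipped bit changes this overall parity, whereas
two flipped bits leave it intact but still produce a non-zero position; that
combination is therefore reported as an uncorrectable error instead of being
"corrected" as a single one.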
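
Putting the pieces together, the whole scheme might be sketched in Python as below. This is only an illustration under my own assumptions, not how any particular memory controller does it: `encode` and `decode` are hypothetical names, the word is a plain list of 0/1 values indexed by bit position, and the status strings are invented. It relies on the fact that XOR-ing the positions of all set bits yields the same number as evaluating each parity separately.

```python
def encode(data_bits):
    """Data goes to non-power-of-two positions, check bits to 2^k, overall parity to 0."""
    r = 0
    while (1 << r) < len(data_bits) + r + 1:   # number of Hamming check bits
        r += 1
    n = len(data_bits) + r
    word = [0] * (n + 1)
    it = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):                    # not a power of two: data position
            word[pos] = next(it)
    s = 0
    for pos in range(1, n + 1):                # XOR of positions of set data bits
        if word[pos]:
            s ^= pos
    for k in range(r):                         # choose check bits so the XOR becomes 0
        word[1 << k] = (s >> k) & 1
    word[0] = sum(word[1:]) & 1                # overall parity bit
    return word

def decode(word):
    """Return (status, word), with a single error corrected if possible."""
    s = 0
    for pos in range(1, len(word)):
        if word[pos]:
            s ^= pos
    overall = sum(word) & 1                    # 0 if the overall parity still holds
    if s == 0 and overall == 0:
        return "ok", word
    if overall == 1 and s < len(word):         # odd number of flips: assume one
        fixed = list(word)
        fixed[s] ^= 1                          # s == 0 means bit 0 itself flipped
        return "corrected", fixed
    return "double error", word                # uncorrectable

word = encode([1, 0] * 16)                     # 32 data bits
print(len(word))                               # 39: 32 data + 6 check + bit 0
```

With 32 data bits this gives 6 Hamming check bits plus the overall parity bit, so 39 bits in total. Three or more flipped bits can still fool the decoder; the guarantee only covers one (corrected) or two (detected) flips.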