By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), January 6, 2021 11:38 am
Room: Moderated Discussions
⚛ (0xe2.0x9a.0x9b.delete@this.gmail.com) on January 6, 2021 9:45 am wrote:
> Linus Torvalds (torvalds.delete@this.linux-foundation.org) on January 5, 2021 12:29 pm wrote:
> > ECC is safer under normal circumstances,
>
> What is "normal circumstances"? From a mathematical viewpoint, ECC DDR4 modules can afford to be of lower
> quality
Bullshit.
ECC safety isn't about the "correctable" part.
Why don't people get that? The correction part of ECC is almost irrelevant.
In fact, five lines later, you ask for the OS to do checksumming for DRAM problems, because you seem to realize that the only thing that really matters is reporting whether the memory you use is reliable or not.
That is why you need ECC. Not for correction. For knowing whether your machine is reliable or not. Without ECC, you're basically screwed. You have no idea.
(And yes, I've said it before, and I'll say it again: parity is almost as good as ECC. Exactly because parity does the important part - not as well, no, but certainly a lot better than nothing).
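To make the parity point concrete, here is a minimal sketch of detection without correction. It assumes a single even-parity bit over an 8-bit word; the helper names (`parity_bit`, `store`, `check`) are made up for illustration and do not model any real memory controller.

```python
def parity_bit(word: int) -> int:
    """Even-parity bit: 1 if the word has an odd number of set bits, else 0."""
    return bin(word).count("1") & 1


def store(word: int) -> tuple[int, int]:
    """Hypothetical 'write': keep the word together with its parity bit."""
    return word, parity_bit(word)


def check(word: int, stored_parity: int) -> bool:
    """Hypothetical 'read': True if the word still matches its stored parity."""
    return parity_bit(word) == stored_parity


data, p = store(0b1011_0010)

# One bit flips in memory: parity no longer matches, so the error is reported.
flipped_once = data ^ (1 << 3)
single_flip_detected = not check(flipped_once, p)

# A second bit flips: parity matches again, so this error goes unnoticed.
flipped_twice = flipped_once ^ (1 << 6)
double_flip_detected = not check(flipped_twice, p)

print("single flip detected:", single_flip_detected)
print("double flip detected:", double_flip_detected)
```

Parity cannot say which bit flipped, so it cannot correct anything, but it does report the common single-bit case, and that report is the part that matters here.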
And no, it's not the job of the OS to fix broken hardware. Doing checksums of disk contents is one thing (but honestly, the disks themselves had better have those checksums internally anyway, and they do), but doing "software ECC" is just you desperately trying to make excuses and make up an argument that is complete and utter garbage.
And btw, don't talk to me about uncorrectable errors, or - worse yet - about undetectable three-bit flips, which is inevitably the next stage of denial. Do they happen? Sure. But the normal single-bit flips will happen before they do, and honestly, the whole argument of "but nothing is perfect" isn't an argument at all, it's just pure and utter stupidity.
So stop the idiocy already.
Linus