By: Adrian (a.delete@this.acm.org), January 2, 2021 1:17 pm
Room: Moderated Discussions
Linus Torvalds (torvalds.delete@this.linux-foundation.org) on January 2, 2021 12:21 pm wrote:
> Jukka Larja (roskakori2006.delete@this.gmail.com) on January 1, 2021 10:28 pm wrote:
> >
> > So yeah, I do very much agree AMD has superior offering. ECC doesn't really matter here though.
>
> ECC absolutely matters.
>
> ECC availability matters a lot - exactly because Intel has been instrumental in
> killing the whole ECC industry with its horribly bad market segmentation.
>
> Go out and search for ECC DIMMs - it's really hard to find. Yes - probably entirely thanks
> to AMD - it may have gotten slightly better lately, but that's exactly my point.
>
> Intel has been detrimental to the whole industry and to users because
> of their bad and misguided policies wrt ECC. Seriously.
>
> And if you don't believe me, then just look at multiple generations of rowhammer, where each
> time Intel and memory manufacturers bleated about how it's going to be fixed next time.
>
> Narrator: "No it wasn't".
>
> And yes, that was - again - entirely about the misguided and arse-backwards policy
> of "consumers don't need ECC", which made the market for ECC memory go away.
>
> The arguments against ECC were always complete and utter garbage. Now even the memory manufacturers are
> starting to do ECC internally because they finally owned up to the fact that they absolutely have to.
>
> And the memory manufacturers claim it's because of economics and lower power. And they are
> lying bastards - let me once again point to row-hammer about how those problems have existed
> for several generations already, but these f*ckers happily sold broken hardware to consumers
> and claimed it was an "attack", when it always was "we're cutting corners".
>
> How many times has a row-hammer like bit-flip happened just by pure bad luck on real
> non-attack loads? We will never know. Because Intel was pushing shit to consumers.
>
> And I absolutely guarantee they happened. The "modern DRAM is so reliable that it doesn't need ECC"
> was always a bedtime story for children that had been dropped on their heads a bit too many times.
>
> We have decades of odd random kernel oopses that could never be explained and were likely due to
> bad memory. And if it causes a kernel oops, I can guarantee that there are several orders of magnitude
> more cases where it just caused a bit-flip that just never ended up being so critical.
>
> Yes, I'm pissed off about it. You can find me complaining about this literally for decades
> now. I don't want to say "I was right". I want this fixed, and I want ECC.
>
> And AMD did it. Intel didn't.
>
> > I don't really see AMD's unofficial ECC support being a big deal.
>
> I disagree. The difference between "the market for working memory actually exists" and "screw
> consumers over by selling them subtly unreliable hardware" is an absolutely enormous one.
>
> And the fact that it's "unofficial" for AMD doesn't matter. It works. And it allows
> the markets to - admittedly probably very slowly - start fixing themselves.
>
> But I blame Intel, because they were the big fish in the pond, and they were the
> ones that caused the ECC market to basically implode over a couple of decades.
>
> ECC DRAM (or just parity) used to be standard and easily accessible back when. ECC
> and parity isn't a new thing. It was literally killed by bad Intel policies.
>
> And don't let people tell you that DRAM got so reliable that it
> wasn't needed. That was never ever really true. See above.
>
> Linus
Very well explained!
I want to add that Ryzen ECC support being "unofficial" really just means that AMD does not list ECC in the Ryzen/Threadripper specifications. The motherboard manufacturers who support ECC do list it in their motherboard specifications, so from the buyer's point of view it is official enough: if ECC does not work, the buyer can contact the motherboard vendor for support.
In my experience (I have always used ECC memory in every computer larger than an Intel NUC, and now you can also get NUC-sized computers with ECC and Ryzen V2000, Renoir), I have not yet had any problems with ECC on Ryzen. I did, however, have problems with buggy BIOSes on some motherboards with Intel Xeon E/W 1xxx. So hoping that the "official" ECC support of Intel Xeons guarantees freedom from problems, compared with the "unofficial" Ryzen ECC support, is an illusion.