By: Brett (ggtgp.delete@this.yahoo.com), January 4, 2021 5:08 pm
Room: Moderated Discussions
Jason Snyder (jmcsnyder.delete@this.hotmail.com) on January 4, 2021 1:44 pm wrote:
> I was kind of surprised when Linus Torvalds went for a Threadripper with only ECC UDIMM support
> (with UDIMMs being so hard to find) and the other Linus (Sebastian) didn't even use ECC RAM when
> building the system for Torvalds. It seemed like the hot setup would have been to go Epyc, use
> RDIMMs, and decouple RAM speed from Infinity Fabric speed to get top performance as you are not
> going to find ECC DIMMs that are fast enough to make sense to use the default 1:1 clock ratio.
>
> Also, if you really want quiet without a heat sink so big that it tears loose from the motherboard
> and takes a chunk of the motherboard with it (a problem I have had when shipping via UPS, though
> never with FedEx), there is something to be said for liquid cooling, at least with the right cooling
> solution selected. If you go with a custom loop, though, use clear (dye-free) coolant unless aesthetics matter
> more than function, as the clear stuff with the right metals in the loop (never mix copper and aluminum parts
> for example) won't gum up as fast. Then again I suppose with AMD Epyc you could select a processor with lots
> of thermal headroom and just scale back the fan speed. The other problem with going with Intel is they tend
> to have a very high temperature delta between the die and IHS under load, leaving little headroom for the cooler
> to do its job, so you end up really ramping up your cooling solution so the plate on it is as close to ambient
> as possible. However if you have the right AMD Epyc under the hood, the plate on the cooler can be relatively
> hot (compared to a proper Intel solution) and it is OK because the temperature delta between the die and the
> IHS is relatively small. That means going for a high core count (many cores to spread the heat generation
> across), low clock speed, low TDP (for the core count) CPU. AMD tends to use solder as the thermal interface
> material. Solder is a poor fit for large dies like Intel's, which tend to micro-fracture and fail prematurely
> due to differential thermal expansion between the die and the hard solder, but AMD's multiple smaller dies
> tend to fare better. The cheap thermal paste Intel tends to use as TIM has poor heat transfer capability, so in
> addition to jamming more power into fewer cores, the heat transfer off of those cores is poor.
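The die-to-IHS argument above can be sketched with the usual series thermal-resistance model: at steady state, T_die = T_ambient + P * (R_die-IHS + R_IHS-cooler + R_cooler-air), so every degree lost across the die-to-IHS interface comes out of the cooler's budget. This is a minimal sketch, not a model of any specific chip: the function names and all resistance, power and temperature values are hypothetical, chosen only to illustrate the arithmetic.

```python
# Steady-state series thermal-resistance model (a simplification: it ignores
# hotspots, heat spreading and transient behaviour). All values are hypothetical.

def die_temperature(power_w, ambient_c, r_die_ihs, r_ihs_cooler, r_cooler_air):
    """Die temperature in C: ambient plus power times the summed resistances (C/W)."""
    return ambient_c + power_w * (r_die_ihs + r_ihs_cooler + r_cooler_air)


def max_cooler_resistance(power_w, ambient_c, t_max_c, r_die_ihs, r_ihs_cooler):
    """Largest cooler-to-air resistance (C/W) that keeps the die at or below t_max_c."""
    total_budget = (t_max_c - ambient_c) / power_w
    return total_budget - r_die_ihs - r_ihs_cooler


if __name__ == "__main__":
    power, ambient, t_max = 140.0, 25.0, 95.0   # hypothetical load and limits
    r_ihs_cooler = 0.02                          # hypothetical paste between IHS and cold plate
    for label, r_die_ihs in (("worse die-to-IHS interface", 0.25),
                             ("better die-to-IHS interface", 0.08)):
        r_cooler = max_cooler_resistance(power, ambient, t_max, r_die_ihs, r_ihs_cooler)
        print(f"{label}: cooler must reach <= {r_cooler:.2f} C/W")
```

With these made-up numbers, cutting the die-to-IHS resistance from 0.25 to 0.08 C/W lets the cooler's allowed resistance rise from 0.23 to 0.40 C/W. In other words, the cold plate can run hotter and the fans slower for the same die temperature.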
>
> This leads to another point: I have had Intel CPUs with solder go bad because I had my windows open for
> cross ventilation when the outside air turned out to be very badly polluted and gummed up the cooler
> with black, sticky stuff in almost no time flat, causing the CPU to hit 100C (and landing me in the
> hospital at about the same time because I couldn't breathe). After the 100C event, I took measures to
> keep the cooling solution from failing like that again (and to keep my lungs from failing due to bad
> air quality outside), but the CPU never worked reliably again and I had to replace it. (Unfortunately
> I can't get replacement lungs.)
Take some antioxidants, like liposomal vitamin C (which bypasses the gut's limit on vitamin C absorption), to clear out inflammation. Stop eating sugar and vegetable oils high in omega-6.
https://www.amazon.com/gp/product/B010OVU0YK/ref=ppx_yo_dt_b_asin_title_o02_s00?ie=UTF8&psc=1
> So ECC RAM is not the whole picture. How the computer is cooled is
> also a big deal and a CPU running too hot and developing micro-fractures will also ruin your day.
>
>
> Linus Torvalds (torvalds.delete@this.linux-foundation.org) on January 2, 2021 12:21 pm wrote:
> > Jukka Larja (roskakori2006.delete@this.gmail.com) on January 1, 2021 10:28 pm wrote:
> > >
> > > So yeah, I do very much agree AMD has superior offering. ECC doesn't really matter here though.
> >
> > ECC absolutely matters.
> >
> > ECC availability matters a lot - exactly because Intel has been instrumental in
> > killing the whole ECC industry with its horribly bad market segmentation.
> >
> > Go out and search for ECC DIMMs - it's really hard to find. Yes - probably entirely thanks
> > to AMD - it may have gotten slightly better lately, but that's exactly my point.
> >
> > Intel has been detrimental to the whole industry and to users because
> > of their bad and misguided policies wrt ECC. Seriously.
> >
> > And if you don't believe me, then just look at multiple generations of rowhammer, where each
> > time Intel and memory manufacturers bleated about how it's going to be fixed next time.
> >
> > Narrator: "No it wasn't".
> >
> > And yes, that was - again - entirely about the misguided and arse-backwards policy
> > of "consumers don't need ECC", which made the market for ECC memory go away.
> >
> > The arguments against ECC were always complete and utter garbage. Now even the memory manufacturers are
> > starting to do ECC internally because they finally owned up to the fact that they absolutely have to.
> >
> > And the memory manufacturers claim it's because of economics and lower power. And they are
> > lying bastards - let me once again point to row-hammer about how those problems have existed
> > for several generations already, but these f*ckers happily sold broken hardware to consumers
> > and claimed it was an "attack", when it always was "we're cutting corners".
> >
> > How many times has a row-hammer-like bit-flip happened just by pure bad luck on real
> > non-attack loads? We will never know. Because Intel was pushing shit to consumers.
> >
> > And I absolutely guarantee they happened. The "modern DRAM is so reliable that it doesn't need ECC"
> > was always a bedtime story for children that had been dropped on their heads a bit too many times.
> >
> > We have decades of odd random kernel oopses that could never be explained and were likely due to
> > bad memory. And if it causes a kernel oops, I can guarantee that there are several orders of magnitude
> > more cases where it just caused a bit-flip that just never ended up being so critical.
> >
> > Yes, I'm pissed off about it. You can find me complaining about this literally for decades
> > now. I don't want to say "I was right". I want this fixed, and I want ECC.
> >
> > And AMD did it. Intel didn't.
> >
> > > I don't really see AMD's unofficial ECC support being a big deal.
> >
> > I disagree. The difference between "the market for working memory actually exists" and "screw
> > consumers over by selling them subtly unreliable hardware" is an absolutely enormous one.
> >
> > And the fact that it's "unofficial" for AMD doesn't matter. It works. And it allows
> > the markets to - admittedly probably very slowly - start fixing themselves.
> >
> > But I blame Intel, because they were the big fish in the pond, and they were the
> > ones that caused the ECC market to basically implode over a couple of decades.
> >
> > ECC DRAM (or just parity) used to be standard and easily accessible back when. ECC
> > and parity isn't a new thing. It was literally killed by bad Intel policies.
> >
> > And don't let people tell you that DRAM got so reliable that it
> > wasn't needed. That was never ever really true. See above.
> >
> > Linus
>
>
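Linus's point that ECC (or even plain parity) used to be standard can be made concrete with a toy comparison. A single parity bit detects one flipped bit but cannot say which one, and it misses two flips; a SECDED (single-error-correct, double-error-detect) code corrects one flip and reports two. The sketch below uses an extended Hamming(8,4) code on 4 data bits for readability. The function names are hypothetical; real ECC DIMMs typically apply the same SECDED idea with 8 check bits per 64 data bits, not this exact code.

```python
# Toy comparison of a single parity bit vs an extended Hamming(8,4) SECDED code.

def parity(bits):
    """Even-parity check bit: 1 if the number of set bits is odd."""
    return sum(bits) % 2


def hamming84_encode(d):
    """Encode 4 data bits into an 8-bit extended Hamming codeword.

    Positions 1..7 follow classic Hamming(7,4) with check bits at 1, 2 and 4;
    position 0 holds overall parity so double errors can be detected.
    """
    c = [0] * 8
    c[3], c[5], c[6], c[7] = d
    c[1] = c[3] ^ c[5] ^ c[7]
    c[2] = c[3] ^ c[6] ^ c[7]
    c[4] = c[5] ^ c[6] ^ c[7]
    c[0] = parity(c[1:])
    return c


def hamming84_decode(c):
    """Return (status, data); status is 'ok', 'corrected' or 'uncorrectable'."""
    c = list(c)  # work on a copy
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid codeword
    overall = parity(c)  # 0 when the total number of ones is even
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: assume exactly one. Syndrome 0 means c[0] flipped.
        c[syndrome] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome with even overall parity: two bits flipped.
        return "uncorrectable", None
    return status, [c[3], c[5], c[6], c[7]]


if __name__ == "__main__":
    data = [1, 0, 1, 1]

    # Plain parity: one flip is detected but not located; a second flip hides it.
    stored = data + [parity(data)]
    stored[0] ^= 1
    print("parity after 1 flip:", parity(stored))   # 1 -> error detected
    stored[1] ^= 1
    print("parity after 2 flips:", parity(stored))  # 0 -> looks clean

    # SECDED: one flip is corrected, two flips are reported as uncorrectable.
    word = hamming84_encode(data)
    word[6] ^= 1
    print(hamming84_decode(word))  # ('corrected', [1, 0, 1, 1])
    word[2] ^= 1
    print(hamming84_decode(word))  # ('uncorrectable', None)
```

The trade-off mirrors the one in the thread: parity costs one bit and only tells you something went wrong, while SECDED spends a few more bits to actually repair the common single-bit case.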