By: Adrian (a.delete@this.acm.org), January 3, 2021 1:34 pm
Room: Moderated Discussions
Gabriele Svelto (gabriele.svelto.delete@this.gmail.com) on January 3, 2021 1:08 pm wrote:
> Adrian (a.delete@this.acm.org) on January 3, 2021 12:50 pm wrote:
> > After seeing during the years countless BIOS bugs on various motherboards,
> > I have no doubt that some MBs for AMD may have buggy ECC support.
>
> I've seen BIOS bugs in server boards too. The manufacturers I checked (ASUS, ASRock and Gigabyte) seem pretty
> transparent about their support for ECC in AM4 motherboards. The specs clearly state when they support it and
> when they don't, and which processors are required. If they really didn't care they wouldn't even mention it
> in specs, let alone mention that ECC sticks will work but w/o ECC functionality on some of their boards.
>
> > On Linux, there was the kernel option CONFIG_EDAC_AMD64_ERROR_INJECTION, which would
> > allow access to the testing features of the memory controller on older AMD CPUs.
> >
> > I have not attempted to use this with any Ryzen, so I do
> > not know if this still works. In the case when it was
> > kept up-to-date, testing on Linux would be even easier, without having to fiddle with the BIOS settings.
>
> I haven't tried it either but now I'm curious too. Either way on consumer motherboards you can also just
> overclock the memory. It will start throwing errors eventually, and they're going to be real ones.
The code in /usr/src/linux/drivers/edac/amd64_edac_inj.c does not mention compatibility with any specific CPU family, so hopefully these functions still work on current AMD CPUs.
That option is off by default, so the kernel must be recompiled to enable it, and I still have to search the Linux documentation to find out what must be written into /sys/devices/system/edac/mc/ to inject errors.
When I have some spare time, I will test whether this works.
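
As a starting point, here is a rough Python sketch of the two checks that would come before any injection attempt: whether the running kernel was built with CONFIG_EDAC_AMD64_ERROR_INJECTION, and which files the EDAC driver exposes under /sys/devices/system/edac/mc/. The helper names (option_enabled, read_running_config, list_mc_attributes) are my own, and the config locations (/proc/config.gz, /boot/config-<release>) are assumptions that depend on the distribution. It only lists files and does not write anything, because I have not yet verified which attributes must be written to inject errors.

```python
import gzip
import os
from pathlib import Path

OPTION = "CONFIG_EDAC_AMD64_ERROR_INJECTION"


def option_enabled(config_text, option=OPTION):
    """Return True if the kernel config text sets `option` to y or m."""
    for line in config_text.splitlines():
        if line.strip() in (f"{option}=y", f"{option}=m"):
            return True
    return False


def read_running_config():
    """Return the running kernel's config as text, or None if not found.

    Assumed locations: /proc/config.gz (present only when the kernel was
    built with CONFIG_IKCONFIG_PROC) and /boot/config-<release>.
    """
    proc_config = Path("/proc/config.gz")
    if proc_config.exists():
        return gzip.decompress(proc_config.read_bytes()).decode()
    boot_config = Path("/boot") / f"config-{os.uname().release}"
    if boot_config.exists():
        return boot_config.read_text()
    return None


def list_mc_attributes(root="/sys/devices/system/edac/mc"):
    """Map each memory controller directory (mc0, mc1, ...) to its file names."""
    base = Path(root)
    result = {}
    if not base.is_dir():
        return result
    for mc in sorted(base.glob("mc[0-9]*")):
        if mc.is_dir():
            result[mc.name] = sorted(p.name for p in mc.iterdir() if p.is_file())
    return result


if __name__ == "__main__":
    config = read_running_config()
    if config is None:
        print("kernel config not found in the assumed locations")
    else:
        print(f"{OPTION} enabled: {option_enabled(config)}")
    for name, files in list_mc_attributes().items():
        print(name, files)
```

Comparing the file list from a stock kernel with the list from a kernel rebuilt with the option enabled should show which attributes the injection code adds, if it still registers any on current CPUs.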