By: Dan Strother (dan.strother.delete@this.gmail.com), January 3, 2021 7:00 pm
Room: Moderated Discussions
Adrian (a.delete@this.acm.org) on January 3, 2021 1:34 pm wrote:
> > > On Linux, there was the kernel option CONFIG_EDAC_AMD64_ERROR_INJECTION, which would
> > > allow access to the testing features of the memory controller on older AMD CPUs.
> > >
> The code in /usr/src/linux/drivers/edac/amd64_edac_inj.c does not have any mention about compatibility
> with a specific CPU family, so hopefully these functions continue to work on the current AMD CPUs.
>
> That option is off by default, so the kernel must be recompiled to activate it and I must search the Linux
> documentation to discover what must be written into /sys/devices/system/edac/mc/ to inject errors.
>
> When I have some spare time, I will test if this works.
Note that even if the CPU supports ECC error injection, the BIOS may disable it. See this post for some context:
https://hardwarecanucks.com/forum/threads/ecc-memory-amds-ryzen-a-deep-dive-comment-thread.75041/page-6#post-902700
In that post, the poster is trying to use MemTest86 Pro (the paid PassMark version, not the free one) to inject errors on their Ryzen 3000 system with an ASRock Rack motherboard. They had to change the "Disable Memory Error Injection" option in the motherboard's BIOS to enable injection. Unfortunately, they weren't able to confirm that it actually worked: errors appeared to be injected, but then went unreported. Even worse, ASRock Rack support then claimed that ECC reporting wasn't supported at all!
I found this some months ago during my Xeon vs Ryzen ECC research; there may be better posts now. It's anecdotes like this one that steered me away from Ryzen at the time. I wish I had come across your positive reports then, Adrian - I might have wound up with a Ryzen system instead.
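If you try injection from Linux instead of MemTest86, one way to tell "not injected" apart from "injected but not reported" is to compare the EDAC per-controller counters (`ce_count` and `ue_count` under /sys/devices/system/edac/mc/mcN/) before and after an injection attempt. Below is a minimal sketch. It assumes an EDAC driver such as amd64_edac is loaded and exposes those files. I haven't tested it on Ryzen, and the function names are my own.

```python
from pathlib import Path

# Standard EDAC sysfs location; each memory controller shows up as mc0, mc1, ...
EDAC_MC_ROOT = Path("/sys/devices/system/edac/mc")


def read_error_counts(root=EDAC_MC_ROOT):
    """Return {controller: (corrected, uncorrected)} read from ce_count/ue_count.

    Returns an empty dict if the EDAC directory doesn't exist (e.g. no driver loaded).
    """
    root = Path(root)
    counts = {}
    if not root.is_dir():
        return counts
    for mc in sorted(root.glob("mc[0-9]*")):
        ce = int((mc / "ce_count").read_text().strip())
        ue = int((mc / "ue_count").read_text().strip())
        counts[mc.name] = (ce, ue)
    return counts


def diff_counts(before, after):
    """Per-controller increase in (corrected, uncorrected) counts between two snapshots.

    Controllers missing from `before` are treated as starting from zero.
    """
    result = {}
    for name, (ce, ue) in after.items():
        old_ce, old_ue = before.get(name, (0, 0))
        result[name] = (ce - old_ce, ue - old_ue)
    return result


if __name__ == "__main__":
    # Print the current snapshot; run once before and once after injecting.
    print(read_error_counts())
```

Take one snapshot, trigger the injection, take another, and pass both to `diff_counts`. If every delta is zero, the error either never reached memory or the platform isn't reporting it, which is exactly the ambiguity the ASRock Rack poster ran into.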
I also vaguely recall finding some comments suggesting that error injection was fused off on Ryzen 3000 CPUs (after being supported on Ryzen 2000 CPUs), but I can't find them in my notes. The MemTest86 version history does have some interesting comments about injection support on AMD:
https://www.memtest86.com/whats-new.html
For example: "Added ECC detection/injection support for AMD Ryzen chipsets. Note that injection support is typically disabled by AMD, except for some CPUs which are engineering samples."
And: "Added warning message when failing to inject ECC errors for Ryzen chipsets (due to being disabled in production)"
Does AMD document any of this in their public datasheets? (I haven't looked yet.)