By: Adrian (a.delete@this.acm.org), January 3, 2021 12:50 pm
Room: Moderated Discussions
Ian Cutress (ian.delete@this.anandtech.com) on January 3, 2021 11:09 am wrote:
>
> I want to add something in here, just from what we've heard from our users.
>
> To the extent that even though you can have a consumer CPU and ECC memory installed, and the motherboard
> reports that ECC is enabled, actually ECC might not be enabled. Even software that states that ECC is
> enabled is simply reading the motherboard register - the only way to confirm is to actually do a test
> that forces an ECC correction and to monitor them. This means that a chunk of people who actually think
> they have ECC working on a system do not. Finding the right combination of motherboard, motherboard BIOS/firmware,
> and memory to work is somewhat confusing because people are reporting that 'ECC is enabled', when it's
> simply only being reported as such by the motherboard and not actually tested.
>
> This comes mostly down to the fact that it's 'unofficial' for AMD. It's not part of the POR, it's not qualified
> at every stage of the CPU/motherboard design. Vendors don't have to check if it's actually working for consumer-grade
> CPUs, so whatever gets reported doesn't matter, because it's not part of the validation checks.
>
> This is why official support matters. At the moment AMD systems unofficially supporting
> ECC is a quagmire of 'system reporting as ECC enabled' vs ECC actually being enabled, tested
> for, and working. It's a step in the right direction sure, but end-users wanting this feature
> might not be protected at all, and spending extra for explicit support.
>
Having seen countless BIOS bugs on various motherboards over the years, I have no doubt that some AMD motherboards have buggy ECC support.
Nevertheless, there are also motherboards like the ASUS Pro WS X570-ACE (actually cheap compared to many X570 MBs), which was reviewed on your site in 2019, where the manufacturer stresses ECC support as a selling point for the board.
On such a motherboard I would not expect any ECC bug, and indeed my sample works fine. I have also used a few ASRock boards, and I have not encountered any problems with them either.
There are also motherboards, like many (or maybe all) from Gigabyte and MSI, where the small print says that ECC memory modules are supported, but that they are operated in non-ECC mode.
I wonder whether the ECC problems reported to you really came from buggy BIOSes, or from BIOSes that worked as intended but were confusingly documented.
Because most AMD motherboards have BIOS settings for memory overclocking, it is easy to verify that ECC really works: overclock the memory until errors start to appear and check that corrected errors are actually reported. Admittedly, this is easy only for an experienced computer user.
On Linux, there was the kernel option CONFIG_EDAC_AMD64_ERROR_INJECTION, which allowed access to the testing features of the memory controller on older AMD CPUs.
I have not attempted to use this with any Ryzen, so I do not know whether it still works. If it has been kept up to date, testing on Linux would be even easier, without having to fiddle with the BIOS settings.
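For the monitoring half of such a test on Linux, here is a minimal Python sketch that reads the corrected and uncorrected error counters the EDAC subsystem exposes in sysfs. It assumes an EDAC driver for the memory controller is loaded and that the counters are under /sys/devices/system/edac/mc; the function names are my own, and I have not verified it on every Ryzen and kernel combination.

```python
from pathlib import Path

# Assumed location of the EDAC memory-controller entries in sysfs.
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root=EDAC_ROOT):
    """Return {controller_name: (corrected_errors, uncorrected_errors)}."""
    counts = {}
    for mc in sorted(Path(root).glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            # Skip controllers whose counters are missing or unreadable.
            continue
        counts[mc.name] = (ce, ue)
    return counts


def corrected_since(before, after):
    """Corrected-error delta per controller between two snapshots."""
    return {
        name: after[name][0] - before[name][0]
        for name in after
        if name in before
    }


if __name__ == "__main__":
    counts = read_edac_counts()
    if not counts:
        print("No EDAC memory controllers found; "
              "the EDAC driver may not be loaded or ECC may be inactive.")
    for name, (ce, ue) in counts.items():
        print(f"{name}: corrected={ce} uncorrected={ue}")
```

Take one snapshot before stressing the overclocked memory and one after. If memory errors were provoked but the corrected count never moves, ECC reporting is probably not working, whatever the BIOS says.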