By: Adrian (a.delete@this.acm.org), December 17, 2020 9:43 am
Room: Moderated Discussions
Konrad Schwarz (no.spam.delete@this.no.spam) on December 17, 2020 7:37 am wrote:
> In some of the SoCs I have dealt with in recent years (e.g. Xilinx Zynq),
> activation of the ECC feature in a system with e.g. a 32-bit data channel to
> memory causes the effective (software relevant) width to be reduced to 16 bits;
> 6(?) bits of the remaining data are used to store the ECC bits and the remaining
> 10 bits are wasted.
>
> Burst lengths are doubled so no changes are needed to cache line lengths.
> The end result is a slightly slower system with halved memory capacity;
> the only software impact is new code to deal with ECC errors. The memory
> chips themselves can have a x4 or x8 organization, reducing the board real estate
> requirements.
>
> As the same memory design can be used for non-ECC and ECC cases, a decision
> to eliminate ECC can be taken at a late stage of development.
This can make sense on an FPGA board, because such boards normally pair a very expensive Xilinx FPGA, priced from a few hundred to a few thousand dollars, with a few GB of DRAM costing well under $100. Wasting 1/3 of the memory therefore wastes only a small fraction of the resources you are paying for, while maintaining the performance.
An Elkhart Lake or Tiger Lake board will frequently be used with a quantity of memory costing as much as or more than the CPU, so losing almost half of the memory would be seen as unacceptable.
So Intel preferred to sacrifice some performance (though the loss should be greatly diminished by an ECC cache) in exchange for a much smaller reduction in memory capacity.
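As a rough back-of-the-envelope sketch of the tradeoff: the Zynq-style numbers below are the ones quoted above (32-bit channel, 16 data bits, 6 ECC bits, 10 unused); the ~1/32 memory reservation for the Intel in-band ECC scheme is my assumption for illustration, not a figure stated in this thread.

# Capacity cost of the two ECC approaches, using the numbers above.
channel_bits = 32
data_bits    = 16   # effective (software-visible) width with ECC enabled
ecc_bits     = 6
unused_bits  = channel_bits - data_bits - ecc_bits    # 10 bits carry nothing

capacity_lost_zynq = 1 - data_bits / channel_bits     # 0.5  -> half the capacity gone
raw_bits_wasted    = unused_bits / channel_bits       # ~0.31 -> the "1/3" above

inband_ecc_fraction = 1 / 32                          # assumed reserved fraction for in-band ECC
capacity_lost_intel = inband_ecc_fraction             # a few percent, paid for in extra memory traffic

print(capacity_lost_zynq, raw_bits_wasted, capacity_lost_intel)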