By: anon2, March 3, 2021 5:30 pm
blaine on March 3, 2021 2:53 pm wrote:
> In designing the HP(E) Superdome, where possible, we tried to do end to end protection. There are some areas
> where that is not possible (like when you use Intel processors, unless you use voting).

What do you mean by this?

I think by end-to-end, Linus just means that, for highly critical data, the error-correction metadata should be generated where the data is generated, stored where the data is stored, and checked where the data is consumed. He doesn't mean that every component along the way, or any particular one, must have a given failure rate or a particular error-handling strategy. That said, your end-to-end strategy obviously has to take the reliability of the components into account in order to hit an overall error target.
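To make that concrete, here is a minimal sketch of the pattern. The names `Record`, `produce`, `store`, and `consume` are hypothetical, and for brevity it uses a CRC32 from Python's `zlib`, which only detects corruption; a real scheme might use a correcting code or a stronger hash. The point is where the metadata lives: the producer computes it, it travels with the data, the intermediate layers just carry it, and only the consumer checks it.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    payload: bytes
    crc: int  # integrity metadata generated by the producer, stored with the data


def produce(data: bytes) -> Record:
    # Generate the metadata where the data is generated.
    return Record(payload=data, crc=zlib.crc32(data))


def store(record: Record) -> Record:
    # Hypothetical intermediate layer (page cache, block layer, disk):
    # it carries the record and its checksum and adds no checks of its own.
    return record


def consume(record: Record) -> bytes:
    # Check the metadata where the data is consumed.
    if zlib.crc32(record.payload) != record.crc:
        raise ValueError("end-to-end checksum mismatch")
    return record.payload


if __name__ == "__main__":
    rec = store(produce(b"critical data"))
    print(consume(rec))  # b'critical data'

    # Simulate a single bit flip somewhere along the path.
    flipped = bytearray(rec.payload)
    flipped[0] ^= 0x01
    try:
        consume(Record(payload=bytes(flipped), crc=rec.crc))
    except ValueError as e:
        print("detected:", e)
```

The corruption is caught no matter which layer introduced it, which is why the per-layer reliability only matters insofar as it sets how often the end-to-end check fires.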

As it pertains to the Linux pagecache: it makes a lot of sense to add targeted error-mitigation measures in the parts of the hardware whose characteristics you know, in order to achieve the desired error rates, and it makes a lot of sense to design an application to run on a hardware stack with a particular error-rate profile. It makes less sense for intermediate layers to "just add some more ECC for good measure, just in case".
