By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), March 3, 2021 2:54 pm
Room: Moderated Discussions
anon (anon.delete@this.gmail.com) on March 3, 2021 11:54 am wrote:
> Generate checksum on write & verify before copying to user space buffer on every read?
That's not doable anyway, since shared mappings are a thing. But it would be a pointless operation even outside of that issue.
Honestly, the solution is:
(a) admitting that there is no such thing as "perfect"
(b) making hardware fundamentally more reliable (ie ECC)
(c) end-to-end strong checksumming of data, and replication of the stuff that really matters
I won't go into (a). There are too many people who look for "perfect" solutions, dismissing things that help, and I think those people are naive, bordering on insanity.
I've gone into (b) extensively here before. No, it's not going to fix everything, but it's going to help a lot of cases.
And (c) is very much not about things like "read()" and "write()". Those are not end-to-end operations, and they fundamentally cannot know whether the data they are copying is reliable or not, because they don't know what the rules for the data are.
Doing checksums at those points is entirely pointless: what you checksum may not be the actual data, because the corruption could have happened before (ie when doing a "write()" system call, maybe the data was already corrupt in user space - you're just generating a completely meaningless checksum).
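To make that concrete, here is a minimal sketch in Python. It is only an illustration, not anything the kernel does: `flip_bit` and the variable names are made up for the example, and CRC32 stands in for whatever checksum a "write()"-time scheme might use. The bit flip happens in user space before the checksum is computed, so verification on read passes on the corrupt data.

```python
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted (simulated memory error)."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


# What the application meant to write.
original = b"balance=1000"

# A bit flips in the user-space buffer before the write() call happens.
corrupted = flip_bit(original, 3)

# Hypothetical checksum taken at write() time: it covers the corrupt bytes.
checksum_at_write = zlib.crc32(corrupted)

# Later, a read() returns the same bytes and "verifies" them.
read_back = corrupted
verified = zlib.crc32(read_back) == checksum_at_write

print("checksum verified:", verified)
print("data matches what the application intended:", read_back == original)
```

The check succeeds and the data is still wrong, because the checksum was generated after the damage was done.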
Note that checksums of data written to disk are a different thing: you're basically adding a protocol checksum between memory and the disk contents, and trying to protect against everything that can go wrong in between the two. That's entirely different from checksumming when copying from RAM to RAM (ie writing to the page cache).
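A rough sketch of that kind of protocol checksum, in Python. The 4-byte CRC32 header and the `write_block` / `read_block` helpers are assumptions for illustration, not the layout of any real filesystem or drive: the checksum travels with the payload to storage and is checked when the block comes back, so damage anywhere between the two points gets noticed.

```python
import os
import struct
import tempfile
import zlib


def write_block(path: str, payload: bytes) -> None:
    """Store a CRC32 of the payload in a 4-byte header, then the payload."""
    header = struct.pack("<I", zlib.crc32(payload))
    with open(path, "wb") as f:
        f.write(header + payload)


def read_block(path: str) -> bytes:
    """Read the block back and refuse to return it if the CRC32 disagrees."""
    with open(path, "rb") as f:
        raw = f.read()
    (stored_crc,) = struct.unpack("<I", raw[:4])
    payload = raw[4:]
    if zlib.crc32(payload) != stored_crc:
        raise ValueError("checksum mismatch: block was damaged in storage")
    return payload


# Demo: a clean round trip, then simulated damage to the stored bytes.
block_path = os.path.join(tempfile.mkdtemp(), "block.bin")
write_block(block_path, b"some long-lived data")
print(read_block(block_path))

with open(block_path, "r+b") as f:
    f.seek(10)
    byte = f.read(1)
    f.seek(10)
    f.write(bytes([byte[0] ^ 0x01]))  # flip one bit "on disk"

try:
    read_block(block_path)
except ValueError as err:
    print("detected:", err)
```

This only protects the path from memory to disk and back. It says nothing about whether the bytes handed to `write_block` were right in the first place.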
So (c) needs higher-level checksums by the programs that actually deal with long-lived data that people care about, by the applications that actually have semantic understanding of that data. Obviously you'd like to have some error recovery too (which might be in the form of ECC, but honestly, at the application level you most likely want redundancy at a much higher level entirely, like replication).
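As a sketch of what that could look like at the application level, assuming SHA-256 as the strong checksum and plain replication as the redundancy (the `seal`, `unseal` and `read_replicated` names are hypothetical, and the replicas are just in-memory byte strings here): the application seals each record when it produces it, and on read it takes the first copy that still verifies.

```python
import hashlib
from typing import List, Optional, Tuple

DIGEST_LEN = hashlib.sha256().digest_size  # 32 bytes


def seal(record: bytes) -> bytes:
    """Prefix the record with its SHA-256 digest, computed by the application."""
    return hashlib.sha256(record).digest() + record


def unseal(blob: bytes) -> Optional[bytes]:
    """Return the record if its digest still matches, otherwise None."""
    digest, record = blob[:DIGEST_LEN], blob[DIGEST_LEN:]
    if hashlib.sha256(record).digest() == digest:
        return record
    return None


def read_replicated(replicas: List[bytes]) -> Tuple[bytes, int]:
    """Return the first replica that verifies, plus its index."""
    for index, blob in enumerate(replicas):
        record = unseal(blob)
        if record is not None:
            return record, index
    raise ValueError("every replica failed verification")


# Demo: three copies, the first one gets damaged somewhere along the way.
sealed = seal(b"important long-lived record")
replicas = [sealed, sealed, sealed]
damaged = bytearray(replicas[0])
damaged[-1] ^= 0x01
replicas[0] = bytes(damaged)

record, used = read_replicated(replicas)
print(record, "recovered from replica", used)
```

The point of doing it here is that the digest is created by the code that knows what the data is supposed to be, and it is checked by the code that consumes it, so everything in between (page cache, disk, network) is covered. The important part is that the bad copy is noticed at all; the replica is just what lets you recover once you've noticed.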
But that (a) is important. Accept it. You will never have a perfect system. Not in security, and not in the "corruption cannot happen" sense. All you can do is a lot of mitigation (and the primary mitigation should always be noticing corruption).
Linus