By: Ganon (anon.delete@this.gmail.com), March 3, 2021 10:05 am
Room: Moderated Discussions
A recent pair of papers from Facebook emphasized the importance of checksum protection even
within a single process:
Facebook’s Tectonic Filesystem: Efficiency from Exascale
https://www.usenix.org/system/files/fast21-pan.pdf
"
At Tectonic’s scale, with thousands of machines reading and writing a large amount of data every day, in-memory data corruption is a regular occurrence, a phenomenon observed in other large-scale systems [12,27]. We address this by enforcing checksum checks within and between process boundaries.
"
and
Evolution of Development Priorities in Key-value Stores Serving Large-scale Applications: The RocksDB Experience
https://www.usenix.org/system/files/fast21-dong.pdf
"
11. CPU/memory corruption does happen, though very rarely, and sometimes cannot be handled by data replication. (§5)
12. Integrity protection must cover the entire system in order to prevent corrupted data (e.g., caused by bitflips in CPU/memory) from being exposed to clients or other replicas; detecting corruption only when the data is at rest or being sent over the wire is insufficient. (§5)
"
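To make "checksum checks within process boundaries" concrete, here is a minimal Python sketch of the general idea: compute a checksum when a buffer is produced and re-check it before the buffer is consumed, even though it never left the process. This is only an illustration, not how Tectonic or RocksDB implement it; the names `ChecksummedBlock`, `seal`, `verify`, and `flip_bit` are hypothetical, and CRC32 is just an assumed choice of checksum.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ChecksummedBlock:
    """A payload that travels through the process together with its CRC32."""
    payload: bytes
    crc: int


def seal(payload: bytes) -> ChecksummedBlock:
    # Compute the checksum as early as possible, right where the data is produced.
    return ChecksummedBlock(payload, zlib.crc32(payload))


def verify(block: ChecksummedBlock) -> bytes:
    # Re-check before the data is used or handed to another component.
    if zlib.crc32(block.payload) != block.crc:
        raise ValueError("checksum mismatch: possible in-memory corruption")
    return block.payload


def flip_bit(data: bytes, bit_index: int) -> bytes:
    # Test helper (hypothetical): simulate a single bit flip in memory.
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


if __name__ == "__main__":
    block = seal(b"some record")
    assert verify(block) == b"some record"

    corrupted = ChecksummedBlock(flip_bit(block.payload, 5), block.crc)
    try:
        verify(corrupted)
    except ValueError as exc:
        print("detected:", exc)
```

The point of carrying the checksum with the data is that a flip occurring between `seal` and `verify` is caught, whereas a checksum computed only at write-to-disk time would faithfully protect the already-corrupted bytes.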
-----
Checksums and similar mechanisms protect the data, but what about the code (instructions)? The total memory footprint of instructions is smaller than that of data, so bit flips are less likely to hit them in practice. Does hw take special measures to protect instructions from corruption (beyond what it does for data)? What sw measures make sense to protect instructions (assuming we need to care about this as well)?
Topic | Posted By | Date |
---|---|---|
CPU & Memory bit flips | Ganon | 2021/03/03 10:05 AM |
Also "Silent Data Corruption" | Adrian | 2021/03/03 11:42 AM |
Thanks for the reference | Ganon | 2021/03/03 12:47 PM |
Implications for linux page cache | anon | 2021/03/03 12:54 PM |
Implications for linux page cache | Linus Torvalds | 2021/03/03 02:54 PM |
memory errors | blaine | 2021/03/03 03:53 PM |
memory errors | anon2 | 2021/03/03 06:30 PM |
memory errors | dmcq | 2021/03/04 06:16 AM |
memory errors | Etienne Lorrain | 2021/03/04 07:26 AM |
memory errors | dmcq | 2021/03/04 07:40 AM |
memory errors | Etienne Lorrain | 2021/03/04 07:58 AM |
memory errors | dmcq | 2021/03/04 08:12 AM |
memory errors | Carson | 2021/03/05 03:31 AM |
memory errors | Etienne Lorrain | 2021/03/05 07:23 AM |
memory errors | rwessel | 2021/03/05 08:48 AM |
memory errors | dmcq | 2021/03/05 01:01 PM |
memory errors | rwessel | 2021/03/05 01:23 PM |
memory errors | dmcq | 2021/03/05 01:51 PM |
memory errors | Brendan | 2021/03/06 12:38 AM |
memory errors | Carson | 2021/03/06 02:35 AM |
memory errors | Carson | 2021/03/06 07:24 AM |
memory errors | David Hess | 2021/03/04 02:44 PM |
memory errors | rwessel | 2021/03/04 06:14 PM |
memory errors | Linus Torvalds | 2021/03/04 09:21 PM |
memory errors | anon2 | 2021/03/04 10:46 PM |
memory errors | Carson | 2021/03/05 03:43 AM |
memory errors | anon2 | 2021/03/05 08:55 AM |
memory errors | gallier2 | 2021/03/05 03:22 AM |
memory errors | dmcq | 2021/03/05 01:59 PM |
memory errors | David Hess | 2021/03/06 05:27 AM |
memory errors | Carson | 2021/03/06 07:44 AM |
memory errors | Gabriele Svelto | 2021/03/06 11:11 AM |
memory errors | David Hess | 2021/03/06 11:28 AM |
memory errors | Michael S | 2021/03/06 03:45 PM |
memory errors | Doug S | 2021/03/04 11:48 AM |
memory errors | Michael S | 2021/03/04 12:36 PM |
memory errors | Jörn Engel | 2021/03/04 04:32 PM |
memory errors | Linus Torvalds | 2021/03/04 09:47 PM |
memory errors | Etienne Lorrain | 2021/03/05 02:09 AM |
memory errors | Michael S | 2021/03/05 05:06 AM |
memory errors | Linus Torvalds | 2021/03/05 12:59 PM |
memory errors | rwessel | 2021/03/05 01:32 PM |
memory errors | rwessel | 2021/03/05 01:37 PM |
memory errors | zArchJon | 2021/03/06 09:39 PM |
memory errors | Gabriele Svelto | 2021/03/06 01:58 PM |
memory errors | Jörn Engel | 2021/03/05 11:12 AM |
Amiga recoverable RAM disk? | Carson | 2021/03/05 04:03 AM |
Thanks - TIL a cool Amiga feature (nt) (NT) | John | 2021/03/05 01:51 PM |
Another cool Amiga feature, datatypes | Charles | 2021/03/06 01:01 AM |
Another cool Amiga feature, datatypes | Jukka Larja | 2021/03/06 02:23 AM |
Another cool Amiga feature, datatypes | Anon | 2021/03/06 01:40 PM |
Another cool Amiga feature, filesystems | Marcus | 2021/03/07 01:28 AM |
CPU & Memory bit flips | zArchJon | 2021/03/04 07:39 AM |
CPU & Memory bit flips | dmcq | 2021/03/04 07:59 AM |
CPU & Memory bit flips | rwessel | 2021/03/04 01:27 PM |
speak of the devil | Robert Williams | 2021/03/05 08:53 AM |
speak of the devil | dmcq | 2021/03/05 12:26 PM |
speak of the devil | Robert Williams | 2021/03/05 04:15 PM |