By: Adrian (a.delete@this.acm.org), March 3, 2021 11:42 am
Room: Moderated Discussions
Ganon (anon.delete@this.gmail.com) on March 3, 2021 9:05 am wrote:
> A recent pair of papers from Facebook emphasized the importance of checksum protection even
> within a single process:
>
> Facebook’s Tectonic Filesystem: Efficiency from Exascale
> https://www.usenix.org/system/files/fast21-pan.pdf
>
> "
> At Tectonic’s scale, with thousands of machines reading and writing a large amount of data every day,
> in-memory data corruption is a regular occurrence, a phenomenon observed in other large-scale systems
> [12,27]. We address this by enforcing checksum checks within and between process boundaries.
> "
>
> and
>
> Evolution of Development Priorities in Key-value Stores
> Serving Large-scale Applications: The RocksDB Experience
> https://www.usenix.org/system/files/fast21-dong.pdf
>
> "
> 11. CPU/memory corruption does happen, though very rarely,
> and sometimes cannot be handled by data replication. (§5)
>
> 12. Integrity protection must cover the entire system in order to prevent corrupted data (e.g.,
> caused by bitflips in CPU/memory) from being exposed to clients or other replicas; detecting
> corruption only when the data is at rest or being sent over the wire is insufficient. (§5)
> "
>
>
> -----
> Checksums & similar protect the data, but what about the code (instructions)? The total footprint
> of instructions is smaller, so bit flips are less likely in practice there. Does hw take special
> measures to protect instructions from corruption (more than it does for data)? What sw measures
> make sense to protect instructions (assuming we need to care about this as well)?
Another interesting paper just published by Facebook:
https://arxiv.org/abs/2102.11245
"Silent Data Corruptions at Scale".
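As a rough illustration of the "checksum checks within and between process boundaries" idea quoted above, here is a minimal Python sketch. `CheckedBuffer`, `CorruptionError` and `stage` are hypothetical names, and using CRC32 via `zlib.crc32` is my own choice; neither paper says Tectonic or RocksDB are implemented this way. The point is only that the checksum travels with the data, and every stage re-verifies it before trusting the bytes.

```python
import zlib
from dataclasses import dataclass


class CorruptionError(Exception):
    """Raised when a buffer's contents no longer match its checksum."""


@dataclass(frozen=True)
class CheckedBuffer:
    """Hypothetical helper: bytes paired with the CRC32 computed when created.

    Because the checksum travels with the data, any stage (thread, module,
    or process) can re-verify it instead of assuming memory stayed intact.
    """
    data: bytes
    crc: int

    @classmethod
    def wrap(cls, data: bytes) -> "CheckedBuffer":
        # Compute the checksum once, at the point where the data is produced.
        return cls(data, zlib.crc32(data))

    def verify(self) -> bytes:
        # Recompute and compare; a mismatch means the bytes changed after wrap().
        if zlib.crc32(self.data) != self.crc:
            raise CorruptionError("checksum mismatch")
        return self.data


def stage(buf: CheckedBuffer) -> CheckedBuffer:
    # Hypothetical processing step: verify the input before using it, then
    # wrap the output so the next boundary can check it in turn.
    payload = buf.verify()
    return CheckedBuffer.wrap(payload.upper())


if __name__ == "__main__":
    print(stage(CheckedBuffer.wrap(b"hello")).verify())  # b'HELLO'

    # Simulate a single bit flip in memory after the checksum was taken.
    buf = CheckedBuffer.wrap(b"hello")
    flipped = CheckedBuffer(bytes([buf.data[0] ^ 0x01]) + buf.data[1:], buf.crc)
    try:
        flipped.verify()
    except CorruptionError as e:
        print("detected:", e)  # detected: checksum mismatch
```

Note the limitation: a checksum only catches changes between `wrap()` and `verify()`. If the CPU itself miscomputes inside `stage()`, the output is wrapped with a checksum of already-wrong data, which is exactly the kind of silent corruption the arXiv paper is about.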
Topic | Posted By | Date |
---|---|---|
CPU & Memory bit flips | Ganon | 2021/03/03 10:05 AM |
Also "Silent Data Corruption" | Adrian | 2021/03/03 11:42 AM |
Thanks for the reference | Ganon | 2021/03/03 12:47 PM |
Implications for linux page cache | anon | 2021/03/03 12:54 PM |
Implications for linux page cache | Linus Torvalds | 2021/03/03 02:54 PM |
memory errors | blaine | 2021/03/03 03:53 PM |
memory errors | anon2 | 2021/03/03 06:30 PM |
memory errors | dmcq | 2021/03/04 06:16 AM |
memory errors | Etienne Lorrain | 2021/03/04 07:26 AM |
memory errors | dmcq | 2021/03/04 07:40 AM |
memory errors | Etienne Lorrain | 2021/03/04 07:58 AM |
memory errors | dmcq | 2021/03/04 08:12 AM |
memory errors | Carson | 2021/03/05 03:31 AM |
memory errors | Etienne Lorrain | 2021/03/05 07:23 AM |
memory errors | rwessel | 2021/03/05 08:48 AM |
memory errors | dmcq | 2021/03/05 01:01 PM |
memory errors | rwessel | 2021/03/05 01:23 PM |
memory errors | dmcq | 2021/03/05 01:51 PM |
memory errors | Brendan | 2021/03/06 12:38 AM |
memory errors | Carson | 2021/03/06 02:35 AM |
memory errors | Carson | 2021/03/06 07:24 AM |
memory errors | David Hess | 2021/03/04 02:44 PM |
memory errors | rwessel | 2021/03/04 06:14 PM |
memory errors | Linus Torvalds | 2021/03/04 09:21 PM |
memory errors | anon2 | 2021/03/04 10:46 PM |
memory errors | Carson | 2021/03/05 03:43 AM |
memory errors | anon2 | 2021/03/05 08:55 AM |
memory errors | gallier2 | 2021/03/05 03:22 AM |
memory errors | dmcq | 2021/03/05 01:59 PM |
memory errors | David Hess | 2021/03/06 05:27 AM |
memory errors | Carson | 2021/03/06 07:44 AM |
memory errors | Gabriele Svelto | 2021/03/06 11:11 AM |
memory errors | David Hess | 2021/03/06 11:28 AM |
memory errors | Michael S | 2021/03/06 03:45 PM |
memory errors | Doug S | 2021/03/04 11:48 AM |
memory errors | Michael S | 2021/03/04 12:36 PM |
memory errors | Jörn Engel | 2021/03/04 04:32 PM |
memory errors | Linus Torvalds | 2021/03/04 09:47 PM |
memory errors | Etienne Lorrain | 2021/03/05 02:09 AM |
memory errors | Michael S | 2021/03/05 05:06 AM |
memory errors | Linus Torvalds | 2021/03/05 12:59 PM |
memory errors | rwessel | 2021/03/05 01:32 PM |
memory errors | rwessel | 2021/03/05 01:37 PM |
memory errors | zArchJon | 2021/03/06 09:39 PM |
memory errors | Gabriele Svelto | 2021/03/06 01:58 PM |
memory errors | Jörn Engel | 2021/03/05 11:12 AM |
Amiga recoverable RAM disk? | Carson | 2021/03/05 04:03 AM |
Thanks - TIL a cool Amiga feature (nt) (NT) | John | 2021/03/05 01:51 PM |
Another cool Amiga feature, datatypes | Charles | 2021/03/06 01:01 AM |
Another cool Amiga feature, datatypes | Jukka Larja | 2021/03/06 02:23 AM |
Another cool Amiga feature, datatypes | Anon | 2021/03/06 01:40 PM |
Another cool Amiga feature, filesystems | Marcus | 2021/03/07 01:28 AM |
CPU & Memory bit flips | zArchJon | 2021/03/04 07:39 AM |
CPU & Memory bit flips | dmcq | 2021/03/04 07:59 AM |
CPU & Memory bit flips | rwessel | 2021/03/04 01:27 PM |
speak of the devil | Robert Williams | 2021/03/05 08:53 AM |
speak of the devil | dmcq | 2021/03/05 12:26 PM |
speak of the devil | Robert Williams | 2021/03/05 04:15 PM |