Dealing with memory errors

By: dmcq (dmcq.delete@this.fano.co.uk), July 14, 2021 4:27 pm
Room: Moderated Discussions
Brendan (btrotter.delete@this.gmail.com) on July 14, 2021 12:50 pm wrote:
> Hi,
>
> dmcq (dmcq.delete@this.fano.co.uk) on July 14, 2021 10:13 am wrote:
> > Actually I did write an operating system myself from scratch complete with multiple virtual
> > machines about thirty years ago. The most difficult error handling was dealing with memory
> > errors as we expected a couple a week to occur in the various boards using it.
>
> I apologize for completely changing the topic; but I've been researching/thinking about/"theorizing
> about" memory error tolerance (on and off) for about 10 years now - mostly revolving around
> the obvious "software ECC on top of paging" approach (as described in papers like this one:
> https://repository.lib.ncsu.edu/bitstream/handle/1840.20/33537/TR-2016-7.pdf ).
>
> Are there any interesting insights and/or details you'd be willing to share regarding your experiences?
>
> - Brendan

Nothing particularly wonderful. The memory had parity but not hamming - the problem would have been much less worrying with hamming but that's what saving money does to you. Tasks for the board could be discarded and restarted easily if it said there was an error which made things easier. On the other hand if it didn't recover in a halfway decent manner from a failure the host would be unable to restart the board after timing it out.

As to the handling, chunks of read only areas were covered by longitutinal sums and were checked incrementally in the watchdog timer and fixed if there there was only one error per chunk. Errors could also be detected whilst running or in IO. Read only errors would be fixed. If an error occurred in an unused area including parts of buffers which weren't full the error was just fixed. Otherwise errors in writable areas which could be ascribed to a particular VM caused an error to be returned for the task, otherwise the board said there was a general error and everything was restarted. The handler for the errors had a little bit duplicated and provided one got past an initial little part an error in either half would be handled by going down the other path. Stuck errors were handled using virtual memory. Details would be returned about where errors happened rather like a disk.
< Previous Post in ThreadNext Post in Thread >
TopicPosted ByDate
Is unsafe hell truly good for linux kernel in the future?cqwrteur2021/07/09 09:56 PM
  Is unsafe hell truly good for linux kernel in the future?Brendan2021/07/10 12:59 AM
    Is unsafe hell truly good for linux kernel in the future?cqwrteur2021/07/10 01:37 PM
  Is unsafe hell truly good for linux kernel in the future?anon2021/07/10 04:14 AM
    Is unsafe hell truly good for linux kernel in the future?cqwrteur2021/07/10 01:40 PM
      Is unsafe hell truly good for linux kernel in the future?Gabriele Svelto2021/07/10 03:59 PM
        Is unsafe hell truly good for linux kernel in the future?cqwrteur2021/07/10 04:42 PM
      Is unsafe hell truly good for linux kernel in the future?anon2021/07/11 06:11 AM
        Is unsafe hell truly good for linux kernel in the future?cqwrteur2021/07/12 12:40 PM
  Is unsafe hell truly good for linux kernel in the future?Foo_2021/07/10 06:56 AM
    Is unsafe hell truly good for linux kernel in the future?cqwrteur2021/07/10 09:59 AM
      Most RWT posters don’t decide what goes into the Linux kernelMark Roulo2021/07/10 12:55 PM
      Is unsafe hell truly good for linux kernel in the future?Foo_2021/07/22 11:10 AM
    Is unsafe hell truly good for linux kernel in the future?cqwrteur2021/07/10 10:22 AM
      Is unsafe hell truly good for linux kernel in the future?cqwrteur2021/07/10 10:24 AM
        Déja VuDismissive2021/07/10 10:41 AM
          Déja Vucqwrteur2021/07/10 10:47 AM
            Déja VuDismissive2021/07/10 10:51 AM
            Déja VuMichael S2021/07/10 01:11 PM
  Is unsafe hell truly good for linux kernel in the future?Gabriele Svelto2021/07/10 12:51 PM
    Is unsafe hell truly good for linux kernel in the future?cqwrteur2021/07/10 01:32 PM
      Is unsafe hell truly good for linux kernel in the future?Michael S2021/07/10 02:04 PM
        Is unsafe hell truly good for linux kernel in the future?cqwrteur2021/07/10 02:25 PM
      Is unsafe hell truly good for linux kernel in the future?Gabriele Svelto2021/07/10 03:56 PM
        Is unsafe hell truly good for linux kernel in the future?cqwrteur2021/07/10 04:41 PM
          Is unsafe hell truly good for linux kernel in the future?Rayla2021/07/10 05:33 PM
            Is unsafe hell truly good for linux kernel in the future?cqwrteur2021/07/10 06:27 PM
              Interesting response... (NT)Rayla2021/07/10 09:02 PM
                perhaps just another lousy AI bot? (NT)anonymou52021/07/10 09:33 PM
                  perhaps just another lousy AI bot?dmcq2021/07/10 11:26 PM
                    perhaps just another lousy AI bot?cqwrteur2021/07/10 11:56 PM
                      perhaps just another lousy AI bot?dmcq2021/07/11 03:29 AM
                      perhaps just another lousy AI bot?anon2021/07/11 06:16 AM
                        perhaps just another lousy AI bot?cqwrteur2021/07/12 03:56 PM
                    perhaps just another lousy AI bot?Rayla2021/07/11 06:13 AM
                      perhaps just another lousy AI bot?cqwrteur2021/07/11 11:59 AM
                        When did I call you a bot, Kebabbert? (NT)Rayla2021/07/11 08:51 PM
              Alternatives?Brendan2021/07/11 01:54 AM
                Alternatives?Michael S2021/07/11 06:01 AM
                  Alternatives?Brendan2021/07/11 06:51 AM
                    Alternatives?cqwrteur2021/07/11 11:58 AM
                      Alternatives?Gabriele Svelto2021/07/12 01:31 AM
                        Alternatives?Michael S2021/07/12 03:58 AM
                          Alternatives?anon22021/07/12 09:08 AM
                            Alternatives?Michael S2021/07/12 09:22 AM
                              cqwrteur: Keep it politeDavid Kanter2021/07/13 08:59 AM
                          Alternatives?dmcq2021/07/12 09:37 AM
                            Alternatives?cqwrteur2021/07/12 04:04 PM
                              Alternatives?dmcq2021/07/12 04:26 PM
                                Alternatives?cqwrteur2021/07/13 01:47 AM
                                  Alternatives?dmcq2021/07/13 06:54 AM
                          Alternatives?Jörn Engel2021/07/13 04:53 PM
                            Alternatives?FrankHB2021/07/17 07:56 AM
                          Differences between Rust and C/GoGabriele Svelto2021/07/14 05:57 AM
                            Differences between Rust and C/GoFrankHB2021/07/17 09:47 AM
                        Alternatives?FrankHB2021/07/12 10:08 AM
                          Alternatives?Gabriele Svelto2021/07/14 02:28 PM
                            Inappropriate messages removed: cqwrteurDavid Kanter2021/07/15 10:59 AM
                            Alternatives?FrankHB2021/07/16 06:43 AM
                              Alternatives?Anon2021/07/16 12:01 PM
                                Alternatives?Gabriele Svelto2021/07/16 01:44 PM
                                Type abstraction and kernel programmingFrankHB2021/07/17 01:44 AM
                                  Type abstraction and kernel programmingdmcq2021/07/18 04:00 AM
                                    Type abstraction and kernel programmingdmcq2021/07/18 04:36 AM
                                  Type abstraction and kernel programmingEtienne Lorrain2021/07/19 01:03 AM
                                    Type abstraction and kernel programmingdmcq2021/07/19 02:01 AM
                                      Type abstraction and kernel programmingAnon2021/07/19 02:05 AM
                                        Type abstraction and kernel programmingdmcq2021/07/19 03:23 AM
                                      Type abstraction and kernel programmingBrendan2021/07/19 07:05 AM
                                Alternatives?gallier22021/07/20 04:57 AM
                                  Alternatives?Anon2021/07/20 06:24 AM
                                    Alternatives?Michael S2021/07/20 10:14 AM
                                      Alternatives?Anon2021/07/20 10:53 AM
                                        Alternatives?gallier22021/07/21 11:44 PM
                                      Alternatives?Adrian2021/07/20 12:00 PM
                                        Alternatives?Brett2021/07/20 11:13 PM
                                          Alternatives?Michael S2021/07/21 02:12 AM
                                            Alternatives?dmcq2021/07/22 12:58 PM
                                          Alternatives?Anon2021/07/21 08:58 AM
                      Alternatives?Brendan2021/07/12 02:34 AM
                        Alternatives?FrankHB2021/07/12 10:57 AM
                          Alternatives?cqwrteur2021/07/12 12:55 PM
                            Alternatives?FrankHB2021/07/12 09:44 PM
                          Alternatives?Brendan2021/07/12 08:52 PM
                            Alternatives?cqwrteur2021/07/12 11:05 PM
                              Alternatives?Anon2021/07/12 11:42 PM
                                Alternatives?cqwrteur2021/07/13 12:42 AM
                                Alternatives?cqwrteur2021/07/13 12:44 AM
                                  Alternatives?Anon2021/07/13 08:32 PM
                                    Alternatives?cqwrteur2021/07/13 09:36 PM
                                    Alternatives?cqwrteur2021/07/13 09:39 PM
                                      Alternatives?Anon2021/07/13 10:02 PM
                                        Alternatives?cqwrteur2021/07/13 10:18 PM
                                    Alternatives?cqwrteur2021/07/13 09:49 PM
                                      Alternatives?Anon2021/07/13 10:07 PM
                                        Alternatives?cqwrteur2021/07/13 10:16 PM
                                          Alternatives?Anon2021/07/13 11:31 PM
                                            Alternatives?cqwrteur2021/07/14 12:30 AM
                                              Alternatives?Anon2021/07/14 01:55 AM
                                                Alternatives?cqwrteur2021/07/14 02:22 AM
                                                  Alternatives?Anon2021/07/14 03:05 AM
                                                    Alternatives?cqwrteur2021/07/14 03:11 AM
                                                      Alternatives?Anon2021/07/14 04:16 AM
                                                        Alternatives?cqwrteur2021/07/14 07:06 AM
                                                          Alternatives?Anon2021/07/14 08:20 AM
                                                            Alternatives?cqwrteur2021/07/14 08:51 AM
                                                              Alternatives?Anon2021/07/14 12:33 PM
                                                              Alternatives?Gabriele Svelto2021/07/14 01:19 PM
                                                                Alternatives?FrankHB2021/07/16 07:07 AM
                                            Alternatives?cqwrteur2021/07/14 12:33 AM
                                              Alternatives?Anon2021/07/14 01:57 AM
                                                Alternatives?cqwrteur2021/07/14 02:21 AM
                                                  Alternatives?dmcq2021/07/14 03:06 AM
                                                    Alternatives?cqwrteur2021/07/14 03:50 AM
                                                  Alternatives?2021/07/15 08:33 AM
                                                    Alternatives?FrankHB2021/07/16 07:13 AM
                                            Alternatives?cqwrteur2021/07/14 12:39 AM
                                              Alternatives?Anon2021/07/14 02:08 AM
                                                Alternatives?cqwrteur2021/07/14 02:20 AM
                                                  Alternatives?dmcq2021/07/14 02:46 AM
                                                    Alternatives?cqwrteur2021/07/14 02:52 AM
                                                      Alternatives?dmcq2021/07/14 10:13 AM
                                                        Alternatives?dmcq2021/07/14 10:23 AM
                                                        Dealing with memory errorsBrendan2021/07/14 12:50 PM
                                                          Dealing with memory errorsdmcq2021/07/14 04:27 PM
                                                            Dealing with memory errorsBrendan2021/07/14 04:55 PM
                                                    Alternatives?cqwrteur2021/07/14 03:12 AM
                                                      Alternatives?Anon2021/07/14 04:16 AM
                                                        Alternatives?cqwrteur2021/07/14 06:55 AM
                                                      Alternatives?FrankHB2021/07/16 07:27 AM
                                                Alternatives?cqwrteur2021/07/14 02:38 AM
                                                  Alternatives?anon2021/07/14 03:50 AM
                                                    Stop feeding that trollnone2021/07/14 04:13 AM
                                                    Alternatives?cqwrteur2021/07/14 07:39 AM
                                                      Alternatives?Brendan2021/07/14 12:15 PM
                                                  Alternatives?Anon2021/07/14 04:19 AM
                                                    Alternatives?cqwrteur2021/07/14 07:12 AM
                                                      Alternatives?Anon2021/07/14 08:17 AM
                                                        Alternatives?cqwrteur2021/07/14 08:47 AM
                                                          Alternatives?Anon2021/07/14 01:00 PM
                                                            Alternatives?cqwrteur2021/07/14 01:44 PM
                                                          Alternatives?2021/07/15 10:36 AM
                                                  Alternatives?Gabriele Svelto2021/07/14 01:26 PM
                                                    Alternatives?cqwrteur2021/07/14 01:46 PM
                                                      Alternatives?Gabriele Svelto2021/07/14 02:36 PM
                                                        Alternatives?cqwrteur2021/07/14 02:55 PM
                                                          Alternatives?Smoochie2021/07/15 12:07 AM
                                                  Alternatives?2021/07/15 08:37 AM
                                                    Alternatives?Brendan2021/07/15 11:21 AM
                                                      Alternatives?Anon2021/07/15 01:15 PM
                                                  Alternatives?FrankHB2021/07/16 07:27 AM
                                          Alternatives?None2021/07/14 02:50 AM
                                            Alternatives?cqwrteur2021/07/14 02:54 AM
                                            Alternatives?cqwrteur2021/07/14 02:55 AM
                                              Alternatives?Rayla2021/07/14 05:47 AM
                                                Alternatives?cqwrteur2021/07/14 06:54 AM
                                              Alternatives?Gabriele Svelto2021/07/14 01:43 PM
                                Alternatives?FrankHB2021/07/13 12:47 AM
                            Alternatives?FrankHB2021/07/13 12:05 AM
                              Alternatives?Michael S2021/07/13 01:01 AM
                                Alternatives?FrankHB2021/07/13 01:25 AM
                            Alternatives?Doug S2021/07/13 12:29 AM
                              Alternatives?cqwrteur2021/07/13 12:48 AM
                              Alternatives?FrankHB2021/07/13 01:07 AM
              Is unsafe hell truly good for linux kernel in the future?2021/07/12 06:27 AM
                Is unsafe hell truly good for linux kernel in the future?Anon2021/07/12 09:46 AM
                Is unsafe hell truly good for linux kernel in the future?Etienne Lorrain2021/07/13 02:00 AM
    Is unsafe hell truly good for linux kernel in the future?cqwrteur2021/07/10 01:38 PM
Reply to this Topic
Name:
Email:
Topic:
Body: No Text
How do you spell avocado?