Dealing with memory errors

By: Brendan (btrotter.delete@this.gmail.com), July 14, 2021 3:55 pm
Room: Moderated Discussions
Hi,

dmcq (dmcq.delete@this.fano.co.uk) on July 14, 2021 4:27 pm wrote:
> Brendan (btrotter.delete@this.gmail.com) on July 14, 2021 12:50 pm wrote:
> > dmcq (dmcq.delete@this.fano.co.uk) on July 14, 2021 10:13 am wrote:
> > > Actually I did write an operating system myself from scratch, complete with multiple virtual
> > > machines, about thirty years ago. The most difficult error handling was dealing with memory
> > > errors, as we expected a couple a week to occur in the various boards using it.
> >
> > I apologize for completely changing the topic; but I've been researching/thinking about/"theorizing
> > about" memory error tolerance (on and off) for about 10 years now - mostly revolving around
> > the obvious "software ECC on top of paging" approach (as described in papers like this one:
> > https://repository.lib.ncsu.edu/bitstream/handle/1840.20/33537/TR-2016-7.pdf ).
> >
> > Are there any interesting insights and/or details you'd be willing to share regarding your experiences?
> >
> > - Brendan
>
> Nothing particularly wonderful. The memory had parity but not Hamming ECC - the problem would have been much less
> worrying with Hamming codes, but that's what saving money does to you. Tasks for the board could be discarded and restarted
> easily if it reported an error, which made things easier. On the other hand, if the board didn't recover in a halfway
> decent manner from a failure, the host would be unable to restart it after timing it out.
>
> As to the handling, chunks of read-only areas were covered by longitudinal sums, which were checked incrementally
> in the watchdog timer; a chunk was fixed if there was only one error in it. Errors could also be
> detected while running or during I/O. Errors in read-only areas would be fixed. If an error occurred in an unused
> area, including the unfilled parts of buffers, the error was simply fixed. Otherwise, errors in writable
> areas that could be ascribed to a particular VM caused an error to be returned for that task; failing that,
> the board reported a general error and everything was restarted. Part of the error handler itself
> was duplicated: provided execution got past a small initial section, an error in either copy
> would be handled by going down the other path. Stuck errors were handled using virtual memory. Details
> of where errors happened would be returned, rather like a disk does.
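The incremental read-only scrubbing described above can be sketched as follows. The post doesn't say how a single bad word is located, so this assumes the per-word hardware parity flags which word is bad and that the "longitudinal sum" is a bitwise XOR across the chunk's words; with both, the bad word can be rebuilt as the stored sum XOR-ed with every other word. `Chunk`, `scrub_chunk` and `Scrubber` are hypothetical names, and the parity bits are simulated in software.

```python
from dataclasses import dataclass, field
from functools import reduce
from operator import xor
from typing import List


def parity(word: int) -> int:
    """Parity bit: 1 if the word has an odd number of set bits."""
    return bin(word).count("1") & 1


@dataclass
class Chunk:
    words: List[int]                          # contents of one read-only chunk
    parities: List[int] = field(init=False)   # simulates the hardware parity bits
    lsum: int = field(init=False)             # assumed: longitudinal XOR of all words

    def __post_init__(self) -> None:
        # Computed once, when the read-only region is loaded and known good.
        self.parities = [parity(w) for w in self.words]
        self.lsum = reduce(xor, self.words, 0)


def scrub_chunk(chunk: Chunk) -> str:
    """Check one chunk; repair it if exactly one word fails its parity check."""
    bad = [i for i, w in enumerate(chunk.words) if parity(w) != chunk.parities[i]]
    if not bad:
        # Parity misses even-bit flips; the longitudinal sum can still detect them.
        return "ok" if reduce(xor, chunk.words, 0) == chunk.lsum else "uncorrectable"
    if len(bad) > 1:
        return "uncorrectable"
    i = bad[0]
    others = reduce(xor, (w for j, w in enumerate(chunk.words) if j != i), 0)
    chunk.words[i] = chunk.lsum ^ others  # rebuild the bad word from the sum
    return "fixed"


class Scrubber:
    """Checks one chunk per watchdog tick, wrapping round the chunk list."""

    def __init__(self, chunks: List[Chunk]) -> None:
        self.chunks = chunks
        self.cursor = 0

    def on_watchdog_tick(self) -> str:
        result = scrub_chunk(self.chunks[self.cursor])
        self.cursor = (self.cursor + 1) % len(self.chunks)
        return result
```

Checking one chunk per tick keeps each watchdog interrupt short, which fits the "incremental" checking in the post; the cost is that a latent error can sit unnoticed for up to one full sweep of the chunk list.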

Thanks - it definitely sounds like an interesting piece of engineering (and it made me think more about providing feedback that higher levels could use to further manage uncorrected errors). :-)

- Brendan
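The recovery policy in dmcq's quoted post is essentially a small decision table, and the "rather like a disk" reporting is the kind of feedback Brendan mentions passing to higher levels. The sketch below puts the two together. The region names, the `ErrorReport` fields and `handle_memory_error` are hypothetical, and it assumes a read-only repair always succeeds (the post doesn't say what happened when a chunk had more than one error).

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Region(Enum):
    READ_ONLY = auto()  # covered by longitudinal sums, so it can be rebuilt
    UNUSED = auto()     # free memory or the unfilled tail of a buffer
    WRITABLE = auto()   # live data that cannot be reconstructed


class Action(Enum):
    FIX_IN_PLACE = auto()   # rebuild (read-only) or just overwrite (unused)
    FAIL_TASK = auto()      # return an error for the owning VM's task
    RESTART_BOARD = auto()  # general error: restart everything


@dataclass
class ErrorReport:
    """Hypothetical disk-style report of where an error happened."""
    address: int
    region: Region
    vm_id: Optional[int]  # None if the error can't be ascribed to a VM
    action: Action


def handle_memory_error(address: int, region: Region,
                        vm_id: Optional[int]) -> ErrorReport:
    """Choose a recovery action, assuming read-only repairs always succeed."""
    if region in (Region.READ_ONLY, Region.UNUSED):
        action = Action.FIX_IN_PLACE
    elif vm_id is not None:
        action = Action.FAIL_TASK
    else:
        action = Action.RESTART_BOARD
    return ErrorReport(address, region, vm_id, action)
```

Returning a report for every error, not only the fatal ones, is what would let a host notice a board that keeps fixing errors at the same address and treat it as a stuck fault.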