memory errors

By: dmcq (dmcq.delete@this.fano.co.uk), March 4, 2021 7:40 am
Room: Moderated Discussions
Etienne Lorrain (etienne_lorrain.delete@this.yahoo.fr) on March 4, 2021 6:26 am wrote:
> dmcq (dmcq.delete@this.fano.co.uk) on March 4, 2021 5:16 am wrote:
> > ...
> >
> > I think the important thing is error detection - not recovery. Error recovery at a low level is nice to
> > have but if the whole business can be fixed at a higher level and the error rate is low enough it is not
> > really necessary. Intel leaving out ECC was dreadful, but the thing that I think was really criminal and
> > cretinous was cutting out even parity checking. I see it as a cheap trick to obscure errors so people just
> > blamed gremlins and pressed ctrl-alt-delete rather than fixing underlying problems. Of course some memory
> > problems would escape that but it would catch memory that is failing and it would give an indication of
> > how reliable it is overall.
>
> Historically, I think ECC error detection was removed approximately at the time it took too much time to
> initialise the memory. At power-up, the parity bit is not initialised: if you dump the DDR before initialisation
> you get mostly zero bits but you will also get bits set (I do not know why the capacitor is still charged
> at power-up). If you do a quick power-cycle, it is obvious you will still have bits set.
> When the memory of the PC increased to a few tens of megabytes, the CPU (at that time) was
> not able to clear that DRAM (and so initialise the ECC) in less than 10 seconds, and the PC never
> had a powerful DMA engine to do such work. To cut boot time, they removed the parity bit.
>
> The problem of when to initialise ECC is still there, and on some embedded system I worked on,
> I was intentionally setting ECC errors on every ECC line at boot to be sure the O.S. (when present,
> or the bootloader) does not directly use uninitialised memory. Obviously you can only do that if
> you take control of the CPU just after reset, any "secure boot" stuff will not help.
>
> You need to ensure you initialise the ECC correctly: one common problem is that if
> you write less than an ECC line, you may get an ECC error at write time.
>
> That is why IMHO the O.S. should itself manage any (non-recoverable) ECC error: ignore it if it
> is in new memory allocated to a process, correct it if it comes from a file-backed page, and stop
> the owning process(es) if necessary (i.e. if it cannot be corrected). And log the address of the error.
> Having "secure boot" and virtual boxes does not help, but giving them only
> invalid ECC memory blocks would probably also detect bugs there.

Using ECC to detect uninitialised memory sounds interesting.

The initialisation problem could easily have been solved by initialising a page only when it is allocated, or by overwriting it when it is loaded from disk. Some seconds would still be spent on the job in total, but the user wouldn't notice. I can't really believe they wouldn't have thought of that, never mind that booting took a long time anyway, so this would have been no great gain.
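
To make the idea concrete, here is a rough toy model in Python, not anything a real kernel or memory controller does. All the names (LazyEccMemory, PoisonedRead, allocate, load_from_disk) and the 4096-byte page size are made up for illustration. A page whose check bits were never written is modelled as None, and reading it raises an error, much like the deliberate ECC poisoning described in the quoted post.

```python
class PoisonedRead(Exception):
    """Raised when a page whose check bits were never initialised is read."""


class LazyEccMemory:
    PAGE_SIZE = 4096  # hypothetical page size, chosen only for the example

    def __init__(self, num_pages):
        # None models a page whose parity/ECC bits are still garbage after
        # power-up. Nothing is cleared at "boot", so start-up cost is zero.
        self.pages = [None] * num_pages
        self.free = list(range(num_pages))
        self.pages_initialised = 0

    def _take_free_page(self):
        if not self.free:
            raise MemoryError("no free pages")
        return self.free.pop()

    def allocate(self):
        """Hand out a page, zero-filling it (and so writing valid check
        bits) only at the moment it is allocated."""
        page = self._take_free_page()
        self.pages[page] = bytearray(self.PAGE_SIZE)
        self.pages_initialised += 1
        return page

    def load_from_disk(self, data):
        """Fill a page from disk data. The whole page is written, so the
        check bits become valid without a separate clearing pass."""
        if len(data) > self.PAGE_SIZE:
            raise ValueError("data does not fit in one page")
        page = self._take_free_page()
        buf = bytearray(self.PAGE_SIZE)
        buf[:len(data)] = data
        self.pages[page] = buf
        self.pages_initialised += 1
        return page

    def read(self, page, offset):
        """Read one byte; an uninitialised page behaves like an ECC error."""
        if self.pages[page] is None:
            raise PoisonedRead(f"page {page} was never initialised")
        return self.pages[page][offset]
```

The total work of clearing memory is the same as doing it all at boot, but it is spread over the allocations, and any read of a page that was never handed out shows up as an error instead of returning stale data.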
