In-band ECC support in recent Atom SoCs & Tiger Lake U

By: Etienne Lorrain (etienne_lorrain.delete@this.yahoo.fr), December 17, 2020 2:15 am
Room: Moderated Discussions
JS (none.delete@this.null.com) on December 17, 2020 12:39 am wrote:
> Gabriele Svelto (gabriele.svelto.delete@this.gmail.com) on December 16, 2020 10:00 pm wrote:
> > JS (none.delete@this.null.com) on December 16, 2020 9:07 pm wrote:
> > > If you're caching your ECC AND keeping it in a separate location, you're not really keeping in
> > > the spirit of ECC. Why not sacrifice half your memory (and bandwidth I suppose) and store your ECC
> > > next to the data, in whatever granularity is best, cacheline/byte/word/etc... Now you have multi-bit
> > > ECC and probably better error correction than most hardware solutions on the market. It's not
> > > like it's all that hard these days getting a lot of memory in small form factors.
> >
> > The purpose of using in-band ECC is to avoid having to add an extra chip for the ECC
> > bits; using half your memory to store them kind of defeats the purpose, doesn't it?
>
> In retrospect it is a pretty stupid idea. It'd cost less to just buy ECC memory.

ECC memory is not necessarily SECDED memory: at its weakest, the protection is just a parity bit, which can detect a single-bit error and will consistently miss two-bit errors.
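
To make the parity limitation concrete, here is a minimal sketch in Python. The helper names `parity` and `parity_ok` are made up for this illustration and do not correspond to any hardware interface.

```python
def parity(word: int) -> int:
    """Even-parity bit: 1 if the number of set bits in word is odd, else 0."""
    return bin(word).count("1") & 1


def parity_ok(word: int, stored_parity: int) -> bool:
    """True if the word still matches the parity bit stored alongside it."""
    return parity(word) == stored_parity


data = 0b10110010
p = parity(data)                # parity bit computed when the data was written

one_flip = data ^ 0b00000100    # a single bit flipped
two_flips = data ^ 0b00010100   # two bits flipped

print(parity_ok(one_flip, p))   # False: the single-bit error is detected
print(parity_ok(two_flips, p))  # True: the two-bit error goes unnoticed
```

Any even number of flips leaves the parity unchanged, and even when an error is detected the parity bit alone cannot tell which bit is wrong.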

Single Error Correct Double Error Detect does a lot more, but it needs more bits, and you have to define whether each byte or each word is protected independently (with kludges when a single byte is accessed and the flipped bit is not in the byte accessed).
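
As a rough illustration of what SECDED involves, the sketch below protects a single byte with a textbook Hamming(12,8) code plus one overall parity bit, 13 bits in total. This is an assumption chosen for the example; real memory controllers use their own code layouts, usually over 64 data bits. The names `encode` and `decode` are hypothetical.

```python
# Positions 1..12 form a Hamming(12,8) codeword; bit 0 is an overall parity bit.
DATA_POS = [3, 5, 6, 7, 9, 10, 11, 12]   # positions that are not powers of two
CHECK_POS = [1, 2, 4, 8]                 # power-of-two positions hold check bits


def encode(byte: int) -> int:
    """Return the 13-bit SECDED codeword for an 8-bit value."""
    code = 0
    for i, pos in enumerate(DATA_POS):
        if (byte >> i) & 1:
            code |= 1 << pos
    for c in CHECK_POS:
        # Check bit c makes the parity even over all positions with bit c set.
        p = 0
        for pos in range(1, 13):
            if pos & c and (code >> pos) & 1:
                p ^= 1
        code |= p << c
    overall = bin(code).count("1") & 1   # make the parity of all 13 bits even
    return code | overall


def decode(code: int):
    """Return (byte, status); byte is None when the error is uncorrectable."""
    syndrome = 0
    for pos in range(1, 13):
        if (code >> pos) & 1:
            syndrome ^= pos              # XOR of set positions is 0 when valid
    overall_bad = bin(code).count("1") & 1
    if syndrome == 0 and not overall_bad:
        status = "ok"
    elif overall_bad and syndrome <= 12:
        code ^= 1 << syndrome            # syndrome 0 means bit 0 itself flipped
        status = "corrected"
    else:
        return None, "uncorrectable"     # e.g. two flips: parity even, syndrome set
    byte = 0
    for i, pos in enumerate(DATA_POS):
        byte |= ((code >> pos) & 1) << i
    return byte, status


word = encode(0xA5)
print(decode(word))                          # (165, 'ok')
print(decode(word ^ (1 << 6)))               # (165, 'corrected')
print(decode(word ^ (1 << 6) ^ (1 << 9)))    # (None, 'uncorrectable')
```

Note that a corrected flip may sit in a check bit rather than in the data, which is the same kind of "error outside the bits you asked for" situation that wider ECC words run into on sub-word accesses.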

It even seems that bit errors frequently appear "in groups", so internal processor caches sometimes implement DECTED, "Double Error Correct Triple Error Detect".

The main problem lies in processors and their EFI (BIOS), which do not give the OS a proper interface for handling ECC errors sensibly. An ECC error can (and should) be ignored if the memory has never been used and its content doesn't matter (in the middle of a byte-by-byte memset(), you do not want to care whether the next byte has incorrect ECC). It can be corrected manually when the data is easy to recover: if the error appears in the page cache, in a write-protected page backed by the filesystem, just re-read that page from the disk.
If the ECC error appears in a task, you just want to kill that task, not reboot the whole PC.
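
The policy described above could be sketched as a simple decision function. Everything here is hypothetical: the `PageState` and `Action` names, and the fallback for any other state, are assumptions for illustration and not an existing OS or firmware interface.

```python
from enum import Enum, auto


class PageState(Enum):
    """Hypothetical classification of the page that reported the ECC error."""
    UNUSED = auto()             # never used, or about to be overwritten
    CLEAN_FILE_BACKED = auto()  # page cache, write-protected, backed by a file
    TASK_PRIVATE = auto()       # data belonging to a single task
    OTHER = auto()              # anything else (assumption, not in the post)


class Action(Enum):
    IGNORE = auto()             # content does not matter
    REREAD_FROM_DISK = auto()   # drop the page and read it again
    KILL_TASK = auto()          # kill only the affected task
    REBOOT = auto()             # last resort


def handle_ecc_error(state: PageState) -> Action:
    """Pick the least disruptive reaction to an uncorrected ECC error."""
    if state is PageState.UNUSED:
        return Action.IGNORE
    if state is PageState.CLEAN_FILE_BACKED:
        return Action.REREAD_FROM_DISK
    if state is PageState.TASK_PRIVATE:
        return Action.KILL_TASK
    return Action.REBOOT        # assumed fallback when nothing better applies


for state in PageState:
    print(state.name, "->", handle_ecc_error(state).name)
```

The point of the sketch is only that the OS needs enough information from the hardware (the failing address, and whether the error was consumed) to make this choice at all.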

Basically, ECC errors often appear in memory which has not been accessed for a long time, which is exactly the case for unused memory and the page cache.

Moreover, the processor I/O or EFI BIOS should have an interface to correct a permanent bit error inside a DDR4 DRAM by disabling that memory line and replacing it with a spare, instead of constantly relying on SECDED to correct each access.

Some solutions could put the complete SECDED/DECTED logic on the memory DIMM, but it is still nice to count the errors and report them to the user (DRAM is failing and should be replaced soon, some accesses are delayed), together with the failing address, to know which DIMM to replace or whether it is always the same address that fails.

The length of the ECC line is also critical. Encoding each byte with its own SECDED is possible, but adds 4 bits for every 8 bits (4 bits are enough to index which of the 12 bits has flipped), plus one more overall parity bit to detect double errors. On a 32-bit word it adds proportionally fewer bits, on a 64-bit word fewer still, and on a complete cacheline even fewer. But with each increase you need to handle every sub-access correctly, for example when only a byte is read and the flipped bit is not in that byte: shall we silently correct the erroneous bit?
Also, you may want to be able to correct more than a single bit as the ECC line length increases, for instance over a cacheline; the logic involved would be very big and difficult to test (in the factory, for each chip).
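
The overhead trend can be checked with the standard Hamming bound: single-error correction over k data bits needs the smallest r with 2^r >= k + r + 1, and SECDED adds one overall parity bit on top. A small sketch, where the function name `secded_check_bits` is made up for this example:

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits needed for SECDED over data_bits (Hamming bits + 1 parity)."""
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r + 1


for k in (8, 32, 64, 512):
    r = secded_check_bits(k)
    print(f"{k:4d} data bits: {r:2d} check bits ({100 * r / k:.1f}% overhead)")
```

This gives 5 check bits per byte, 7 per 32-bit word, 8 per 64-bit word (the usual 72-bit ECC DIMM width) and 11 for a 512-bit cacheline, so the relative cost falls quickly as the protected unit grows.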