Enough with the idiocy ... let's have proper ECC again.

By: Maynard Handley (name99.delete@this.name99.org), December 18, 2020 9:30 am
Room: Moderated Discussions
Maynard Handley (name99.delete@this.name99.org) on December 17, 2020 9:43 pm wrote:
> Maynard Handley (name99.delete@this.name99.org) on December 17, 2020 9:10 pm wrote:
> > rwessel (rwessel.delete@this.yahoo.com) on December 17, 2020 7:34 pm wrote:
> > > Maynard Handley (name99.delete@this.name99.org) on December 17, 2020 4:55 pm wrote:
> > >
> > > > As I keep trying to remind you, with plentiful transistors many clever possibilities become available...
> > > > Even with traditional 8bit DRAM at least two options present themselves:
> > > >
> > > > (...)
> > > >
> > > > (b) use memory compression. For example compress each 128-byte line down to ~63 bytes + an
> > > > ECC byte, and store that in RAM. Qualcomm implemented this exact scenario on Falkor. Obviously,
> > > > exactly as I've described it this gives you probabilistic ECC, some lines covered others
> > > > not. If you're willing to also implement a less aggressive compression you can probably
> > > > fit most of the remaining lines into 127 (or 126) bytes + an ECC byte or two.
> > >
> > >
> > > You then need at least one additional bit to indicate that you have a compressed/ECC'd
> > > line, and you cannot compress all possible lines enough to make room for that bit.
> > > And if you need an additional bit per line, you need special memory anyway.
> > >
> > > Nor are 8 bits enough to (SECDED) ECC more than 64 (and a
> > > few) bits. For a 128 byte line, you'd need at least 12.
> >
> > Of course you need some out of band memory to indicate (1
> > bit per unit, most likely 128 bytes) whether a line
> > is compressed or not. That's all you need out of band, any other indicator bits can be stored in band.
> > This might seem like a hassle but actually it's not conceptually much different
> > from dealing with paging -- you have some structure at a dedicated spot in
> > physical memory, allocate it, point the HW at it, and cache the data.
> >
> > Of course this cannot cover EVERY line. That's why I said probabilistic.
> >
> > The point of cached *RAM* is essentially to boost your bandwidth. But if you're willing to
> > do it (clearly of value for a server, the degree of value for a home machine would depend
> > on how much high-bandwidth data can still be captured by compression -- which probably means
> > how much data used during heavy graphics isn't captured by the various graphics compression
> > technologies) you can then utilize the fact that it's in place for other tasks.
> >
> > One obvious such task is you can maintain the data in compressed
> > form in L3 boosting your effective capacity.
> > Another: remember that in DRAM (unlike in L3) it's hard, for various reasons, to pack the half-lines;
> > so you save bandwidth but you generally don't get more effective space. But you DO now have 64 bytes
> > of essentially free space, so how can you use that? ECC (used probabilistically) is one answer.
> > This is no longer giving you bandwidth compression, but it is giving you free ECC, as much
> > as you can pack in for that line and maybe for a few successor lines; maybe that's a useful
> > tradeoff to activate whenever your machine is not running at maximum bandwidth?
> >
> > I'm just throwing out ideas here; I've certainly not simulated any of them. Though we do know,
> > as I said, that QC considered DRAM line compression both practical and worth doing for the
> > bandwidth savings, and there've been likewise many papers on L3 line compression.
> > You do need some carved out memory to kickstart the whole business (1 bit per 128 bytes, so
> > a byte per K, so 16MB for a 16GB system -- too large to store purely on the SoC, but small
> > enough that it's no loss in the physical DRAM; like I said the analogy is to paging).
> >
> > And the question is: why are you doing this? If your goal is z/ class RAS, of course this is
> > not good enough, no-one would say otherwise. If the goal is to catch DRAM that's slightly flaky,
> > but you don't know that, all you know is that your machine seems to crash randomly or do something
> > strange every few weeks, well, in that respect it is a nice bump in RAS.
> > As for how many bits, well, anything from 1 bit per line (no-one's pretending
> > error correction, all we want is to detect wonky RAM; at that rate just one compressed
> > line can cover an entire DRAM page) up to SECDED is fine by me.
> >
> > Obviously some readers have different goals, they want z/class RAS and believe (correctly or
> > not) that their SW is reliable enough that worrying about that level of HW makes sense.
> > And of course those readers probably can't do with less than real physical ECC hardware.
> >
> > But for *ME* I have zero illusions about the quality of Apple SW. I just want all the parts
> > of my machine (so this would also include all disks) to report to me that they are starting
> > to fail; I'm so sick of the situation where you only figure out after six months that the
> > latest round of random flakiness was because of a bad USB drive or bad DRAM or whatever.
>
> Oh I forgot to add. One more thing you can do, once you are willing to play these tricks, either
> with out-of-band ECC or via using bytes left over from compressed lines, is you can store hints that
> help the memory and coherency system. Hints -- so not a catastrophe if they aren't present.
> IBM do this for their large systems. I have made no attempt to understand the large system IBM coherence
> protocols (which are now up to at least 13 states) but I know that most of the states beyond the basic
> MESI are for performance rather than correctness AND, more important, at least one bit that conveys
> some information to the coherence machinery is stored in DRAM (using the ECC bits available). I've no
> idea of the details, but you could imagine something like a hint that a line is used exclusively (in
> which case when it's accessed, maybe don't pass it through L3, move it straight to L2) vs if it is usually
> shared (in which case also put a copy in L3). [Of course this assumes the relationship between your
> private cache and the shared cache is not governed by inclusivity/exclusivity rules...]
> With 4 core systems this may not be valuable but as you grow to larger systems...
>
> My larger point is that if you ask "suppose I were willing to make the effort to provide metadata bits
> for each line in DRAM. What interesting things could I do with that?" opportunities do suggest themselves.
> Each individual possibility (probabilistic ECC, higher bandwidth through compressed RAM, slightly more
> efficient sharing and cache usage) may not be compelling in itself, but the entire collection may start
> to make it worthwhile, especially if there are additional hints one can think of --
> maybe if you know that a line is usually used as write only vs usually read/write vs read only?
> maybe a way to flash zero an entire page by setting the appropriate metadata as a single contiguous write?
>
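
As a rough sanity check on the check-bit counts in the quoted exchange, here is a minimal sketch. It assumes a textbook Hamming code extended with one overall parity bit (SECDED); the function name is made up for illustration, and real memory controllers may use different codes.

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits for a textbook SECDED code over data_bits of payload.

    A single-error-correcting Hamming code needs the smallest r with
    2**r >= data_bits + r + 1; SECDED adds one overall parity bit.
    """
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r + 1  # +1 for the double-error-detecting parity bit


if __name__ == "__main__":
    # 64-bit word, 64-byte half line, 128-byte line
    for data_bits in (64, 512, 1024):
        print(data_bits, "data bits ->", secded_check_bits(data_bits), "check bits")
```

Under those assumptions a 64-bit word needs 8 check bits and a full 128-byte (1024-bit) line needs 12, which is the "at least 12" figure in the reply above.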

This Twitter thread (go up and down from the link I posted) is interesting insofar as it confirms my point. We don't necessarily demand perfection from our home machines, but it would be nice to know when their hardware is starting to go bad:
https://twitter.com/elfprince13/status/1337402006609285120
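
The back-of-envelope sizes in the quoted text (1 bit of out-of-band state per 128-byte line, and 64 bytes freed by a 2:1 compressed line) can be worked through as below. This is only arithmetic under those stated numbers; the function names and the choice of 12 check bits per line are illustrative assumptions, not a description of any shipping design.

```python
GIB = 1 << 30
MIB = 1 << 20


def compression_bitmap_bytes(dram_bytes: int, line_bytes: int = 128) -> int:
    """Bytes of carved-out DRAM needed for one 'is compressed' bit per line."""
    lines = dram_bytes // line_bytes
    return (lines + 7) // 8  # round up to whole bytes


def lines_covered(free_bytes: int, check_bits_per_line: int) -> int:
    """How many lines' worth of check bits fit in the space freed by one
    compressed line (hypothetical packing, no framing overhead)."""
    return (free_bytes * 8) // check_bits_per_line


if __name__ == "__main__":
    print(compression_bitmap_bytes(16 * GIB) // MIB, "MiB bitmap for 16 GiB")
    print(lines_covered(64, 1), "lines covered at 1 parity bit per line")
    print(lines_covered(64, 12), "lines covered at 12 SECDED bits per line")
```

With these inputs the bitmap comes to 16 MiB for 16 GiB of DRAM, matching the figure in the quote, and 64 spare bytes hold parity for 512 lines or 12-bit SECDED codes for 42 lines.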