Sequential consistency in hardware

By: dmcq (dmcq.delete@this.fano.co.uk), August 4, 2020 9:27 am
Room: Moderated Discussions
Geert Bosch (boschg.delete@this.mac.com) on August 3, 2020 8:48 pm wrote:
> Jon Masters (jcm.delete@this.jonmasters.org) on August 3, 2020 5:22 pm wrote:
> > Linus Torvalds (torvalds.delete@this.linux-foundation.org) on August 3, 2020 10:19 am wrote:
> >
> > > People used to believe that the fewer memory ordering guarantees you gave, the simpler you
> > > could make things, and the better everything would work. That turned out to not be true.
> >
> > Hahaha :) I was convinced for a while that when they said SC they must have meant TSO, but apparently
> > it really truly is SC. Which is...fascinating. But hey, it's 2020, so nothing is surprising any more.
> >
> > Jon.
> >
>
> The speculation needed for value prediction is similar to that required for sequential consistency (SC).
> Similarly to explicit static compiler-generated fences being bad for OOO execution, where using finer-grained
> dynamic knowledge about actual data dependencies allows for more scheduling freedom, can it be the case
> that the time has come where SC as a memory model becomes competitive in some situations?
>
> For historic context, didn't HP PA-RISC (RIP) support SC in larger multi-processor systems?
> How did that work for them? Obviously they jumped ship to the Itanic resulting in EPIC
> failure, but I doubt that was because of the superior memory model of the IA-64. Can we
> envision a high-performance many-core x86 processor with sequential consistency?
>
> It seems to fit some kind of pattern, for HPC and RISC/VLIW/GPU aficionados:
>

> • We are about performance, so IEEE floats are bad. We underflow to zero, and that's a feature!


  • The original IBM/360 handled underflow right; I can't see why x86 had to have such a dreadfully slow implementation.

> • We are about performance, so unaligned accesses are bad.
> We align our bytes to cache-lines, and that's a feature!


  • So now we have a hodgepodge in which, for instance, atomics and many SIMD instructions must have aligned operands. That should never have been allowed to percolate into C.

> • We are about performance, so cache coherency is bad. Our GPU's compute units are so fast they can't
> be bothered to wait for any slow MOESI protocols: just run that crap on your slow CPU. That's a feature!


  • Well, I agree with that at least, but you'll find many here who don't.

> • We are about performance, so small VM pages are bad. Our mallocs are measured in gigabytes and our
> TLBs should fit our page maps, so please stop trying to memory map your tiny files. That's a feature!


  • Memory-mapped files were an optimisation. I think the whole business should die die die now.

> • We are about performance, so dynamic data dependency is bad. Our super-tight compute kernels must run
> at full speed without nanny-checking store buffers. And our upcoming super-smart optimizing compiler will insert
> just the right barriers for those who can't write fast code themselves: that will be a feature!


True. But I really wish proper memory barriers always had to be specified, even for (well, especially for) register dependencies in lock-free code. There are too many people trying to write tricky code. The few who can do it properly should also know when to put in the appropriate types of fences. The hardware should not always have to worry unduly about infrequent code that should be written by experts. math.sin is far easier to get right.
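
As a rough illustration of what I mean by spelling the fences out, here is a minimal C++ sketch of publishing a pointer between two threads. The consumer's dereference carries an address dependency on the pointer load, but the code does not lean on that; it states the ordering with explicit release and acquire fences. The names (Message, slot, published, run_once) are made up for the example and are not from any real code base.

```cpp
#include <atomic>
#include <thread>

struct Message {
    int value;
};

Message slot{0};                            // plain, non-atomic payload
std::atomic<Message*> published{nullptr};   // pointer handed to the consumer

void producer() {
    slot.value = 42;                                      // write the payload
    std::atomic_thread_fence(std::memory_order_release);  // explicit release fence
    published.store(&slot, std::memory_order_relaxed);    // then publish the pointer
}

int consumer() {
    Message* m = nullptr;
    // Spin until the producer has published a pointer.
    while ((m = published.load(std::memory_order_relaxed)) == nullptr) {
        std::this_thread::yield();
    }
    // Explicit acquire fence: the ordering is stated here rather than
    // inferred from the address dependency between the load and m->value.
    std::atomic_thread_fence(std::memory_order_acquire);
    return m->value;
}

// Hypothetical driver: runs one producer and one consumer, returns what
// the consumer observed.
int run_once() {
    slot.value = 0;
    published.store(nullptr, std::memory_order_relaxed);
    int seen = -1;
    std::thread c([&seen] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    return seen;
}
```

With both fences in place the consumer is guaranteed by the C++ memory model to read the value the producer wrote, whatever the underlying hardware ordering happens to be.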

> Over the long time, it appears that stronger models tend to win. Dynamic prediction and efficient
> handling of common cases outweighs the costs of dealing with infrequent outliers. As architectures
> grow up, the hardware starts supporting subnormal numbers, unaligned accesses, GPUs even acknowledge
> cache coherency is a thing, and Google responds to "vm page size" with "4096", and nobody has ever
> produced a compiler that can automatically insert barriers without destroying any hopes of good
> performance compared to half-way decent dynamic solutions in hardware, so...

x86 did. And IBM 370. Hopefully things can be fixed eventually.

> Maybe, just maybe, it may be the case that hardware is far better at checking actual behavior
> dynamically and that inserting some delays in hardware for the rare truly exceptional case (underflow,
> unaligned access crossing cache lines or a true inter-processor data dependency) is just a far
> cheaper thing than statically generating code that avoids these exceptions.
>
> Against common wisdom, software is static and hardware is dynamic. Most of the software that my 2019 x86 processor
> runs was written decades ago and compiled many years ago, targeting hardware that was obsolete at compile time.
> At any time, software is old and hardware is new. So maybe, just maybe, it makes sense for software to target
> a higher level machine and count on hardware to catch up rather than the other way around.
>
> It's just a thought.
>
> -Geert
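
To make the underflow case concrete, here is a small C++ sketch of what gradual underflow buys you, assuming the default IEEE 754 floating-point environment (no flush-to-zero mode enabled, no fast-math build flags). The function name is made up for the example.

```cpp
#include <cmath>
#include <limits>

// Returns true when the difference of two distinct tiny doubles survives
// as a nonzero subnormal instead of being flushed to zero.
bool gradual_underflow_preserves_difference() {
    const double min_normal = std::numeric_limits<double>::min();  // smallest normal double
    volatile double a = min_normal * 1.5;  // exactly representable normal value
    volatile double b = min_normal;
    double diff = a - b;                   // 0.5 * min_normal, below the normal range
    return a != b && diff != 0.0 && std::fpclassify(diff) == FP_SUBNORMAL;
}
```

With flush-to-zero the same subtraction would give 0 even though a != b, which is exactly the kind of surprise subnormals exist to avoid.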
