Sequential consistency in hardware

By: Geert Bosch (boschg.delete@this.mac.com), August 3, 2020 7:48 pm
Room: Moderated Discussions
Jon Masters (jcm.delete@this.jonmasters.org) on August 3, 2020 5:22 pm wrote:
> Linus Torvalds (torvalds.delete@this.linux-foundation.org) on August 3, 2020 10:19 am wrote:
>
> > People used to believe that the fewer memory ordering guarantees you gave, the simpler you
> > could make things, and the better everything would work. That turned out to not be true.
>
> Hahaha :) I was convinced for a while that when they said SC they must have meant TSO, but apparently
> it really truly is SC. Which is...fascinating. But hey, it's 2020, so nothing is surprising any more.
>
> Jon.
>

The speculation needed for value prediction is similar to that required for sequential consistency (SC). Explicit, statically generated compiler fences are bad for OOO execution, whereas finer-grained dynamic knowledge of actual data dependencies allows more scheduling freedom. By the same reasoning, could the time have come for SC as a memory model to become competitive in some situations?
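
To make the SC versus TSO distinction concrete, here is a minimal sketch of the classic store-buffering litmus test as a toy enumeration in C++. It is not a model of any real processor: the function name `store_buffer_outcomes` and the `allow_store_load_reorder` flag are made up for illustration, and the "TSO-like" mode is approximated by simply letting each thread's load run ahead of its own earlier store to a different address.

```cpp
#include <bitset>
#include <cassert>
#include <set>
#include <utility>

// Toy litmus test (hypothetical helper, not a hardware model).
//   Thread 1: x = 1; r1 = y;
//   Thread 2: y = 1; r2 = x;
// Returns every (r1, r2) outcome reachable by interleaving the four operations.
using Outcome = std::pair<int, int>;

std::set<Outcome> store_buffer_outcomes(bool allow_store_load_reorder) {
    std::set<Outcome> outcomes;
    // swap == 1 means that thread performs its load before its store,
    // which is the visible effect of a store sitting in a store buffer.
    const int max_swap = allow_store_load_reorder ? 1 : 0;
    for (int swap1 = 0; swap1 <= max_swap; ++swap1) {
        for (int swap2 = 0; swap2 <= max_swap; ++swap2) {
            // mask selects which 2 of the 4 global time slots belong to thread 1.
            for (unsigned mask = 0; mask < 16; ++mask) {
                if (std::bitset<4>(mask).count() != 2) continue;
                int x = 0, y = 0, r1 = -1, r2 = -1;
                int step1 = 0, step2 = 0;
                for (int slot = 0; slot < 4; ++slot) {
                    if (mask & (1u << slot)) {
                        const bool is_store = (step1 == 0) != (swap1 == 1);
                        if (is_store) x = 1; else r1 = y;
                        ++step1;
                    } else {
                        const bool is_store = (step2 == 0) != (swap2 == 1);
                        if (is_store) y = 1; else r2 = x;
                        ++step2;
                    }
                }
                assert(r1 != -1 && r2 != -1);  // both loads have executed
                outcomes.insert(Outcome{r1, r2});
            }
        }
    }
    return outcomes;
}
```

With `store_buffer_outcomes(false)` only program-order interleavings are considered, so the outcome where both threads read 0 never shows up. With `store_buffer_outcomes(true)` it does, and that is the case software on a weaker model has to rule out with a fence.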

For historical context, didn't HP PA-RISC (RIP) support SC in larger multi-processor systems? How did that work for them? Obviously they jumped ship to the Itanic, resulting in EPIC failure, but I doubt that was because of the superior memory model of the IA-64. Can we envision a high-performance many-core x86 processor with sequential consistency?

It seems to fit some kind of pattern, for HPC and RISC/VLIW/GPU aficionados:

  • We are about performance, so IEEE floats are bad. We underflow to zero, and that's a feature!

  • We are about performance, so unaligned accesses are bad. We align our bytes to cache-lines, and that's a feature!

  • We are about performance, so cache coherency is bad. Our GPU's compute units are so fast they can't be bothered to wait for any slow MOESI protocols: just run that crap on your slow CPU. That's a feature!

  • We are about performance, so small VM pages are bad. Our mallocs are measured in gigabytes and our TLBs should fit our page maps, so please stop trying to memory map your tiny files. That's a feature!

  • We are about performance, so dynamic data dependency is bad. Our super-tight compute kernels must run at full speed without nanny-checking store buffers. And our upcoming super-smart optimizing compiler will insert just the right barriers for those who can't write fast code themselves: that will be a feature!

Over the long run, it appears that stronger models tend to win. Dynamic prediction and efficient handling of common cases outweigh the costs of dealing with infrequent outliers. As architectures grow up, the hardware starts supporting subnormal numbers and unaligned accesses, GPUs even acknowledge that cache coherency is a thing, and Google responds to "vm page size" with "4096". And nobody has ever produced a compiler that can automatically insert barriers without destroying any hope of good performance compared to halfway decent dynamic solutions in hardware, so...

Maybe, just maybe, hardware is far better at checking actual behavior dynamically, and inserting some delays in hardware for the rare truly exceptional case (underflow, an unaligned access crossing cache lines, or a true inter-processor data dependency) is far cheaper than statically generating code that avoids these exceptions.
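
As a small illustration of what software gets to assume on the stronger model, here is a hedged C++ sketch of two of those exceptional cases. The helper names are invented for this example, and it assumes the default floating-point environment (no flush-to-zero mode, no fast-math build flags).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>

// Gradual underflow: two distinct tiny values have a nonzero difference.
// Assumes flush-to-zero is not enabled; with it, d would come out as 0.
bool difference_is_subnormal() {
    const double tiny = std::numeric_limits<double>::min();  // smallest normal double
    volatile double a = 1.5 * tiny;  // volatile keeps the subtraction at run time
    volatile double b = tiny;
    const double d = a - b;          // 0.5 * tiny, below the normal range
    return d != 0.0 && std::fpclassify(d) == FP_SUBNORMAL;
}

// Unaligned access: write and read a 32-bit value at an arbitrary byte offset.
// memcpy is the portable way to express this; where the hardware handles
// unaligned accesses, compilers typically turn it into a plain load or store.
std::uint32_t round_trip_at_offset(std::uint32_t value, std::size_t offset) {
    unsigned char buffer[16] = {};
    assert(offset + sizeof value <= sizeof buffer);
    std::memcpy(buffer + offset, &value, sizeof value);
    std::uint32_t result = 0;
    std::memcpy(&result, buffer + offset, sizeof result);
    return result;
}
```

Neither function special-cases the awkward input. The code is written for the common case, and the rare slow path (a subnormal result, an access that straddles an alignment boundary) is left to the hardware.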

Contrary to common wisdom, software is static and hardware is dynamic. Most of the software that my 2019 x86 processor runs was written decades ago and compiled many years ago, targeting hardware that was obsolete at compile time. At any time, software is old and hardware is new. So maybe, just maybe, it makes sense for software to target a higher-level machine and count on hardware to catch up rather than the other way around.

It's just a thought.

-Geert