Sequential consistency in hardware

By: Travis Downs, August 3, 2020 6:58 pm
Room: Moderated Discussions
never_released on August 3, 2020 8:44 am wrote:

> What are the advantages of having that guarantee provided by hardware more than just
> having TSO in practice? Are there cases where it's considered as more useful?

For sure, this guarantee is quite useful for actual high performance concurrent programming. There are all sorts of interesting things you can do to achieve level 0 implementations of various constructs when you have SC rather than TSO.

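Basically, for many types of coordination between two threads you could use a Dekker-style synchronization, which uses only plain loads and stores, if only you had SC. TSO invariably breaks these, because it permits store-load reordering (a store followed by a load from a different address can take effect out of order), and ruling out exactly that reordering is what these algorithms depend on.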
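
To make the Dekker-style pattern concrete, here is a minimal C++ sketch of the store-then-load handshake. All names (DekkerFlags, try_enter0, try_enter1, count_double_entries) are made up for illustration and are not from any library. It uses seq_cst atomics, which is how you have to ask for SC today. On x86 the seq_cst store typically compiles to an xchg or a mov plus mfence, and that is the cost that plain loads and stores under hardware SC would avoid.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical illustration: one "intent" flag per thread.
struct DekkerFlags {
    std::atomic<int> flag0{0};
    std::atomic<int> flag1{0};
};

// Each side announces itself with a store, then checks the other side with a load.
// With seq_cst on both accesses the store cannot be reordered after the load,
// so at most one side can observe the other's flag as 0.
inline bool try_enter0(DekkerFlags& f) {
    f.flag0.store(1, std::memory_order_seq_cst);
    return f.flag1.load(std::memory_order_seq_cst) == 0;
}

inline bool try_enter1(DekkerFlags& f) {
    f.flag1.store(1, std::memory_order_seq_cst);
    return f.flag0.load(std::memory_order_seq_cst) == 0;
}

// Races the two sides against each other `rounds` times and returns how often
// both sides believed they had entered (the outcome SC forbids).
int count_double_entries(int rounds) {
    assert(rounds >= 0);
    int both = 0;
    for (int i = 0; i < rounds; ++i) {
        DekkerFlags f;
        bool in0 = false, in1 = false;
        std::thread t0([&] { in0 = try_enter0(f); });
        std::thread t1([&] { in1 = try_enter1(f); });
        t0.join();
        t1.join();
        if (in0 && in1) ++both;
    }
    return both;
}
```

If you weaken both accesses to relaxed (plain mov on x86), the "both entered" outcome becomes possible under TSO, since each store can still be sitting in the store buffer when the following load executes.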

TSO still lets you do some interesting things without atomics, such as SPSC queues, double-checked locking, seqlocks and RCU, but there's a whole additional class of stuff you could do with full SC: userspace RCU w/o the existing hacks, biased locks, blazing fast rwlocks, etc.
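
As an example of the "works fine under TSO" category, here is a rough sketch of a bounded SPSC ring in C++. The class and member names (SpscRing, push, pop) are hypothetical. C++ still requires std::atomic to avoid a data race at the language level, but the acquire loads and release stores used here compile to plain mov instructions on x86, so no fence or locked instruction ends up in the generated code.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical single-producer/single-consumer ring buffer.
// Exactly one thread may call push() and exactly one thread may call pop().
template <typename T, std::size_t N>
class SpscRing {
    static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");
    T buf_[N];
    std::atomic<std::size_t> head_{0};  // next slot to read; written only by the consumer
    std::atomic<std::size_t> tail_{0};  // next slot to write; written only by the producer

public:
    // Producer side. Returns false if the ring is full.
    bool push(const T& v) {
        std::size_t t = tail_.load(std::memory_order_relaxed);
        if (t - head_.load(std::memory_order_acquire) == N) return false;
        buf_[t & (N - 1)] = v;
        // Release: the element write above becomes visible before the new tail.
        tail_.store(t + 1, std::memory_order_release);
        return true;
    }

    // Consumer side. Returns false if the ring is empty.
    bool pop(T& out) {
        std::size_t h = head_.load(std::memory_order_relaxed);
        if (h == tail_.load(std::memory_order_acquire)) return false;
        out = buf_[h & (N - 1)];
        // Release: the element read above completes before the slot is handed back.
        head_.store(h + 1, std::memory_order_release);
        return true;
    }
};
```

Nothing here needs store-load ordering: each side only needs its own stores to become visible in order and its own loads to be performed in order, which TSO already gives you.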

Many of these things you can do under TSO, but only using asymmetric barriers, which can be prohibitively expensive if the "infrequent" side isn't extremely infrequent.
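
For reference, this is roughly what the asymmetric barrier approach looks like on Linux, using the membarrier(2) system call. Treat it as a sketch under assumptions: it is Linux-only, needs kernel and headers new enough to have MEMBARRIER_CMD_PRIVATE_EXPEDITED, and all the function and type names (membarrier_call, light_barrier, heavy_barrier, BiasedFlags, fast_try_enter, slow_try_enter) are invented for the example. If registration fails, both sides fall back to an ordinary full fence.

```cpp
#include <atomic>
#include <cassert>
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

// glibc provides no membarrier() wrapper, so go through syscall(2).
static long membarrier_call(int cmd) {
    return syscall(SYS_membarrier, cmd, 0, 0);
}

// True if this process could register for expedited private membarrier.
static const bool g_have_membarrier =
    membarrier_call(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED) == 0;

// Frequent side: only a compiler barrier when the heavy side can compensate.
inline void light_barrier() {
    if (g_have_membarrier)
        std::atomic_signal_fence(std::memory_order_seq_cst);
    else
        std::atomic_thread_fence(std::memory_order_seq_cst);
}

// Infrequent side: a local full fence, plus a forced barrier on every running
// thread of this process when membarrier is available.
inline void heavy_barrier() {
    std::atomic_thread_fence(std::memory_order_seq_cst);
    if (g_have_membarrier) {
        long rc = membarrier_call(MEMBARRIER_CMD_PRIVATE_EXPEDITED);
        assert(rc == 0);
        (void)rc;
    }
}

// Hypothetical Dekker-style handshake biased towards the "fast" thread.
struct BiasedFlags {
    std::atomic<int> fast{0};
    std::atomic<int> slow{0};
};

inline bool fast_try_enter(BiasedFlags& f) {
    f.fast.store(1, std::memory_order_relaxed);
    light_barrier();  // no fence instruction on the common path
    return f.slow.load(std::memory_order_relaxed) == 0;
}

inline bool slow_try_enter(BiasedFlags& f) {
    f.slow.store(1, std::memory_order_relaxed);
    heavy_barrier();  // pays for both sides: a syscall that interrupts other CPUs
    return f.fast.load(std::memory_order_relaxed) == 0;
}
```

The fast side compiles down to a plain store and a plain load, while the slow side pays for a system call that interrupts the other CPUs running the process. That is why this only works out when the slow side really is rare.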

It's interesting to speculate what the cost is. The main implications for a "high perf" uarch (i.e., that still does all the access reorderings, but speculatively) seem to be:

1) Younger loads that were reordered around an older store need to be tracked so that when the store becomes globally observed (GO), a nuke can be taken if the line for the load was lost in the meantime. This kind of tracking (e.g., the "MOB" on Intel) is already done if you do load-load reordering speculatively while the memory model disallows it, so it doesn't actually seem that expensive. Only the condition for freeing loads from this structure changes: from "when all earlier loads are satisfied" to "when all earlier accesses are GO".

2) Store-to-load forwarding can still occur, but it needs to be verified at retirement, which necessarily incurs an RFO for the line, because "non-GO" forwarding can't be allowed. So a forwarded load still needs to check the cache and start fetching the line on a miss in order to make this verification (although this doesn't slow down the actual forwarding).

3) Senior (post-retirement) store buffer becomes infeasible: you can't defer GO for stores beyond retirement because you can't take a nuke any more for detected reorderings, and so due to (1) and (2) everything needs to be checked at retirement, at the latest. This would hurt some workloads with long-latency store misses that could otherwise be completely hidden by the senior store buffer: these will block retirement now and OOOE resources will eventually be consumed.