Sequential consistency in hardware

By: Travis Downs, August 5, 2020 12:04 pm
Room: Moderated Discussions
Jeff S. on August 4, 2020 11:11 pm wrote:
> Travis Downs on August 3, 2020 7:58 pm wrote:
> > It's interesting to speculate what the cost is. The main implications for a "high perf"
> > uarch (i.e., that still does all the access reorderings, but speculatively) seem to be:
> > ...
> > 2) Store-to-load forwarding can still occur, but needs to be verified at retirement,
> > necessarily incurring an RFO for the line, because "non-GO" forwarding can't be allowed.
> > So a forwarding still needs to check cache and to start getting the line on a miss to
> > make this verification (although this doesn't slow down the actual forwarding).
> When I talked with never_released about this recently, my gut reaction was that the straightforward
> approach of extending TSO-on-OoO would be conceptually very simple, just expensive in terms of
> eating up PRF/ROB/LQ entries waiting for invalidation-induced squashes even longer.

That sounds about right to me.

> I didn't consider the store-to-load forwarding case to be of particular note though, except maybe that the
> load's invalidation snooping would be inactive until after the preceding store committed to cache. By "verifying
> a forwarding at retirement", are you saying there needs to be some final or additional step beyond continued
> monitoring of invalidations, or are you just insinuating that invalidation monitoring would only reasonably
> be implemented with load queue entry flagging and (maximally) deferred failure handling?

Well, my thought was that in the existing TSO model store forwarding is always allowed, so a load that takes its value from an earlier store isn't subject to any later verification. Of course, perhaps such loads are inserted in the MOB (or whatever other structure) anyway and are subject to invalidation-based nuking, in which case it isn't a problem.

For something like memory renaming, SC does mean that the load still needs to be tracked in the memory ordering structures, which seems unfortunate, because I think under x86-TSO this would not be required: isn't the forwarding always valid in the absence of any intervening memory barriers?
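
To make the "forwarding is always allowed under TSO" point concrete, here is a minimal sketch in Python of a toy operational model: per-thread FIFO store buffers with forwarding to the thread's own loads (TSO-like) versus stores that write memory immediately (SC-like). It exhaustively enumerates a small litmus test. The function `final_outcomes`, the tuple encoding of the program, and the register names are all made up for illustration; this models the architectural behaviour only, not any real pipeline.

```python
def final_outcomes(program, buffered):
    """Exhaustively explore a toy machine; return the set of final register states.

    program:  one list of (op, addr, arg) tuples per thread, where op is
              "store" (arg = value) or "load" (arg = destination register).
    buffered: True  -> stores sit in a per-thread FIFO store buffer and the
                       thread's own loads may forward from it (TSO-like).
              False -> stores write memory immediately (SC-like).
    """
    n = len(program)
    start = ((0,) * n, ((),) * n, (), ())  # pcs, store buffers, memory, registers
    seen, work, results = {start}, [start], set()
    while work:
        pcs, bufs, mem, regs = work.pop()
        done = all(pcs[t] == len(program[t]) for t in range(n))
        if done and not any(bufs):
            results.add(regs)
            continue
        successors = []
        for t in range(n):
            if bufs[t]:
                # Drain the oldest buffered store of thread t into memory.
                addr, val = bufs[t][0]
                new_mem = tuple(sorted({**dict(mem), addr: val}.items()))
                new_bufs = bufs[:t] + (bufs[t][1:],) + bufs[t + 1:]
                successors.append((pcs, new_bufs, new_mem, regs))
            if pcs[t] < len(program[t]):
                op, addr, arg = program[t][pcs[t]]
                new_pcs = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
                if op == "store" and buffered:
                    new_bufs = bufs[:t] + (bufs[t] + ((addr, arg),),) + bufs[t + 1:]
                    successors.append((new_pcs, new_bufs, mem, regs))
                elif op == "store":
                    new_mem = tuple(sorted({**dict(mem), addr: arg}.items()))
                    successors.append((new_pcs, bufs, new_mem, regs))
                else:
                    # Load: the youngest matching entry in the thread's own
                    # buffer wins (store-to-load forwarding), else read memory.
                    hits = [v for a, v in bufs[t] if a == addr]
                    val = hits[-1] if hits else dict(mem).get(addr, 0)
                    new_regs = tuple(sorted({**dict(regs), arg: val}.items()))
                    successors.append((new_pcs, bufs, mem, new_regs))
        for s in successors:
            if s not in seen:
                seen.add(s)
                work.append(s)
    return results


# Each thread stores to its own location, reads it back, then reads the other.
litmus = [
    [("store", "x", 1), ("load", "x", "r0"), ("load", "y", "r1")],
    [("store", "y", 1), ("load", "y", "r2"), ("load", "x", "r3")],
]
# Both threads see their own store but not the other thread's store.
weak = (("r0", 1), ("r1", 0), ("r2", 1), ("r3", 0))

print("allowed with store buffers:", weak in final_outcomes(litmus, buffered=True))
print("allowed without them:      ", weak in final_outcomes(litmus, buffered=False))
```

In this model the `weak` outcome is reachable only with store buffers: each thread forwards from its own not-yet-visible store and then reads the other location from memory. An SC machine that still forwards speculatively has to be able to detect and undo exactly this case, which is why the forwarded load stays in the ordering structures.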

> Also, could you clarify what "incurring an RFO" means in this scenario exactly? In the
> case where the core already has the line as M/E before the store, I don't understand why
> the request would be needed, and in the case it's not, I don't follow why it would be
> more significant than any non-forwarded preceding store to another memory location.

No, you are right and this was wrong. The RFO is needed when the store retires, but that is needed anyway for plain stores under SC, so it's no different from (1). The cost is mostly that store commit/GO is no longer decoupled from retirement, which has nothing to do with store forwarding.

After the store retires and until the load retires, you still need to track the load address for potential incoming invalidations, which is no different from any other load (though still possibly worse than the x86-TSO scenario above).
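
The "flag the load queue entry, handle the failure at retirement" scheme from the quoted question can be sketched as a toy in Python. Everything here is hypothetical: the `LoadQueue` class, the 64-byte line size, and in particular the `track_forwarded` switch, which only exists to contrast the SC case (forwarded loads must be snooped) with the possibility raised above that an x86-TSO design could leave them untracked. Whether real cores do that is the open question, not something this sketch settles.

```python
from dataclasses import dataclass

LINE_BYTES = 64  # assumed cache line size


@dataclass
class LoadEntry:
    addr: int
    forwarded: bool            # value came from an older store, not from cache
    invalidated: bool = False  # set by a snoop, acted on only at retirement


class LoadQueue:
    """Toy load queue: snoops only flag entries; the squash is deferred."""

    def __init__(self, track_forwarded):
        # Hypothetical switch: True models the SC requirement discussed here,
        # False models a design that exempts forwarded loads from snooping.
        self.track_forwarded = track_forwarded
        self.entries = []  # program order, oldest first

    def issue(self, addr, forwarded=False):
        self.entries.append(LoadEntry(addr, forwarded))

    def snoop_invalidate(self, addr):
        """Another core took ownership of the line containing `addr`."""
        line = addr // LINE_BYTES
        for entry in self.entries:
            exempt = entry.forwarded and not self.track_forwarded
            if not exempt and entry.addr // LINE_BYTES == line:
                entry.invalidated = True

    def retire_oldest(self):
        """Return True if the oldest load retires cleanly, False if it must replay."""
        entry = self.entries.pop(0)
        return not entry.invalidated


# Same event sequence under both policies: a forwarded load to 0x1000,
# then an invalidation hitting the same line before the load retires.
sc_queue = LoadQueue(track_forwarded=True)
sc_queue.issue(0x1000, forwarded=True)
sc_queue.snoop_invalidate(0x1008)
sc_clean = sc_queue.retire_oldest()

exempt_queue = LoadQueue(track_forwarded=False)
exempt_queue.issue(0x1000, forwarded=True)
exempt_queue.snoop_invalidate(0x1008)
exempt_clean = exempt_queue.retire_oldest()

print("SC-style queue retires cleanly:     ", sc_clean)
print("exempting queue retires cleanly:    ", exempt_clean)
```

The only difference between the two runs is whether the forwarded load occupies a snooped entry until retirement, which is the extra cost being discussed. A real design would also squash younger instructions on a failed retirement; the toy omits that.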
