Sequential consistency in hardware

By: Jeff S., August 4, 2020 10:11 pm
Room: Moderated Discussions
Travis Downs on August 3, 2020 7:58 pm wrote:
> It's interesting to speculate what the cost is. The main implications for a "high perf"
> uarch (i.e., that still does all the access reorderings, but speculatively) seem to be:
> ...
> 2) Store-to-load forwarding can still occur, but needs to be verified at retirement,
> necessarily incurring an RFO for the line, because "non-GO" forwarding can't be allowed.
> So a forwarding still needs to check cache and to start getting the line on a miss to
> make this verification (although this doesn't slow down the actual forwarding).

When I talked with never_released about this recently, my gut reaction was that the straightforward approach of extending TSO-on-OoO would be conceptually very simple, just expensive, since PRF/ROB/LQ entries would be held even longer while waiting out possible invalidation-induced squashes.

I didn't consider the store-to-load forwarding case to be of particular note though, except maybe that the load's invalidation snooping would be inactive until after the preceding store committed to cache. By "verifying a forwarding at retirement", are you saying there needs to be some final or additional step beyond continued monitoring of invalidations, or are you just implying that invalidation monitoring would only reasonably be implemented with load queue entry flagging and (maximally) deferred failure handling?
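
To make the "flagging plus deferred failure handling" option concrete, here is a toy sketch in Python. It is not a description of any real core: the names (LoadEntry, LoadQueue, store_committed and so on) are made up for illustration, stores are matched to forwarded loads by cache line only, and the rule that a forwarded load starts snooping only once its store has committed is just the assumption stated above.

```python
from dataclasses import dataclass


@dataclass
class LoadEntry:
    line: int               # cache line the load touched
    forwarded: bool = False  # value came from an older store still in the store buffer
    snooping: bool = True    # is this entry watching invalidations yet?
    flagged: bool = False    # set when a watched invalidation hits the entry


class LoadQueue:
    """Hypothetical load queue: invalidations only flag entries;
    the failure is handled as late as possible, at retirement."""

    def __init__(self):
        self.entries = []  # oldest first

    def issue_load(self, line, forwarded=False):
        # Assumption: a forwarded load does not snoop until its store commits.
        entry = LoadEntry(line, forwarded=forwarded, snooping=not forwarded)
        self.entries.append(entry)
        return entry

    def store_committed(self, line):
        # The forwarding store reached the cache: start monitoring the line.
        for entry in self.entries:
            if entry.forwarded and entry.line == line:
                entry.snooping = True

    def invalidate(self, line):
        # An incoming invalidation only marks matching, snooping entries.
        for entry in self.entries:
            if entry.snooping and entry.line == line:
                entry.flagged = True

    def retire_oldest(self):
        # Deferred failure handling: check the flag only at retirement.
        oldest = self.entries[0]
        if oldest.flagged:
            squashed = len(self.entries)
            self.entries.clear()  # squash this load and everything younger
            return ("squash", squashed)
        self.entries.pop(0)
        return ("retire", 1)
```

In this model there is no extra verification step at retirement beyond looking at the flag, which is the distinction the question is trying to pin down.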

Also, could you clarify what "incurring an RFO" means in this scenario exactly? In the case where the core already has the line as M/E before the store, I don't understand why the request would be needed, and in the case where it doesn't, I don't follow why it would be more significant than for any non-forwarded preceding store to another memory location.
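
The M/E premise can be written down as a tiny sketch, assuming a plain textbook MESI protocol. The function name is hypothetical, and real protocols add states (O, F) and distinguish a data-less upgrade from a full RFO, which this ignores.

```python
from enum import Enum


class Mesi(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


def store_needs_ownership_request(state: Mesi) -> bool:
    # With the line already in M or E, the core holds the only valid copy,
    # so a store can commit locally without asking anyone for ownership.
    # From S or I it first has to obtain exclusive ownership.
    return state not in (Mesi.M, Mesi.E)
```

Under that model the forwarded store is no different from any other store: whether a request goes out depends only on the state of the line when the store commits.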