speculation *beyond* retirement

By: Travis Downs (travis.downs.delete@this.gmail.com), September 14, 2020 7:02 pm
Room: Moderated Discussions
Warning: some information derived from patents is discussed ahead. This is your notice to move along if you don't want to be tainted.

The usual view of speculative execution is that it ends at retirement. Indeed, in-order retirement is the point at which an instruction definitely takes effect [1], almost by definition. It seems that on AMD Zen that's no longer true! Instructions can be replayed even after they've retired.

The idea, as far as I can tell, is to let atomic (LOCK-prefixed) instructions retire even before their associated store becomes GO (globally observable; roughly, before it commits to L1) [2]. When the store is ready to become GO (i.e., it is at the head of the store queue), a check is made to see whether atomicity was preserved, i.e., whether the cache line was held exclusively from the moment of the load associated with the atomic until now. If so, the store commits. If not, the core has to roll back using a checkpointing system, then re-execute and re-retire some instructions.
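To make the retire-then-validate flow concrete, here is a toy software model of it. This is only a sketch of the idea as described above, not of AMD's actual hardware: the `ToyCore` class, the `lock_xadd` method, the `interfering_writes` knob (how many times another core takes the line before the store commits) and the `+ 100` interfering value are all made up for illustration.

```python
class ToyCore:
    """Toy model: an atomic retires first, then is validated at store commit."""

    def __init__(self):
        self.memory = {}        # address -> value (stands in for the cache)
        self.regs = {}          # architectural registers
        self.retire_events = 0  # counts every retirement, replays included

    def lock_xadd(self, addr, delta, interfering_writes=0):
        """Atomically add delta to memory[addr]; return the old value.

        interfering_writes is a hypothetical knob: the number of times
        another core writes the line between our load and our store commit.
        """
        while True:
            checkpoint = dict(self.regs)    # state to restore on failure
            old = self.memory.get(addr, 0)  # load half of the atomic
            self.regs["result"] = old
            self.retire_events += 1         # retires before the store is GO

            if interfering_writes == 0:
                # Line was held exclusively since the load: commit the store.
                self.memory[addr] = old + delta
                return old

            # Atomicity was lost: model the other core's write, then roll
            # back to the checkpoint and replay (which retires again).
            interfering_writes -= 1
            self.memory[addr] = old + 100
            self.regs = checkpoint


core = ToyCore()
core.lock_xadd(0x40, 1)                        # uncontended: retires once
core.lock_xadd(0x40, 1, interfering_writes=2)  # replayed twice
print(core.retire_events)  # 4 retire events for 2 architectural atomics
```

The point of the sketch is only that the rollback happens after the retirement has already been counted, so each replay shows up as an extra retirement.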

This is quite different from the normal rollback associated with, e.g., mispredicted branches, mispredicted memory speculation, faults, and so on. In those cases the cat isn't out of the bag: the instructions haven't retired, so it's "just" a ROB flush and start over.

Pretty interesting and makes "true retirement" a bit of a fuzzy concept.

Apparently discovered by the rustc folks when they were trying to understand why the "instructions retired" counter could overcount sometimes (answer: some instructions really are retired multiple times due to this effect), as discussed here and here.

The primary benefit would seem to be that uncontended atomics no longer need to block retirement until the atomic commits (which is slow because the store buffer must be drained while waiting). Intel also has an optimization in this area, I think, although it works differently: the load and op parts of the atomic can execute ahead of, and decoupled from, the store (in the usual out-of-order way), and when the store executes it verifies that the line wasn't lost since the load. However, the store still happens at retirement. So the AMD approach should help most when store misses are involved, either for the atomically accessed location or for earlier stores: the post-retirement store buffer keeps operating normally and can hide much or all of the store miss latency.
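The store-miss argument can be illustrated with a deliberately crude cycle model. Everything here is an assumption for illustration: the function name `last_retire_cycle`, one retirement per cycle, in-order store commit, an unbounded store buffer, and the 100-cycle miss latency. It says nothing about real Zen or Intel timings.

```python
def last_retire_cycle(instrs, atomic_waits_for_commit):
    """Return the cycle at which the final instruction retires.

    instrs is a list of (kind, commit_latency) pairs, where kind is
    "alu", "store" or "atomic". Latencies are hypothetical.
    """
    cycle = 0       # cycle at which the previous instruction retired
    drained_at = 0  # cycle at which all queued stores have committed
    for kind, commit_latency in instrs:
        cycle += 1  # toy assumption: at most one retirement per cycle
        if kind == "alu":
            continue
        # Stores commit in order, after retirement and after earlier stores.
        commit_done = max(cycle, drained_at) + commit_latency
        if kind == "atomic" and atomic_waits_for_commit:
            # Retirement stalls until the atomic's own store has committed,
            # which in turn waits for every earlier store to drain.
            cycle = commit_done
        drained_at = commit_done
    return cycle


# An earlier store that misses (100 cycles), an atomic that hits, then ALU work.
program = [("store", 100), ("atomic", 1)] + [("alu", 0)] * 10
print(last_retire_cycle(program, atomic_waits_for_commit=True))   # 112
print(last_retire_cycle(program, atomic_waits_for_commit=False))  # 12
```

In this toy, the only difference between the two runs is whether the atomic has to sit at the head of the ROB while the earlier store miss drains; the miss latency itself is the same in both cases, it is just hidden behind retirement in the second one.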

AMD seems to call this feature "SpecLockMap" based on the MSR naming discussed in the above links.





[1] Store instructions are a bit special, as described in the next footnote: they will definitely take effect with the stored value once they've retired, but their visibility may be deferred due to the senior store buffer.

[2] This is, of course, normal for ordinary stores: the so-called "senior store buffer" holds stores that have retired but not yet committed to cache. Normal stores, however, can't be "revoked" at this point: they are guaranteed to commit to cache at some point, so the issue discussed here doesn't arise.