Performance "speed limits"

By: Travis Downs (travis.downs.delete@this.gmail.com), June 13, 2019 1:05 pm
Room: Moderated Discussions
(patents discussed below, click back now if you can't be tainted)

anon (anon.delete@this.anon.com) on June 11, 2019 7:34 pm wrote:
> The in-flight branch limit is interesting. It is alleged to be used to roll back architectural
> state in the case of a mispredict, but of course other operations can cause flushes. Loads and
> stores of course, likely other rarer machine state changes as well (e.g., some flags bits).

Yes, I think those paths take a slower recovery path, the so-called "machine clear" in Intel speak. I have measured these to usually be 30+ cycles. There is an optimized path for bad branch recovery since those are so common, and can't be avoided (whereas, for example, memory misspeculation can be avoided by simply disabling speculation for that instruction, and Intel is quite aggressive in doing so, e.g., 1 misspeculation in 1024 loads is the threshold for disabling speculation).

>
> I wonder if Intel attaches some kind of flush / rollback data to every one of these kind of instructions
> that may cause a flush. Or if the data contained in branch buffer is specifically more detailed
> and allows faster or more precise recovery as they are the most important case.

One mechanism, which I think likely follows closely the actual implementation, is described in US6799268B1. Here's a key bit:

In the event that a branch misprediction occurs, the primary RAT state will be incorrect due to the incorrect speculation; however, the state indicated by the shadow RAT in this embodiment is not advanced beyond unresolved branches. Therefore, a mispredict may be unwound by advancing the shadow pointer to the branch, copying the shadow RAT 340 into the primary RAT 330 to restore the state of the registers, and then advancing the shadow pointer to the allocation pointer.


The BOB supports that by recording each branch, constantly keeping track of the oldest undecided branch (a branch whose direction has not been decided), and maintaining a shadow RAT with the register state at that branch instruction. When a mispredict occurs, this shadow state can be used to do a quick recovery: it holds the correct architectural register values at the branch instruction, so allocation and execution can restart there.
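
To make the unwind sequence concrete, here is a toy Python model of a primary RAT with a single trailing shadow RAT. It is only a sketch of my reading of the patent text, not a description of real hardware: the class name ToyRenamer, the log-of-uops representation and the physical register names are all made up for illustration, and freeing of physical registers is not modeled.

```python
class ToyRenamer:
    """Toy model: a primary RAT plus one shadow RAT that trails behind it."""

    def __init__(self, arch_regs):
        # Hypothetical initial mapping: arch register i -> physical "p<i>".
        self.primary = {r: f"p{i}" for i, r in enumerate(arch_regs)}
        self.shadow = dict(self.primary)
        self.log = []        # allocated uops, in program order
        self.shadow_ptr = 0  # index of the first uop NOT yet applied to the shadow RAT
        self.next_phys = len(arch_regs)

    def allocate(self, dest=None, is_branch=False):
        """Rename one uop into the primary RAT and return its log index."""
        phys = None
        if dest is not None:
            phys = f"p{self.next_phys}"
            self.next_phys += 1
            self.primary[dest] = phys
        self.log.append({"dest": dest, "phys": phys,
                         "is_branch": is_branch, "resolved": not is_branch})
        return len(self.log) - 1

    def resolve(self, idx):
        """Mark the branch at idx as decided (and correctly predicted)."""
        self.log[idx]["resolved"] = True

    def advance_shadow(self, limit=None):
        """Apply uops to the shadow RAT, never passing an undecided branch."""
        end = len(self.log) if limit is None else limit
        while self.shadow_ptr < end:
            uop = self.log[self.shadow_ptr]
            if uop["is_branch"] and not uop["resolved"]:
                break
            if uop["dest"] is not None:
                self.shadow[uop["dest"]] = uop["phys"]
            self.shadow_ptr += 1

    def recover(self, branch_idx):
        """Unwind a mispredict at branch_idx; False means 'blocked, retry later'."""
        self.log[branch_idx]["resolved"] = True     # its direction is now known
        self.advance_shadow(limit=branch_idx + 1)   # shadow pointer -> the branch
        if self.shadow_ptr != branch_idx + 1:
            return False                            # an older branch is still undecided
        del self.log[branch_idx + 1:]               # discard wrong-path uops
        self.primary = dict(self.shadow)            # copy shadow RAT into primary RAT
        return True                                 # shadow pointer == allocation pointer


r = ToyRenamer(["rax", "rbx"])
r.allocate(dest="rax")              # before the branch: rax -> p2
br = r.allocate(is_branch=True)     # predicted, not yet resolved
r.allocate(dest="rax")              # wrong path: rax -> p3
r.allocate(dest="rbx")              # wrong path: rbx -> p4
ok = r.recover(br)
print(ok, r.primary)                # True {'rax': 'p2', 'rbx': 'p1'}
```

Note that recover() returns False when an older branch is still undecided: in this model the single shadow pointer simply cannot reach the mispredicted branch yet.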

This would still result in a slow recovery in one scenario: a branch that is likely to mispredict but has a relatively short dep chain (say, a loop exit), together with other branches within the loop that are slow to resolve (maybe they depend on loads from memory) but almost always predict correctly. When the loop exit branch mispredicts, quick recovery won't be possible in the above scheme, because you have to wait for the shadow RAT pointer (which points to the oldest undecided branch) to advance to the mispredicted branch, and to do that it has to get past all the slow branches.

Intel describes a way to get around that: multiple shadow pointers, all but one of which are allowed to advance past the last undecided branch. This would let you have a shadow RAT state which reflects the mispredicted branch, and hence start recovering right away. I don't know if Intel does this though, or how many shadow pointers are allowed. Perhaps a predictor is used to determine which branches are likely to benefit from shadow state. Some of this should be testable.
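
A back-of-the-envelope Python model of the difference, using the loop-exit scenario above. Everything here is an assumption for illustration: the cycle counts are invented, the function names are hypothetical, and the time for a pointer to walk forward is ignored. It only captures the ordering constraint, namely that a single pointer must wait for every older branch, while a run-ahead pointer already parked at the mispredicted branch does not.

```python
def recovery_start_single(resolve_cycle, mispredicted):
    """Cycle at which recovery can begin with one shadow pointer.

    resolve_cycle[i] is the (made-up) cycle at which branch i, in program
    order, resolves. The single pointer cannot pass an undecided branch, so
    it reaches `mispredicted` only once every older branch has resolved too.
    """
    return max(resolve_cycle[: mispredicted + 1])


def recovery_start_multi(resolve_cycle, mispredicted, run_ahead):
    """Same, with extra pointers parked at the branch indices in `run_ahead`.

    Assumption: a run-ahead pointer already holds the RAT state at its
    branch, so recovery can begin as soon as that branch itself resolves.
    """
    if mispredicted in run_ahead:
        return resolve_cycle[mispredicted]
    return recovery_start_single(resolve_cycle, mispredicted)


# Three load-dependent branches in the loop body, then a quick loop-exit branch.
resolve_cycle = [200, 210, 220, 12]
loop_exit = 3

single = recovery_start_single(resolve_cycle, loop_exit)             # 220
multi = recovery_start_multi(resolve_cycle, loop_exit, {loop_exit})  # 12
print(single, multi)
```

If a run-ahead pointer is not parked at the branch that mispredicts, the model falls back to the single-pointer behavior, which is why picking which branches get shadow state (e.g., via a predictor) would matter.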

Look for "FIG. 8 illustrates one alternative embodiment" for a description of this mechanism.