No dynamic predication yet, I suspect

By: Patrick Chase (patrickjchase.delete@this.gmail.com), January 23, 2014 11:59 am
Room: Moderated Discussions
Maynard Handley (name99.delete@this.name99.org) on January 23, 2014 8:51 am wrote:
> Patrick Chase (patrickjchase.delete@this.gmail.com) on January 22, 2014 11:36 pm wrote:
> > One way to roughly assess whether this is the case is to look at ROB sizes (i.e.
> > instruction window sizes). Branch prediction accuracy imposes an upper limit on the
> > usable ROB size, because mispredicts cause ROB flushes (albeit partial flushes in
> > SB/IB/Haswell). It would be pointless to design an ROB that's larger than the average
> > number of instructions per mispredicted branch, and you'd probably want the latter to
> > be quite a bit larger than the ROB size.
>
> This assumes that what's governing the ROB size is branch prediction.

Yes, obviously there are multiple factors. Issue widths, pipeline depths, and desired degree of latency-hiding for things like cache misses tend to drive the ROB size up. The likelihood of mis-speculation (leading to re-execution and wasted energy) is one of the factors that drives it down.

> I don't believe that is the case. As far as I can tell - with current predictors,
> there's value in a ROB of at least five or six hundred instructions, perhaps a
> thousand. (That is, with misprediction accuracy as it is, you can go about that far
> and still have a reasonable chance of executing instructions that won't later be
> voided.)

This is highly workload-dependent, but assuming SPECint-ish behavior (20% branches) and a 1-2% mispredict rate you end up with ~250-500 instructions per mispredict. IMO that's actually perilously close to the ROB size of, say, Haswell (192).
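To make the arithmetic above explicit, here is a minimal sketch. The function name is made up for illustration; the inputs (20% branches, 1-2% mispredict rate, 192-entry ROB) are the figures quoted in this post, and the model simply assumes mispredicts are spread evenly over the instruction stream.

```python
def instructions_per_mispredict(branch_fraction, mispredict_rate):
    """Average number of instructions between mispredicted branches.

    Assumes every instruction is a branch with probability branch_fraction,
    and every branch mispredicts with probability mispredict_rate.
    """
    return 1.0 / (branch_fraction * mispredict_rate)


HASWELL_ROB = 192  # ROB size quoted in the post

for rate in (0.01, 0.02):
    n = instructions_per_mispredict(0.20, rate)
    print(f"mispredict rate {rate:.0%}: ~{n:.0f} instructions per mispredict, "
          f"{n / HASWELL_ROB:.1f}x a {HASWELL_ROB}-entry ROB")
```

At the 2% end the distance between mispredicts is only a little more than one ROB's worth of instructions, which is the sense in which the two numbers are "close".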

> - the actual gating factor is the size of the instruction window, and the power it
> takes to search through it to find instructions that meet issue criteria.

You appear to be confusing the scheduler window (the number of instructions that are candidates to be executed by the functional units) with the speculative window (the number of instructions that have been issued but not retired, and that may be quashed due to mispredicts/exceptions/etc). The ROB size corresponds to the latter, and it does not need to be searched in the manner you describe. It's accessed either in-order (for issue and retirement) or by indices that are known a priori (for mispredict handling etc).
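A toy model may help show why the ROB needs no associative search. This is only a sketch under simplifying assumptions, not a description of any real core: the class and method names are hypothetical, entries hold nothing but a name and a "done" flag, and `flush_after` assumes it is handed the index of a live entry.

```python
class ReorderBuffer:
    """Toy ROB: a circular buffer touched only at head/tail or by known index."""

    def __init__(self, size):
        self.size = size
        self.entries = [None] * size
        self.head = 0    # index of the oldest un-retired entry
        self.count = 0   # number of live entries

    def allocate(self, name):
        """In-order allocation at the tail; returns the entry's index."""
        if self.count == self.size:
            raise RuntimeError("ROB full: front end must stall")
        idx = (self.head + self.count) % self.size
        self.entries[idx] = {"name": name, "done": False}
        self.count += 1
        return idx

    def complete(self, idx):
        """Out-of-order completion writes to an index recorded at allocation."""
        self.entries[idx]["done"] = True

    def retire(self):
        """In-order retirement: pop completed entries from the head only."""
        retired = []
        while self.count and self.entries[self.head]["done"]:
            retired.append(self.entries[self.head]["name"])
            self.entries[self.head] = None
            self.head = (self.head + 1) % self.size
            self.count -= 1
        return retired

    def flush_after(self, idx):
        """Mispredict recovery: squash everything younger than entry idx."""
        keep = (idx - self.head) % self.size + 1
        squashed = self.count - keep
        for i in range(keep, self.count):
            self.entries[(self.head + i) % self.size] = None
        self.count = keep
        return squashed


rob = ReorderBuffer(8)
load = rob.allocate("load")
branch = rob.allocate("branch")
add = rob.allocate("add")
rob.complete(add)               # finishes early, but cannot retire yet
print(rob.retire())             # nothing retires: the head (load) is not done
print(rob.flush_after(branch))  # number of younger entries squashed
```

Every operation here is a head/tail update or a direct index, in contrast to a scheduler, which has to examine its entries each cycle to find ones whose operands are ready.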

> The rule of thumb seems to be that the
> ROB should be about 3x the size of this window,

They're not correlated by a simple ratio. The ROB should be at least as large as the scheduler window[s] plus the number of common instructions that can be "in flight" in the functional units. Beyond that it's common to have some "cushion" depending upon what latencies you want to hide. For example, you might want to have at least (l2_latency*issue_width) additional instructions to hide one or more L1 misses while still maintaining a full scheduler. A few examples:

Silvermont: 2-wide, 2x8-entry schedulers for each of (int/fp), 32-entry ROB (~2*scheduler/RS), 13-cycle L2. This design has barely enough ROB entries to "cover" the scheduler/FU latencies. It can't cover an L1 miss (i.e. an L2 access) while simultaneously keeping the schedulers full.

PPro: 3-wide, 20-entry RS, 40-entry ROB (2*scheduler/RS), 5-clock L2 latency. This design can "cover" an L1 miss that hits in L2 while maintaining a full scheduler (assuming all simple instructions).

MIPS R10k: 4-wide, 3*16-entry schedulers, 32-entry ROB (~1.5*scheduler depending on mix). Can barely keep the schedulers full, almost guaranteed to stall on L1 miss.

Haswell: 4-wide, 60-entry scheduler, 192-entry ROB (3.3*scheduler), 11-cycle L2 latency. This design can reorder past multiple L1 misses without stalling (note that Haswell can sustain 16 concurrent L1 misses per core; that feature wouldn't be useful without so much ROB "cushion").

P4: 3-wide, 8+~2x20-entry schedulers, 126-entry ROB (~5X scheduler depending on mix), ~5 cycle L2 latency. Once again this design is intended to reorder past multiple L1 misses, and once again that's reflected in an L2 design that can service a large number of concurrent requests.
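The "cushion" rule of thumb can be checked against some of these numbers with a rough sketch. The `Core` class and its method names are hypothetical, the model ignores instructions in flight in the functional units, and Silvermont's scheduler total of 16 is an assumption taken from the ~2x ROB/scheduler ratio quoted above. Only designs with an unambiguous single scheduler figure are included.

```python
from dataclasses import dataclass


@dataclass
class Core:
    name: str
    issue_width: int
    scheduler_entries: int
    rob_entries: int
    l2_latency: int  # cycles

    def cushion(self):
        """ROB entries left over once the scheduler is full."""
        return self.rob_entries - self.scheduler_entries

    def l1_misses_covered(self):
        """How many back-to-back L1 misses the cushion can hide,
        at l2_latency * issue_width instructions per miss."""
        per_miss = self.l2_latency * self.issue_width
        return self.cushion() // per_miss


cores = [
    Core("Silvermont", 2, 16, 32, 13),  # scheduler total of 16 is an assumption
    Core("PPro", 3, 20, 40, 5),
    Core("Haswell", 4, 60, 192, 11),
]

for c in cores:
    print(f"{c.name}: cushion {c.cushion()} entries, "
          f"hides {c.l1_misses_covered()} L1 miss(es) with a full scheduler")
```

Under these assumptions the results line up with the descriptions above: Silvermont's cushion is smaller than one L2 latency's worth of instructions, PPro's covers one, and Haswell's covers several.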

> and the window is made as big as power/clock will
> allow. That's why the starting point of schemes like KIP or CFP is always to shunt
> "waiting" instructions out of the window to some holding silo while they wait for their
> load to be delivered.

True, but addressing a different problem/topic.