SMT influence on ROB size?

By: Patrick Chase (patrickjchase.delete@this.gmail.com), January 23, 2014 8:40 pm
Room: Moderated Discussions
Paul A. Clayton (paaronclayton.delete@this.gmail.com) on January 23, 2014 11:26 am wrote:
> Patrick Chase (patickjchase.delete@this.gmail.com) on January 22, 2014 11:36 pm wrote:
> [snip]
> > One way to roughly assess whether this is the case is to look at ROB sizes (i.e. instruction
> > window sizes). Branch prediction accuracy imposes an upper limit on the usable ROB size, because
> > mispredicts cause ROB flushes (albeit partial flushes in SB/IB/Haswell). It would be pointless
> > to design an ROB that's larger than the average number of instructions per mispredicted branch,
> > and you'd probably want the latter to be quite a bit larger than the ROB size.
> >
> > If you look at recent Intel core designs, the ROB has doubled in size between Core2 and Haswell,
> > so the branch mispredict rate should ideally have been reduced by a factor of 2 (or perhaps
> > a bit less: as noted above they no longer completely flush the ROB on mispredict, so that
> > would allow a somewhat higher mispredict frequency). It seems likely that Intel would still
> > focus heavily on accuracy and/or mitigation strategies like dynamic predication.
>
> Could some of this increase in ROB size be attributed to applying more resources to
> SMT and not just improved branch prediction? (Core2 was single-threaded, right?)

Interesting point. I don't have any hard data, but my intuition is that SMT would significantly increase the usable ROB size. I think there would be two dynamics at play:

1. The actual rate of misprediction should be either unchanged or worse with SMT. "Worse" would arise in the case of contention for branch prediction entries.

2. Even in the absence of clever heuristics like the ones you proposed in your post, N-thread SMT could reduce the average *impact* of a mispredict by a factor of up to N. This is because it takes a constant number of clocks to resolve a branch, while N-thread SMT reduces the per-thread issue rate by a factor of up to N. The full factor applies only if issue bandwidth was maxed out to begin with; if not, the "improvement" isn't as great, since SMT may actually increase the net issue rate.

So yeah, I think you've got a very good point (as usual). For any given acceptable fraction of wasted instructions you'd probably be able to afford more speculation with SMT. It probably isn't a 2X improvement, but it's probably nontrivial.
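
To put rough numbers on point 2, here is a back-of-the-envelope sketch. It assumes the issue bandwidth is shared equally among threads and that the wrong-path work per mispredict is simply the resolve latency times the per-thread issue rate. The function names and the example figures (15 clocks, issue rates of 4.0 and 2.5 per clock) are made up for illustration and are not measurements of any real core.

```python
def wrong_path_instructions(resolve_clocks, total_issue_rate, n_threads=1):
    """Instructions one thread issues down the wrong path before a
    mispredicted branch resolves (toy model, equal sharing assumed)."""
    if n_threads < 1:
        raise ValueError("n_threads must be >= 1")
    per_thread_rate = total_issue_rate / n_threads
    return resolve_clocks * per_thread_rate


def impact_reduction(resolve_clocks, st_issue_rate, smt_issue_rate, n_threads):
    """Factor by which SMT shrinks the per-mispredict waste relative to
    running a single thread on the same core."""
    single = wrong_path_instructions(resolve_clocks, st_issue_rate, 1)
    smt = wrong_path_instructions(resolve_clocks, smt_issue_rate, n_threads)
    return single / smt


# Issue bandwidth already saturated by one thread: full factor of N.
saturated = impact_reduction(15, st_issue_rate=4.0, smt_issue_rate=4.0, n_threads=2)

# One thread only sustained 2.5/clock, SMT fills the machine to 4.0/clock:
# each thread slows down less, so the reduction is smaller.
unsaturated = impact_reduction(15, st_issue_rate=2.5, smt_issue_rate=4.0, n_threads=2)

print(saturated, unsaturated)
```

With these assumed inputs the saturated case gives the full factor of 2, while the unsaturated case gives only 1.25, which is the "up to N" caveat in numeric form.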

> [snip]
> > If you believe Agner Fog's results, Intel deployed a special-case loop predictor
> > in P6 descendants from Pentium-M through Nehalem. Interestingly enough they appear
> > to have discarded the loop predictor in Sandy/Ivy Bridge and Haswell.
>
> Interesting.

Yep, I thought so too when I finally read his guide front to back. The problem is of course that branch predictors are notoriously hard to reverse-engineer, as David has pointed out. The only thing we can say for sure is that Dothan->Nehalem could perfectly predict repeated loops with fixed iteration counts <=64. A dedicated "loop" or "specialized transition rate" predictor is the most logical explanation. That behavior *isn't* consistent with a conventional pattern history mechanism, as the period is far too long.
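
For anyone who hasn't seen one, here is a minimal sketch of what a trip-count style loop predictor could look like for a single branch. To be clear, this is a generic textbook-style illustration and not a claim about Intel's actual design; the class name, the two-observation confidence rule, and the `max_trip=64` limit (picked only to mirror the observed behavior) are all my assumptions.

```python
class LoopPredictor:
    """Toy per-branch loop predictor: learns how many times the loop
    branch is taken before it falls through, then predicts the exit."""

    def __init__(self, max_trip=64):
        self.max_trip = max_trip   # assumed counter capacity
        self.trip = None           # learned taken-count before exit
        self.count = 0             # taken outcomes in the current run
        self.confident = False     # same trip count seen twice in a row

    def predict(self):
        # Predict "not taken" only at the learned exit point.
        if self.confident and self.count == self.trip:
            return False
        return True

    def update(self, taken):
        if taken:
            # Saturate just above the capacity so long loops are detectable.
            self.count = min(self.count + 1, self.max_trip + 1)
            return
        # Loop exit: compare this run's count with the learned one.
        if self.count > self.max_trip:
            self.trip, self.confident = None, False
        else:
            self.confident = (self.count == self.trip)
            self.trip = self.count
        self.count = 0


def count_mispredicts(taken_per_run, runs):
    """Run the same loop `runs` times and count mispredicted branches."""
    p = LoopPredictor()
    misses = 0
    for _ in range(runs):
        for outcome in [True] * taken_per_run + [False]:
            if p.predict() != outcome:
                misses += 1
            p.update(outcome)
    return misses


print(count_mispredicts(10, 50), count_mispredicts(100, 50))
```

In this sketch a loop whose count fits the counter mispredicts only during the two training runs and is perfect afterwards, while a loop longer than the counter mispredicts its exit on every run. A pattern history of realistic length can't reproduce the first behavior at counts near 64, which is why a dedicated mechanism seems the likelier explanation.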