SMT influence on ROB size?

By: Paul A. Clayton (paaronclayton.delete@this.gmail.com), January 23, 2014 11:26 am
Room: Moderated Discussions
Patrick Chase (patickjchase.delete@this.gmail.com) on January 22, 2014 11:36 pm wrote:
[snip]
> One way to roughly assess whether this is the case is to look at ROB sizes (i.e. instruction
> window sizes). Branch prediction accuracy imposes an upper limit on the usable ROB size, because
> mispredicts cause ROB flushes (albeit partial flushes in SB/IB/Haswell). It would be pointless
> to design an ROB that's larger than the average number of instructions per mispredicted branch,
> and you'd probably want the latter to be quite a bit larger than the ROB size.
>
> If you look at recent Intel core designs, the ROB has doubled in size between Core2 and Haswell,
> so the branch mispredict rate should ideally have been reduced by a factor of 2 (or perhaps
> a bit less: as noted above they no longer completely flush the ROB on mispredict, so that
> would allow a somewhat higher mispredict frequency). It seems likely that Intel would still
> focus heavily on accuracy and/or mitigation strategies like dynamic predication.

Could some of this increase in ROB size be attributed to applying more resources to SMT and not just improved branch prediction? (Core2 was single-threaded, right?)
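
To put rough numbers on the quoted argument and on the SMT question, here is a small sketch. Every figure in it is an assumption chosen for illustration: the 20% branch fraction, the mispredict rates, the 96- and 192-entry ROB sizes, and the even split of the ROB between two active threads. It compares the average number of instructions between mispredicts with the window each thread actually sees.

```python
def instructions_per_mispredict(branch_fraction, mispredict_rate):
    """Average instructions between mispredicted branches.

    branch_fraction: fraction of instructions that are branches (assumed).
    mispredict_rate: fraction of branches that are mispredicted (assumed).
    """
    return 1.0 / (branch_fraction * mispredict_rate)


def per_thread_window(rob_entries, active_threads):
    """ROB entries per thread, assuming an even static split (an assumption)."""
    return rob_entries // active_threads


def compare():
    branch_fraction = 0.2  # assumed: one instruction in five is a branch
    # (label, ROB entries, active threads, mispredict rate); all hypothetical
    cases = [
        ("smaller ROB, 1 thread", 96, 1, 0.04),
        ("doubled ROB, 1 thread", 192, 1, 0.04),
        ("doubled ROB, 1 thread, half the mispredicts", 192, 1, 0.02),
        ("doubled ROB, 2 threads", 192, 2, 0.04),
    ]
    for label, rob, threads, rate in cases:
        window = per_thread_window(rob, threads)
        gap = instructions_per_mispredict(branch_fraction, rate)
        print(f"{label}: window={window}, "
              f"instructions per mispredict={gap:.0f}, "
              f"ratio={gap / window:.2f}")


if __name__ == "__main__":
    compare()
```

Under these assumptions, the two-thread case with the doubled ROB gives each thread the same window as the smaller single-threaded ROB, so the ratio of instructions per mispredict to window size stays the same without any improvement in prediction accuracy.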

(If there were workloads with few unpredictable branches that could not get the same memory-level parallelism from prefetching [this excludes many workloads with few unpredictable branches], a larger ROB might make sense. Being able to dynamically adjust the ROB size would make such piecemeal optimizations more attractive, since they might cost only area, not power, for the majority of workloads, which would not benefit from a larger ROB. Adopting checkpointing [which might fit well with TM support] might also facilitate use of a modestly larger ROB.)

(It seems that branch prediction confidence estimation might be becoming more useful: dynamic predication, checkpoint selection, thread scheduling [including power optimization: a low-confidence execution path has lower priority, which justifies less aggressive execution], and perhaps other aspects could use such confidence information.)

[snip]
> If you believe Agner Fog's results, Intel deployed a special-case loop predictor
> in P6 descendants from Pentium-M through Nehalem. Interestingly enough they appear
> to have discarded the loop predictor in Sandy/Ivy Bridge and Haswell.

Interesting. (By the way, I think "specialized transition rate predictor" might be a more accurate [but less specific and less concise] term than "loop predictor" since [I think] the predictor works for any branch with one direction having a 100% transition rate and the other direction having a fixed transition rate. E.g., I would not call a code sequence that updates an external counter once for every N events a loop, but [I think] the Intel "loop predictor" would predict such branches.)
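
To illustrate the kind of branch I mean, here is a toy sketch. It is not Intel's design; the class name, its fields, and the learning rule are all hypothetical. It learns how many times the common direction occurs between rare outcomes and then predicts the rare outcome when that count is reached, which works for a "once every N events" counter update just as well as for a loop exit.

```python
class TripCountPredictor:
    """Toy predictor for a branch that goes one way a fixed number of times,
    then the other way exactly once (hypothetical, for illustration only)."""

    def __init__(self):
        self.learned_run = None  # length of the last completed common-direction run
        self.current_run = 0     # common-direction outcomes since the last rare one

    def predict(self):
        # Predict the rare direction only once the current run matches the learned length.
        return self.learned_run is not None and self.current_run == self.learned_run

    def update(self, rare_outcome):
        if rare_outcome:
            # The run just ended: remember its length and start counting again.
            self.learned_run = self.current_run
            self.current_run = 0
        else:
            self.current_run += 1


def mispredicts_for_every_n(n, events):
    """Count mispredicts for a branch that takes its rare direction on every n-th event."""
    predictor = TripCountPredictor()
    mispredicts = 0
    for i in range(1, events + 1):
        actual = (i % n == 0)  # e.g. "update the external counter now"
        if predictor.predict() != actual:
            mispredicts += 1
        predictor.update(actual)
    return mispredicts


if __name__ == "__main__":
    print(mispredicts_for_every_n(8, 64))
```

In this sketch the only mispredict is the first rare outcome, before the period has been learned. Nothing in it depends on the branch being a backward loop branch.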