No dynamic predication yet, I suspect

By: Paul A. Clayton (paaronclayton.delete@this.gmail.com), January 21, 2014 10:04 pm
Room: Moderated Discussions
Maynard Handley (name99.delete@this.name99.org) on January 21, 2014 8:52 pm wrote:
[snip]
> Thanks for the info, David.
> Should we conclude, as a result, that Intel have already hit the upper limit in branch prediction:
> that they have pretty much everything covered (directional branches and indirect branches,
> specialized predictors for unusual situations, very long history correlations) and they're
> so close to the entropy limit that there's no scope for real improvement?
> (Except perhaps in really painful stuff, like merging after mispredicted branch divergence,
> which, yeah, is doable, but seems like an overall energy/performance loser.)

Intel does not appear to have adopted dynamic predication for low-confidence hammock branches yet. (IBM seems to provide this only for single-instruction hammock branches, though a more flexible mechanism might be useful.) So there is some room for improvement, if one considers dynamic predication part of branch prediction.
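To make the trade-off concrete, here is a toy Python sketch of the kind of decision dynamic predication implies: execute both sides of a short hammock when the prediction is low-confidence, and otherwise follow the predictor. Everything in it (the `Branch` fields, the thresholds, the cost model) is my own illustrative assumption, not a description of any IBM or Intel design.

```python
from dataclasses import dataclass

# Hypothetical tuning knobs, chosen only for illustration.
MAX_HAMMOCK_LEN = 4        # longest side (in instructions) worth predicating
CONFIDENCE_THRESHOLD = 2   # counters below this count as low confidence


@dataclass
class Branch:
    taken_len: int       # instructions on the taken side of the hammock
    not_taken_len: int   # instructions on the fall-through side
    confidence: int      # saturating confidence counter, 0 (low) to 3 (high)


def should_predicate(b: Branch) -> bool:
    """Predicate only short hammocks whose prediction is low-confidence."""
    is_short = max(b.taken_len, b.not_taken_len) <= MAX_HAMMOCK_LEN
    return is_short and b.confidence < CONFIDENCE_THRESHOLD


def predicted_cost(b: Branch, p_taken: float, p_mispredict: float,
                   flush_penalty: float) -> float:
    """Expected cost when following the predictor: one side plus flush risk."""
    path = p_taken * b.taken_len + (1.0 - p_taken) * b.not_taken_len
    return path + p_mispredict * flush_penalty


def predicated_cost(b: Branch) -> float:
    """Cost when executing both sides and selecting the result afterwards."""
    return float(b.taken_len + b.not_taken_len)
```

With a two-instruction hammock on each side, a 30% misprediction rate, and an assumed 15-cycle flush, the predicted path costs about 6.5 units against 4 for the predicated one; with a 1% misprediction rate the predictor wins. The numbers are invented, but they show why confidence estimation matters for deciding when to predicate.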

(I do not know how Intel's SMT manages resource allocation among threads, but a thread following a low-confidence path, especially through indirect jumps, could obviously be given a temporarily lower priority to improve throughput. This might already be done or might not be worthwhile; it also does not apply to the more common case of underutilized cores/threads.)
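As a rough illustration of the idea, a fetch arbiter could favor the thread with the fewest unresolved low-confidence branches in flight. The Python sketch below is purely hypothetical: the function name, the per-thread counts, and the tie-breaking rule are my assumptions, and I have no knowledge of what Intel's arbiter actually does.

```python
from typing import Dict


def pick_fetch_thread(low_conf_in_flight: Dict[int, int]) -> int:
    """Return the thread id with the fewest unresolved low-confidence branches.

    low_conf_in_flight maps a hardware thread id to the number of
    low-confidence branches (or indirect jumps) that thread currently has
    in flight. Ties go to the lower thread id, an arbitrary choice here.
    """
    if not low_conf_in_flight:
        raise ValueError("no threads to choose from")
    return min(low_conf_in_flight, key=lambda t: (low_conf_in_flight[t], t))
```

For example, `pick_fetch_thread({0: 3, 1: 1})` selects thread 1, since thread 0 is more likely to be fetching down a wrong path. A real policy would also need to avoid starving a thread that stays low-confidence for long stretches.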

Limited speculative multithreading might apply as part of branch prediction (e.g., predicting that a particular function call is a good candidate for speculative parallel execution).

I suspect there is some room for improvement in branch prediction but that other areas would provide more improvement for a given amount of effort. (Keeping a modest number of people working on branch prediction still makes sense even if only to develop and maintain expertise.)

> Should we likewise conclude the same thing regarding I- and D-prefetch, which
> would seem subject to the same sort of sociological issues as you describe?

Since instruction prefetch is closely tied to branch/path prediction, I suspect a lead in branch prediction technology would extend to instruction prefetch.

One advantage that Intel might have over most academic work is that its optimizations can exploit synergy. An academic paper tends to look at one optimization in isolation, whereas combining different optimizations may reduce overhead or increase the benefit. (Even with the theory of dark silicon, I suspect it is difficult to propose ideas that just waste area in the majority of workloads. However, if much of this overhead can be applied flexibly to other uses [which may also be minority uses], it may be easier to justify the expense.) Intel is also probably somewhat less interested in general results than academics are, preferentially seeking results that apply to its specific implementation and workload targets. (Intel may also be better equipped to evaluate ideas in a realistic manner, both more accurately simulating real hardware and more accurately simulating real workloads. Combining research and development has benefits.)

(The above is highly speculative. I am an outsider to academia and industry.)