By: Paul A. Clayton (paaronclayton.delete@this.gmail.com), January 21, 2014 10:04 pm
Room: Moderated Discussions
Maynard Handley (name99.delete@this.name99.org) on January 21, 2014 8:52 pm wrote:
[snip]
> Thanks for the info, David.
> Should we conclude, as a result, that Intel have already hit the upper limit in branch prediction:
> that they have pretty much everything covered (directional branches and indirect branches,
> specialized predictors for unusual situations, very long history correlations) and they're
> so close to the entropy limit that there's no scope for real improvement?
> (Except perhaps in really painful stuff, like merging after mispredicted branch divergence,
> which, yeah, is doable, but seems like an overall energy/performance loser.)
It seems they have not yet adopted dynamic predication for low-confidence hammock branches (IBM appears to provide this only for single-instruction hammocks, but a more flexible mechanism might be useful), so there is some room for improvement (if one considers dynamic predication part of branch prediction).
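To make the terminology concrete: a hammock is a short forward branch that reconverges immediately after the skipped block. A minimal C sketch (illustration only; the names are mine, and this is just the source-level analogue of the transformation dynamic predication would perform in hardware):

/* Single-instruction hammock: a forward branch over one instruction,
 * reconverging immediately after. The branch is hard to predict when
 * the condition is data-dependent. */
int abs_branchy(int x) {
    int r = x;
    if (x < 0)      /* low-confidence branch */
        r = -x;     /* the one-instruction hammock body */
    return r;       /* reconvergence point */
}

/* Predicated equivalent: both values are available and one is selected
 * (typically a conditional move), so there is no branch to mispredict. */
int abs_predicated(int x) {
    int neg = -x;
    return (x < 0) ? neg : x;   /* select, not branch */
}

Predication trades a possible misprediction flush for always executing both sides, which is why it pays off mainly on low-confidence branches over short hammocks.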
(I do not know how Intel's SMT manages resource allocation among threads, but a thread following a low-confidence path (especially past an indirect jump) could obviously be given a temporarily lower priority to improve throughput. This might already be done, or it might not be worthwhile; it also does not apply to the more common case of underutilized cores/threads.)
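A toy sketch of that priority idea, written as C rather than hardware (all names and the weighting are invented; the fetch policies of shipping SMT cores are not public, so this is only meant to show the shape of the heuristic):

#include <stdbool.h>

struct thread_state {
    bool     low_confidence_path;   /* speculating past a low-confidence branch or indirect jump */
    unsigned base_priority;         /* e.g., an ICOUNT-style fairness metric; lower is better */
};

/* Pick which hardware thread fetches this cycle: penalize threads that are
 * burning fetch bandwidth on a path that is likely to be squashed. */
int pick_fetch_thread(const struct thread_state *t, int n_threads) {
    int best = 0;
    for (int i = 1; i < n_threads; i++) {
        unsigned pi = t[i].base_priority    + (t[i].low_confidence_path    ? 100 : 0);
        unsigned pb = t[best].base_priority + (t[best].low_confidence_path ? 100 : 0);
        if (pi < pb)   /* lower score = higher priority */
            best = i;
    }
    return best;
}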
Limited speculative multithreading might apply as part of branch prediction (e.g., predicting that a particular function call is a good candidate for speculative parallel execution).
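For illustration, the kind of call site such a predictor might target (a self-contained toy example with invented names; real speculative multithreading would need hardware support for forking a thread context and squashing on a memory conflict):

#include <stddef.h>

struct record { int value; };
struct report { long summary; int flags[16]; };

static long compute_summary(const struct record *recs, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += recs[i].value;
    return s;
}

static int classify(int v) { return v > 0; }

void process(const struct record *recs, size_t n, struct report *out) {
    /* Candidate for a speculative fork: the result is not needed until
     * after the loop below completes. */
    out->summary = compute_summary(recs, n);

    /* Continuation: independent of the call above, so it could run
     * concurrently with it under speculative multithreading. */
    for (size_t i = 0; i < n && i < 16; i++)
        out->flags[i] = classify(recs[i].value);
}

The prediction problem is deciding, from past behavior of this call site, that the callee and the continuation rarely conflict, so forking is likely to pay off rather than be squashed.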
I suspect there is some room for improvement in branch prediction but that other areas would provide more improvement for a given amount of effort. (Keeping a modest number of people working on branch prediction still makes sense even if only to develop and maintain expertise.)
> Should we likewise conclude the same thing regarding I- and D-prefetch, which
> would seem subject to the same sort of sociological issues as you describe?
Since instruction prefetch is closely tied to branch/path prediction, I suspect a lead in branch prediction technology would extend to instruction prefetch.
One advantage that Intel might have over most academic work is that optimizations can exploit synergy. An academic paper tends to look at one optimization in isolation, whereas combining different optimizations may reduce overhead or increase the benefit. (Even with the theory of dark silicon, I suspect it is difficult to propose ideas that simply waste area for the majority of workloads. However, if much of that overhead can be applied flexibly to other uses [which may also be minority uses], the expense may be easier to justify.)
Intel is also probably less interested in general results than academics are, preferring results that apply to its specific implementation and workload targets. (Intel may also be better equipped to evaluate ideas realistically, both by simulating real hardware more accurately and by simulating real workloads more accurately. Combining research and development has benefits.)
(The above is highly speculative. I am an outsider to academia and industry.)