By: anon (anon.delete@this.anon.com), July 6, 2015 11:02 pm
Room: Moderated Discussions
Patrick Chase (patrickjchase.delete@this.gmai.com) on July 6, 2015 10:31 pm wrote:
> Linus Torvalds (torvalds.delete@this.linux-foundation.org) on July 6, 2015 3:59 pm wrote:
> > x86 doesn't have those insane memory ordering semantics. Loads are done in order (as far
> > as software could tell - they do get re-ordered, but the semantics are guaranteed to be
> > the same as if they were done in order), so it doesn't matter if the two accesses had a
> > data or control dependency between them.
>
> And yet you yourself have (very effectively, with real data) made the argument that cmov seldom pays on x86.
>
Well, Intel does put effort into improving cmov performance on its CPUs (although it did seem to go from 1-cycle to 2-cycle latency when the throughput was doubled, I still consider that an improvement).
If more dynamically unpredictable branches can be turned into cmov, the optimal speculation depth can increase and/or branch prediction resources can be reduced (relatively). A data dependency is preferable to a significant number of mispredicts.
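To make that concrete, here is a minimal C sketch (my own illustration, not anything from the posts above; the clamp functions and the limit value are made up). Whether a compiler actually emits cmov for the second form depends on the compiler, the flags, and its heuristics, so check the generated assembly:

/* Branchy version: the cost depends on how predictable (x > limit) is;
 * a mispredict throws away speculative work. */
long clamp_branch(long x, long limit)
{
    if (x > limit)
        return limit;
    return x;
}

/* Select on already-computed values: the usual candidate for cmov,
 * which trades the control dependency for a data dependency. */
long clamp_select(long x, long limit)
{
    return (x > limit) ? limit : x;
}

With a dynamically unpredictable condition the select form avoids the mispredict penalty; with a well-predicted condition the branch can be cheaper, which is exactly why you measure.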
ARM and IBM have not given up on predication in their ISAs or in their aggressive OOO microarchitectures either.
And the branch does not have to be *totally* unpredictable (i.e., completely random, approaching a 50% mispredict rate). There is an interesting post on this here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54073#c16
Not that it is some magical device that removes the problem of branch mispredicts, but it is a useful tool for OOO processors. Of course, as with most low-level optimizations, the usual rule applies: "if you don't know what you're doing, and you aren't willing to profile and measure, then don't do it."