By: Michael S (already5chosen.delete@this.yahoo.com), July 8, 2015 1:57 am
Room: Moderated Discussions
Maynard Handley (name99.delete@this.name99.org) on July 7, 2015 12:00 pm wrote:
> Patrick Chase (patrickjchase.delete@this.gmai.com) on July 7, 2015 10:49 am wrote:
> > Linus Torvalds (torvalds.delete@this.linux-foundation.org) on July 7, 2015 9:23 am wrote:
> > > Patrick Chase (patrickjchase.delete@this.gmai.com) on July 6, 2015 10:31 pm wrote:
> > > > And yet you yourself have (very effectively, with real data) made the argument that
> > > > cmov seldom pays on x86.
> > >
> > > Absolutely. I think cmov is often a bad idea, because it leaves those data dependencies.
> > > And because it's often a bad idea, it's probably under-utilized in some cases (and also
> > > probably over-utilized in other cases).
> >
> > Back when I was mentoring people who did a lot of DSP-ish
> > coding I saw a common pattern: There would inevitably
> > come a time when cmov/select was the right solution for a performance issue, so I would show them the
> > appropriate idioms to convince the compiler to emit it (or intrinsic, or asm directive). Most of them
> > would then go batsh*t crazy and use selects in all sorts of inappropriate places. Modern branch predictors
> > are pretty good, and over-utilization ends up being the bigger problem in my experience.
>
> Maybe the problem is when I say cmov/csel I am thinking of the (IMHO) obvious use cases.
> max/min, abs, sgn, and the sorts of very similar functions I constantly dealt with when
> writing codecs (eg parse one bit then, if (bit){motionVector=-motionVector})
> All of these strike me as PRECISELY the point of cmov/csel.
By now, max/min, abs and sgn have to be in the ISA. Not only on the SIMD/FP side, but on the scalar/integer side as well. Maybe there were valid reasons for omitting them 50 or 38 or even 30 years ago. Those reasons are not valid any more.
Note that all your examples have one thing in common: small fan-in. Small fan-in together with wide applicability is a very good criterion for including a particular primitive in the instruction set.
Also, in all your cases all data dependencies are true dependencies. There are no false data dependencies that could be speculated around in a branchy variant of the same calculation. That makes converting the branchless form into a branch particularly unattractive.
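To make the "small fan-in" point concrete, here is a minimal C sketch of the primitives in question. The function names are mine, chosen for illustration only. Each one takes one or two inputs and produces one output, and the result truly depends on every input, so a branch gives the hardware nothing useful to speculate around. Whether a given compiler turns these ternaries into cmov/csel, a dedicated min/max instruction, or a branch depends on the target and the optimization flags, so treat the comments as expectations rather than guarantees.

```c
#include <stdint.h>

/* Two inputs, one output: a typical candidate for cmov/csel or a
   dedicated min/max instruction where the ISA has one. */
static inline int32_t imin(int32_t a, int32_t b) { return a < b ? a : b; }
static inline int32_t imax(int32_t a, int32_t b) { return a > b ? a : b; }

/* One input, one output. The negation is done in unsigned arithmetic and
   the magnitude is returned as uint32_t, so INT32_MIN does not overflow. */
static inline uint32_t iabs(int32_t x)
{
    uint32_t u = (uint32_t)x;
    return x < 0 ? 0u - u : u;
}

/* Sign function: -1, 0 or +1, built from two comparisons. */
static inline int32_t isgn(int32_t x) { return (x > 0) - (x < 0); }

/* The codec idiom from the quoted post: negate a value when a parsed bit
   is set. Assumes v != INT32_MIN, since -INT32_MIN overflows. */
static inline int32_t cond_negate(int32_t v, int bit) { return bit ? -v : v; }
```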
> Perhaps it's my experience in this field where one CONSTANTLY has these sorts of one instruction branch-overs
> --- for clamping values, for non-linear edge smoothing, etc --- that makes me appreciate their value;
> and perhaps most people just don't encounter this sort of code in the code they write?
>
For this sort of thing, conditional move and/or predication is certainly a better tool than a branch, but not as good a tool as special instructions.
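Clamping shows the ranking nicely. A sketch, with hypothetical helper names: written as a max followed by a min, it maps to two selects on a machine with only cmov/csel, and to two instructions with no compare at all on a machine that has scalar integer min/max.

```c
#include <stdint.h>

/* Clamp x into [lo, hi], assuming lo <= hi. Expressed as min(max(x, lo), hi)
   so that each step is a single two-input primitive. */
static inline int32_t clamp_i32(int32_t x, int32_t lo, int32_t hi)
{
    int32_t t = x < lo ? lo : x;   /* max(x, lo) */
    return t > hi ? hi : t;        /* min(t, hi) */
}

/* The usual pixel-saturation case from codec code. */
static inline uint8_t clamp_to_u8(int32_t x)
{
    return (uint8_t)clamp_i32(x, 0, 255);
}
```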
> Now if people are idiots and want to use cmov in situation where, as you say
> ... the HW would also have to issue and execute all of the instructions on the not-predicted
> side of the branch.... [ie you should not be USING these types of instructions if we are talking
> MULTIPLE setup instructions, and MULTIPLE instructions in the branch's basic block]
> What can I say? People are morons who will abuse any tool. We don't argue for the abolition of the
> FPU because some people are so stupid that they will use FP for what should clearly be integer data.
Sometimes these people are not morons at all, even when each side of the branch is close to 10 instructions. That is the case when the direction of the branch is really unpredictable, the pipeline is really long, the data dependencies are either non-problematic or "true" and inevitable, and the processor is wide and has nothing better to do. There are no hard rules except "Don't trust your intuition. Measure!".
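A minimal sketch of what "Measure!" can look like, assuming a toy kernel that sums the elements above a threshold. All names here are hypothetical. The same loop is written once with an `if` and once with a mask-based select, and a small helper times either one with the standard `clock()`. Two caveats: `clock()` is coarse, so use large arrays and many repetitions, and the compiler may if-convert the branchy version (or vectorize both), so check the generated assembly before drawing conclusions.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Branchy form: the compiler is free to emit a conditional jump here. */
int64_t sum_above_branchy(const int32_t *v, size_t n, int32_t threshold)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] > threshold)
            sum += v[i];
    }
    return sum;
}

/* Select form: the addend is always computed and a mask picks between
   v[i] and 0, which leaves a data dependency instead of a control one. */
int64_t sum_above_select(const int32_t *v, size_t n, int32_t threshold)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int64_t mask = -(int64_t)(v[i] > threshold);  /* 0 or all ones */
        sum += (int64_t)v[i] & mask;
    }
    return sum;
}

typedef int64_t (*sum_fn)(const int32_t *, size_t, int32_t);

/* Run fn `reps` times and return the elapsed CPU time in seconds.
   The accumulated result is stored through `result` so the calls
   cannot be optimized away. */
double seconds_for(sum_fn fn, const int32_t *v, size_t n,
                   int32_t threshold, int reps, int64_t *result)
{
    int64_t acc = 0;
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        acc += fn(v, n, threshold);
    clock_t stop = clock();
    *result = acc;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Run both versions on sorted input (predictable branch) and on random input (unpredictable branch) of the same size. Which one wins, and by how much, depends on the machine.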