By: Exophase (exophase.delete@this.gmail.com), July 7, 2015 1:26 pm
Room: Moderated Discussions
Maynard Handley (name99.delete@this.name99.org) on July 7, 2015 12:00 pm wrote:
> Maybe the problem is when I say cmov/csel I am thinking of the (IMHO) obvious use cases.
> max/min, abs, sgn, and the sorts of very similar functions I constantly dealt with when
> writing codecs (eg parse one bit then, if (bit){motionVector=-motionVector})
> All of these strike me as PRECISELY the point of cmov/csel.
> Perhaps it's my experience in this field where one CONSTANTLY has these sorts of one instruction branch-overs
> --- for clamping values, for non-linear edge smoothing, etc --- that makes me appreciate their value;
> and perhaps most people just don't encounter this sort of code in the code they write?
A lot of those operations are already supported directly in modern SIMD instruction sets, or can be synthesized in a similar or smaller number of instructions than a solution using cmov or csel. For example, on ARM NEON, if (bit){motionVector=-motionVector} can be computed on a vector of 32-bit ints as:
vtst.u32 mask, bit, bit
veor.u32 motionVector, motionVector, mask
vsub.u32 motionVector, motionVector, mask
The equivalent using a conditional (bitwise) select would be something like:
vtst.u32 mask, bit, bit
vneg.s32 motionVectorNeg, motionVector
vbit.u32 motionVector, motionVectorNeg, mask
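Although both sequences have the same number of ops, the former may be preferable because it uses fewer registers, and because vbit can have lower throughput than veor and vsub on some uarchs. On the other hand, the latter can be preferable because it has a shorter critical path: vneg does not depend on the vtst result, so the two can execute in parallel before vbit, whereas veor and vsub must wait on vtst and then on each other.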
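As a rough sketch, the two NEON sequences can be modeled lane by lane in portable C to show why they compute the same thing. With an all-ones mask, (x ^ mask) - mask equals ~x + 1, which is -x; with a zero mask it leaves x unchanged. The function names cond_negate_xor_sub and cond_negate_select are hypothetical, and the code uses unsigned arithmetic so that negating INT32_MIN wraps instead of hitting signed-overflow undefined behavior. This illustrates the per-lane semantics only; it is not a claim about what any compiler emits.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: models vtst / veor / vsub on each 32-bit lane. */
void cond_negate_xor_sub(int32_t *mv, const uint32_t *bit, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* vtst: all ones if (bit & bit) != 0, otherwise all zeros */
        uint32_t mask = (bit[i] & bit[i]) ? UINT32_MAX : 0u;
        uint32_t x = (uint32_t)mv[i];
        /* veor then vsub: mask == ~0 gives ~x + 1 == -x, mask == 0 gives x */
        x = (x ^ mask) - mask;
        /* Converting values above INT32_MAX back is implementation-defined,
           but wraps as two's complement on mainstream compilers. */
        mv[i] = (int32_t)x;
    }
}

/* Hypothetical helper: models vtst / vneg / vbit on each 32-bit lane. */
void cond_negate_select(int32_t *mv, const uint32_t *bit, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* vtst: same mask as above */
        uint32_t mask = (bit[i] & bit[i]) ? UINT32_MAX : 0u;
        uint32_t x = (uint32_t)mv[i];
        /* vneg: computed unconditionally and independent of the mask */
        uint32_t neg = 0u - x;
        /* vbit: take bits from neg where mask is 1, keep x elsewhere */
        x = (x & ~mask) | (neg & mask);
        mv[i] = (int32_t)x;
    }
}
```

Both functions should give identical results for every input; on real hardware the difference lies only in register pressure, instruction throughput, and dependency chains.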