Article: AMD's Mobile Strategy
By: EduardoS (no.delete@this.spam.com), December 17, 2011 1:32 pm
Room: Moderated Discussions
Michael S (already5chosen@yahoo.com) on 12/17/11 wrote:
---------------------------
>2) sets to value of (0 | 1), when (0 |-1) is generally more useful.
But C defines the result of a comparison as 0 or 1 instead of 0 or -1, and whoever designed the instruction set/C compiler at Intel at that time was a really lazy guy. In SSE, where ANDing the result of a comparison is not just nice to have but a must-have, the comparison instructions produce 0 or -1.
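To make the AND point concrete, here is a minimal sketch in plain scalar C rather than actual SSE intrinsics. It emulates the 0 or -1 convention on a single 32-bit lane. The helper names (less_than_mask, select_by_mask, min_branchless) are made up for illustration.

```c
#include <stdint.h>

/* Comparison in the SSE-style convention: all bits set (-1) when true,
   0 when false. C itself yields 0 or 1, so we negate the result. */
static int32_t less_than_mask(int32_t b, int32_t c)
{
    return -(int32_t)(b < c);
}

/* Branchless select: returns x where mask is -1, y where mask is 0.
   This only works if the mask is all-ones or all-zeros. */
static int32_t select_by_mask(int32_t mask, int32_t x, int32_t y)
{
    return (x & mask) | (y & ~mask);
}

/* Branchless minimum built from the two helpers above. */
static int32_t min_branchless(int32_t b, int32_t c)
{
    return select_by_mask(less_than_mask(b, c), b, c);
}
```

With a 0 or 1 result the same AND keeps only the lowest bit of x, which is why the mask form matters whenever the comparison result gets ANDed with data.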
Ironically, OpenCL defines the result of a comparison as 0 or 1 for scalar datatypes and 0 or -1 for vector datatypes. On GPUs, 1.0 is actually more useful than -1.0 for floating-point math, because FMADD is 1 cycle, the same as AND, and it is much more common to compare and add than to compare and subtract. Although a "-" operator solves the issue, the AMD compiler isn't smart enough to generate the best sequence of instructions when it finds something like a * (b < c) + d, and it generates the most trivial and stupid code. It also sucks that OpenCL doesn't have the same spec for scalar and vector datatypes.
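A rough sketch of the a * (b < c) + d case, again in plain C and not OpenCL. The vector convention is emulated on a single lane, and the function names are hypothetical. It only shows the arithmetic, not what any real compiler emits.

```c
/* Scalar convention (C, OpenCL scalars): (b < c) is 0 or 1, so it can
   feed a multiply-add directly. */
static float madd_scalar(float a, float b, float c, float d)
{
    return a * (float)(b < c) + d;
}

/* Vector convention emulated on one lane: the comparison yields 0 or -1. */
static int less_than_vec_lane(float b, float c)
{
    return -(b < c);
}

/* The "-" operator turns the 0 / -1 result back into 0 / 1 before the
   multiply-add, giving the same value as the scalar form. */
static float madd_vector_lane(float a, float b, float c, float d)
{
    return a * (float)(-less_than_vec_lane(b, c)) + d;
}
```

Both forms return d when the comparison is false and a + d when it is true. The difference is only the extra negation needed under the 0 or -1 convention.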