By: SHK (no.delete@this.mail.com), July 7, 2015 11:34 am
Room: Moderated Discussions
Patrick Chase (patrickjchase.delete@this.gmai.com) on July 7, 2015 10:49 am wrote:
> > (a) I think cmov on x86 has improved. It used to have pretty bad latencies, afaik
> > they've improved. So you still do have the data dependencies, but for many cases it
> > probably doesn't matter that much.
>
> Better but not free. 2 uops in Haswell/Broadwell.
>
Not on Broadwell. I asked about cmov latency on Broadwell on this forum (Broadwell CMOV latency), and jokerman found slide 8 of the Xeon D presentation, where cmov is finally listed as 1 uop with 1-cycle latency.
I cannot understand why Intel's Optimization Manual doesn't mention it, while the improvements to MULPS/PD and PCLMUL are listed. IIRC AMD has had 1-cycle cmov since the K7.