Broadwell cmov latency

By: SHK (no.delete@this.mail.com), June 9, 2015 6:01 am
Room: Moderated Discussions
According to instlatx64, the main differences in instruction latency between Haswell and Broadwell are these (from http://users.atw.hu/instlatx64/HaswellvsBroadwell.txt; LT = latency, TP = throughput):


                          HSW          BDW
                         LT|TP        LT|TP
(V)MULSS/SD/PS/PD         5| 1    ->   3| 1
(V)DIVSS                 13| 7    ->  11| 3
(V)DIVPS xmm             13| 7    ->  11| 5
VDIVPS ymm               21|14    ->  17|10
(V)DIVSD                 20|14    ->  14| 4
(V)DIVPD xmm             20|14    ->  14| 8
VDIVPD ymm               35|25    ->  23|16
(V)DPPS xmm/ymm          14|      ->  12|
DPPD xmm                  9|      ->   7|
(V)PCLMULQDQ xmm          7| 2    ->   5| 1
(V)ADDQ/SUBQ xmm/ymm      1| 1    ->   1| 0.5


ADCX/ADOX                              1| 1
ADC/SBB/CMOV              2| 1         1| 1
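
To put the latency column in perspective, here is a rough Python sketch that takes the latencies quoted above and prints the HSW/BDW ratio for each row. The helper names (`cycles_saved`, `chain_cycles`) are made up for illustration, and the chain estimate assumes a purely latency-bound chain of dependent ops with nothing else in the way, which real code rarely is.

```python
# Latencies in cycles as (HSW, BDW), copied from the instlatx64 comparison above.
# Throughput figures are left out; this only looks at the LT column.
LATENCY = {
    "(V)MULSS/SD/PS/PD": (5, 3),
    "(V)DIVSS": (13, 11),
    "(V)DIVPS xmm": (13, 11),
    "VDIVPS ymm": (21, 17),
    "(V)DIVSD": (20, 14),
    "(V)DIVPD xmm": (20, 14),
    "VDIVPD ymm": (35, 23),
    "(V)DPPS xmm/ymm": (14, 12),
    "DPPD xmm": (9, 7),
    "(V)PCLMULQDQ xmm": (7, 5),
    "ADC/SBB/CMOV": (2, 1),
}


def cycles_saved(name):
    """Latency reduction in cycles going from HSW to BDW (hypothetical helper)."""
    hsw, bdw = LATENCY[name]
    return hsw - bdw


def chain_cycles(name, length, bdw=True):
    """Idealized cycle count for `length` back-to-back dependent ops.

    Assumption: the chain is limited only by instruction latency.
    """
    hsw_lat, bdw_lat = LATENCY[name]
    return length * (bdw_lat if bdw else hsw_lat)


if __name__ == "__main__":
    for name, (hsw, bdw) in LATENCY.items():
        print(f"{name:22s} {hsw:3d} -> {bdw:3d}  ({hsw / bdw:.2f}x)")
```

Under that assumption a chain of 100 dependent cmovs would go from 200 cycles to 100, which is why the 2 -> 1 change stands out even though it is the smallest absolute difference in the list.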


All of these are more or less confirmed by Intel's manual (faster FP mul, single-uop clmul, etc.), but I was unable to find any official confirmation that from now on (I hope!) cmov is a single-cycle uop, and that the same is true for the other 3-source ops.

This would mean another long-standing bottleneck is now gone. Can anyone confirm it?