By: Felid (Felid.delete@this.mailinator.com), November 16, 2012 7:19 pm
Room: Moderated Discussions
Eric Bron (eric.bron.delete@this.zvisuel.privatefortest.com) on November 16, 2012 1:24 pm wrote:
> Since you mention "EP" and from your timings it looks like you talk about scalar x87 code
Yes, FDIV with a non-zero divisor.
> Anyway, I was coming to my conclusion from the doubled throughput for the packed SP case
> and nearly doubled for the packed DP case (see rcp throughputs below), I wasn't aware
> this is due to an improved pipelining, do you have a source to provide for this?
See the AIDA64 measurements, where L/T are latency and (inverse) throughput: http://users.atw.hu/instlatx64/GenuineIntel00306A9_IvyBridge_InstLatX64.txt



