By: Felid (Felid.delete@this.mailinator.com), November 17, 2012 1:44 pm
Room: Moderated Discussions
Michael S (already5chosen.delete@this.yahoo.com) on November 17, 2012 12:50 pm wrote:
> DIV r64 is even more interesting:
>
> Latency/Reciprocal Throughput (clocks)
>            SNB          IVB
> DIV r64    80-90/???    35-45/23
>
> So, long integer division on IVB is not just partially pipelined,
> but they somehow managed to cut worst case latency in half.
> Looks like they now apply a different algorithm. Or maybe they just extended to 128b/64b
> the old "two bits at a time" algorithm that has been in use for 64b/32b division since P5,
IVB's scalar divider is almost non-pipelined. Latency is only 1-2 clocks more than the reciprocal throughput and depends heavily on the number of significant bits (the position of the leftmost «1») of the divisor. For 32 bits it ranges from 26.5 clocks (full division) down to 9 clocks (1/1). For 64 bits it ranges from 94.6 down to 22.2 clocks.



