By: Klimax (danklima.delete@this.gmail.com), August 10, 2014 8:48 am
Room: Moderated Discussions
Michael S (already5chosen.delete@this.yahoo.com) on August 9, 2014 2:37 pm wrote:
> Klimax (danklima.delete@this.gmail.com) on August 9, 2014 2:10 pm wrote:
>
> >
> > It seems per Optimization manual that there are no longer significant
> > performance differences between x87 instructions and scalar SSEx.
> >
> > (latency/throughput, cycles)
> > FADD 3/1   vs.  ADDSS (SSE1) 3/1
> > FMUL 5/2   vs.  MULSS (SSE1) 5/1
> >
>
> The [scalar] x87 tax is not in instruction latency/throughput and never was (in fact,
> on P4, x87 FADD throughput was better than scalar SSE2 throughput). The tax is:
> 1) in the need to use more register moves or exchanges to achieve the same result.
> And no, despite what the opt. manual may claim, they are never 100% free.
> 2) in SW-visible register starvation. 8 visible registers are often enough for GPRs, but rarely enough
> for inner FP loops on a wide machine. That's true even on SB/IB, which theoretically can do 2 loads per clock,
> but even more so on previous Intel Core CPUs, which, despite having the same scalar FPU width as SB,
> can only do 1 load per clock. Of course, the same problem applies to 32-bit SSE/AVX as well.
> 3) in the hard-to-understand but very real fact that, after 35 years of trying, the 2 popular compilers,
> i.e. MSVC and gcc, still suck at x87 register allocation and associated stuff. I can still relatively
> easily beat either of the two in x87. Of course, sometimes I can beat them in [scalar] 32-bit SSE/AVX
> or even (much, much more rarely) in 64-bit SSE/AVX, but never by the same margin as in x87.
> 4) x87 also sucks in rarer but not really exotic areas such as fp-integer conversions
> and moving data to/from GPRs. The original x87 also sucked at delivering condition
> codes to the main execution engine, but that was fixed ~20 years ago.
>
>
> > Note: At least Visual Studio will by default use scalar SSEx instructions
> > for x64, and for x86 when /arch:SSE or higher is enabled (or when targeting Vista+).
> >
> > Since 2010 IIRC.
>
> As far as I recall, my copy of VS2010 at work can't generate x87 code on x64 at all, not just by default.
>
>
For the first part: OK, might be. I've never gotten around to doing any comparison myself. Some people did, but IIRC it was on CPUs older than SB.
Interesting, just tested it: I can't force x87 for x64.