By: Ian Ameline (ian.ameline.delete@this.autodesk.com), November 6, 2006 9:00 am
Room: Moderated Discussions
Michael S (already5chosen@yahoo.com) on 11/6/06 wrote:
---------------------------
>x8 unrolling is too aggressive for my taste. x4 should be just fine.
>
>On Prescott you could improve (unaligned path + unaligned data) score by replacing
>movups with lddqu. On C2D it probably wouldn't improve the result but wouldn't hurt either.
>
>Pushing stores down the pipe could somewhat improve Prescott (as well as PM and
>K8) aligned path score without affecting C2D.
>
This just shows that if you don't have *everything* aligned, you're often significantly better off just letting the (Intel) compiler generate scalar FP code.
Vectorization is *hard* -- at least if you want meaningful performance gains :-)
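
To make the alignment point concrete, here is a minimal sketch (not the code under discussion in this thread) of the usual dispatch: take the SSE path only when every pointer is 16-byte aligned, and otherwise leave the loop as plain scalar code for the compiler. The names is_aligned16 and add_floats are made up for illustration, and the x4 step is simply one 128-bit vector of floats per iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE__)
#include <xmmintrin.h>   /* _mm_load_ps, _mm_add_ps, _mm_store_ps */
#endif

/* Hypothetical helper: nonzero if p sits on a 16-byte boundary. */
static int is_aligned16(const void *p) {
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical kernel: dst[i] = a[i] + b[i] for i in [0, n).
 * Returns 1 if the aligned SSE path was selected, 0 if everything
 * was left to the scalar loop. */
static int add_floats(float *dst, const float *a, const float *b, size_t n) {
    assert(n == 0 || (dst && a && b));
    size_t i = 0;
    int used_simd = 0;
#if defined(__SSE__)
    /* Vectorize only when *everything* is aligned. */
    if (is_aligned16(dst) && is_aligned16(a) && is_aligned16(b)) {
        for (; i + 4 <= n; i += 4) {
            __m128 va = _mm_load_ps(a + i);   /* aligned load (movaps) */
            __m128 vb = _mm_load_ps(b + i);
            _mm_store_ps(dst + i, _mm_add_ps(va, vb));
        }
        used_simd = 1;
    }
#endif
    /* Scalar tail, and the whole job when any pointer is misaligned. */
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
    return used_simd;
}
```

Whether the aligned path actually wins depends on the CPU and on how often real data arrives aligned, so treat this as a starting point for measurement rather than a recommendation.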