By: Michael S (already5chosen.delete@this.yahoo.com), November 5, 2006 3:47 pm
Room: Moderated Discussions
Eric Bron (eric.bron@adeptdevelopment.com) on 11/5/06 wrote:
---------------------------
>>case, and yes, they used to suck horribly.
>
>I remember my early tests with a Katmai PIII and unaligned moves were very slow
>indeed. Now I have just tested again with a simple benchmark, with somewhat surprising results:
>
>
>;;; Aligned path :
>
>
>$B16$3:
>movaps xmm0, XMMWORD PTR [edi+eax*4]
>mulps xmm0, XMMWORD PTR [esi+eax*4]
>addps xmm0, XMMWORD PTR [ecx+eax*4]
>movaps XMMWORD PTR [ebx], xmm0
>add eax, 4
>cmp eax, edx
>jl $B16$3
>
>
>;;; Unaligned path :
>
>$B17$3:
>movups xmm2, XMMWORD PTR [ecx+eax*4]
>movups xmm1, XMMWORD PTR [edi+eax*4]
>movups xmm0, XMMWORD PTR [esi+eax*4]
>mulps xmm1, xmm0
>addps xmm2, xmm1
>movups XMMWORD PTR [ebx], xmm2
>add eax, 4
>cmp eax, edx
>jl $B17$3
>
>
>Timings for 10'000 iterations, dataset size = 4*3200 bytes
>
>Core 2 Duo 1.86 GHz
>
>aligned path: 350.388 ms
>unaligned path w/aligned data: 358.263 ms
>unaligned path w/unaligned data: 367.754 ms
>
>
>Prescott 3.2 GHz
>
>aligned path: 1574.35 ms
>unaligned path w/ aligned data: 1574.79 ms
>unaligned path w/ unaligned data: 1630.74 ms
>
>
>
>
>the unaligned path is only marginally slower (<5%) on both CPUs, even though it's more
>complex since reg,mem addressing can't be used
>
>
I don't understand. Do you mean 43ns/iter on C2D? Why so slow?
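
For reference, here is a rough sketch of the arithmetic behind that 43 ns figure. It rests on assumptions that the quoted post does not confirm: that "4*3200 bytes" means 3200 floats per array, that each pass of the inner loop handles 4 floats (as `add eax, 4` suggests), and that the "10'000 iterations" are full passes over the dataset. The function and parameter names are made up for illustration.

```python
def ns_per_inner_iteration(total_ms, outer_iterations, floats_per_array, floats_per_step=4):
    """Average time of one inner-loop pass, in nanoseconds.

    Assumes every outer iteration walks the whole array once,
    floats_per_step elements at a time (one XMM register of floats).
    """
    inner_steps = floats_per_array // floats_per_step
    total_ns = total_ms * 1e6
    return total_ns / (outer_iterations * inner_steps)


def cycles_per_inner_iteration(ns_per_iter, clock_ghz):
    """Convert nanoseconds to core clock cycles (1 GHz = 1 cycle per ns)."""
    return ns_per_iter * clock_ghz


# Core 2 Duo 1.86 GHz, aligned path, timing taken from the quoted post.
c2d_ns = ns_per_inner_iteration(350.388, 10_000, 3200)
c2d_cycles = cycles_per_inner_iteration(c2d_ns, 1.86)

# Prescott 3.2 GHz, aligned path, timing taken from the quoted post.
p4_ns = ns_per_inner_iteration(1574.35, 10_000, 3200)
p4_cycles = cycles_per_inner_iteration(p4_ns, 3.2)

print(f"C2D:      {c2d_ns:.1f} ns/iter, {c2d_cycles:.1f} cycles/iter")
print(f"Prescott: {p4_ns:.1f} ns/iter, {p4_cycles:.1f} cycles/iter")
```

Under those assumptions the C2D comes out at roughly 44 ns, or about 80 clocks, for a loop of seven instructions, which is why the number looks far too high. If the iteration count or dataset size means something else, the per-iteration figure changes accordingly.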