By: Wilco (Wilco.Dijkstra.delete@this.ntlworld.com), July 11, 2013 5:03 am
Room: Moderated Discussions
none (none.delete@this.none.com) on July 11, 2013 4:51 am wrote:
> Michael S (already5chosen.delete@this.yahoo.com) on July 11, 2013 4:12 am wrote:
> > none (none.delete@this.none.com) on July 11, 2013 1:49 am wrote:
> >
> > >
> > > I now wait for evidence that Geekbench favors ARM, and I somehow think
> > > nothing as obvious as what was found with AnTuTu will be uncovered.
> > >
> >
> >
> > About Geekbench, I agree with what Exophase had written on Anandtech forum - Geekbench Integer
> > scores, esp. single-threaded, look very reasonable and IMHO are far better than anything else
> > we have for cross-platform mobile benchmarking. The rest of Geekbench - not so much. I tend to
> > ignore Geekbench floating-point and to take Geekbench Memory with a solid grain of salt.
>
> I agree with you two, though I tend to add Stream Performance to single-threaded integer results.
>
> Geekbench Memory Performance results often make no sense, and FP, well it's hard to
> say what's going on, that denormal stuff Klimax mentions certainly is an issue that
> should be solved.
Which benchmark is affected by denormals? I thought pretty much any modern CPU nowadays deals with denormals in hardware with minimal penalty...
> I'll just say again what I told Klimax: the Android x86 version
> uses SSE and not x87. Here are the critical loops of Dot Product, both x86 and ARM:
>
>
> c54d8: f3 0f 10 0c 97 movss (%edi,%edx,4),%xmm1
> c54dd: f3 0f 59 0c 96 mulss (%esi,%edx,4),%xmm1
> c54e2: 42 inc %edx
> c54e3: f3 0f 58 c1 addss %xmm1,%xmm0
> c54e7: 3b 55 f0 cmp -0x10(%ebp),%edx
> c54ea: 75 ec jne c54d8
> c54ec: 41 inc %ecx
> c54ed: 3b 4d ec cmp -0x14(%ebp),%ecx
> c54f0: 74 0a je c54fc
> c54f2: 8b 50 28 mov 0x28(%eax),%edx
> c54f5: 89 55 f0 mov %edx,-0x10(%ebp)
> c54f8: 31 d2 xor %edx,%edx
> c54fa: eb eb jmp c54e7
>
>
> a42a8: eb06 0c03 add.w ip, r6, r3
> a42ac: eddc 6a00 vldr s13, [ip]
> a42b0: eb05 0c03 add.w ip, r5, r3
> a42b4: ed9c 7a00 vldr s14, [ip]
> a42b8: ee46 7a87 vmla.f32 s15, s13, s14
> a42bc: 3101 adds r1, #1
> a42be: 3304 adds r3, #4
> a42c0: 42b9 cmp r1, r7
> a42c2: d1f1 bne.n a42a8
> a42c4: 3201 adds r2, #1
> a42c6: 42a2 cmp r2, r4
> a42c8: d003 beq.n a42d2
> a42ca: 2300 movs r3, #0
> a42cc: 6ac7 ldr r7, [r0, #44] ; 0x2c
> a42ce: 4619 mov r1, r3
> a42d0: e7f6 b.n a42c0
>
>
> Both codes look similarly bad :-)
Yes, GCC can still generate inefficient code at times; the array accesses in particular look bad. The Intel version is vectorized, which means the ARM version will be about twice as fast again when built with NEON. So yes, getting the compiler options right matters...
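For reference, inner loops of the shape quoted above are what a compiler typically emits for a plain floating-point reduction like the sketch below. This is an assumption: the actual Geekbench source is not shown in this thread, and `dot_product` and its parameters are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Hypothetical dot product kernel; NOT the Geekbench source.
 * Each iteration loads one float from each array, multiplies them,
 * and accumulates into a single running sum. */
float dot_product(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];   /* one multiply and one add per element */
    return sum;
}
```

Because vectorizing this loop reorders the floating-point additions, GCC generally needs relaxed FP semantics (for example -ffast-math) before it will do so. On 32-bit ARM, the GCC documentation also notes that NEON is not used for auto-vectorized floating-point code unless -funsafe-math-optimizations is given, so the build flags decide which kind of loop you end up measuring.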
Wilco