By: dmcq (dmcq.delete@this.fano.co.uk), August 31, 2014 4:54 am
Room: Moderated Discussions
foobar (a.delete@this.b.c) on August 30, 2014 9:26 pm wrote:
> anon (anon.delete@this.anon.com) on August 29, 2014 3:49 pm wrote:
> > Aaron Spink (aaronspink.delete@this.notearthlink.net) on August 29, 2014 7:35 am wrote:
> > > Howard Chu (hyc.delete@this.symas.com) on August 28, 2014 9:17 pm wrote:
> > > > Pretty good comparison of x86, PPC, and ARM here http://preshing.com/20140709/the-purpose-of-memory_order_consume-in-cpp11/
> > > >
> > >
> > > That's a pretty excellent example of what I was talking about.
> >
> > While I don't disagree that you could always find these kinds of cases --
> > exactly because the different ISA constraints allow optimization effort to
> > be allocated differently -- I'm not *entirely* happy with the numbers.
> >
> > I mean, they're interesting for what they are, but firstly, that powerpc core is an old core.
> > It's a pentium4-era core. Now the pentium4 cores would probably do reasonably well on this test
> > too, but nobody would say they don't have horrible performance cases as well, despite being strongly
> > ordered. Try an atomic operation and it would probably take hundreds of cycles.
> >
> > Secondly, compare; branch; isync is not the nicest way to implement read-read ordering on powerpc. isync
> > is not simply a memory barrier so much as a hammer.
>
> For the PowerPC 4xx processors, isync flushes the shadow TLBs
> as well. I bet it does the same thing for the ERATs in Power8.
The figures in that article are much less convincing once you adjust for the overall speed of the two machines, the ARM A9 at 850 MHz and the Core i7 at 2.3 GHz, as given by Geekbench. The Intel machine is about 10 times as fast. So instead of the enormous difference of 0.81 ns compared to 16.89 ns (a factor of about 21), you're talking about the ARM version being roughly twice as slow in relative terms. A factor of two either way could be explained by practically anything.
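To spell out that arithmetic, here is a rough sketch. The two timings are the ones quoted above; the 10x machine ratio is only the approximate Geekbench figure, so treat it as an assumption rather than a measurement, and the variable names are just for illustration.

```python
# Per-operation timings quoted from the article under discussion.
x86_ns = 0.81    # Core i7 at 2.3 GHz
arm_ns = 16.89   # ARM A9 at 850 MHz

# Assumption: the Intel machine is roughly 10x faster overall (approximate Geekbench ratio).
machine_speed_ratio = 10.0

# Raw gap between the two measurements.
raw_ratio = arm_ns / x86_ns

# Gap that remains after normalising for overall machine speed.
adjusted_ratio = raw_ratio / machine_speed_ratio

print(f"raw: {raw_ratio:.1f}x, adjusted: {adjusted_ratio:.1f}x")
```

With those inputs the raw gap comes out at about 20.9x and the adjusted gap at about 2.1x, which is where the "twice as slow" figure comes from.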