By: foobar (a.delete@this.b.c), August 30, 2014 9:26 pm
Room: Moderated Discussions
anon (anon.delete@this.anon.com) on August 29, 2014 3:49 pm wrote:
> Aaron Spink (aaronspink.delete@this.notearthlink.net) on August 29, 2014 7:35 am wrote:
> > Howard Chu (hyc.delete@this.symas.com) on August 28, 2014 9:17 pm wrote:
> > > Pretty good comparison of x86, PPC, and ARM here http://preshing.com/20140709/the-purpose-of-memory_order_consume-in-cpp11/
> > >
> >
> > That's a pretty excellent example of what I was talking about.
>
> While I don't disagree that you could always find these kinds of cases --
> exactly because the different ISA constraints allow optimization effort to
> be allocated differently -- I'm not *entirely* happy with the numbers.
>
> I mean, they're interesting for what they are, but firstly, that powerpc core is an old core.
> It's a pentium4-era core. Now the pentium4 cores would probably do reasonably well on this test
> too, but nobody would say they don't have horrible performance cases as well, despite being strongly
> ordered. Try an atomic operation and it would probably take hundreds of cycles.
>
> Secondly, compare; branch; isync is not the nicest way to implement read-read ordering on PowerPC. isync
> is less a memory barrier than a hammer.
For the PowerPC 4xx processors, isync flushes the shadow TLBs as well. I bet it does the same thing for the ERATs in Power8.
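
For anyone following along, here is a rough C++11 sketch of the read side being argued about. It is only an illustration: the names (Payload, g_ptr, publish, read_acquire, read_consume) are made up for this post and are not taken from the linked article. As far as I know, the commonly cited PowerPC lowering of the acquire load is load; compare; branch; isync (with an lwsync after the load as the alternative), while the consume version is meant to rely on the address dependency alone. Current compilers generally just treat consume as acquire, so check the generated code before drawing conclusions.

```cpp
#include <atomic>
#include <thread>

// Hypothetical payload guarded by a published pointer.
struct Payload { int value; };

static Payload g_payload{0};
static std::atomic<Payload*> g_ptr{nullptr};

// Writer side: fill in the payload, then publish the pointer with release
// semantics so the payload write is ordered before the pointer store.
void publish(int v) {
    g_payload.value = v;
    g_ptr.store(&g_payload, std::memory_order_release);
}

// Reader using acquire: this is the load that typically turns into
// load; compare; branch; isync (or load; lwsync) on PowerPC.
int read_acquire() {
    Payload* p;
    while ((p = g_ptr.load(std::memory_order_acquire)) == nullptr) {
        std::this_thread::yield();  // spin until the writer has published
    }
    return p->value;
}

// Reader using consume: in principle only the dependency from p to
// p->value has to be respected, so no barrier is needed on PowerPC or ARM.
// Compilers are allowed to (and usually do) strengthen this to acquire.
int read_consume() {
    Payload* p;
    while ((p = g_ptr.load(std::memory_order_consume)) == nullptr) {
        std::this_thread::yield();
    }
    return p->value;
}
```

In a real test publish() would run on one thread and the readers on another. The interesting part is comparing the code generated for the two loads, not the result they return.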