By: Etienne (etienne_lorrain.delete@this.yahoo.fr), July 17, 2015 1:15 am
Room: Moderated Discussions
Maynard Handley (name99.delete@this.name99.org) on July 13, 2015 2:10 pm wrote:
> I'm not competent to discuss the technical issues, but if the matter is as cut-and-dried
> as you claim, why does it continue? There is nothing to stop ARM saying "part of ARM v8.1a
> is a new TSO memory model". Old code would still work (with the mem barriers appropriately
> NOP'd or close to), and new code would have no (or fewer and weaker) mem barriers.
>
> My argument about this is sociological not technical. I'm just not convinced that "everyone is a
> doodyhead except me" is that likely. I saw what you read about "ARM's decision made sense given where
> they were coming from", and again, that just does not strike me as a compelling argument.
>
> I would have thought the issue is design and validation; but David claimed Andy Glew said these
> were equal for ARM and x86. Seems unlikely to me, but Glew knows a hell of a lot more than me.
> So if design and validation aren't any easier, I imagine ARM must believe that weak ordering gives them
> either lower power and/or more scalability. I just don't buy that they're doing it because they all thought
> "we have to go for ideological purity over the dirty pragmatism of those Intel idiots" or "we will never
> in our lives design a CPU more complicated than the A57, so let's optimize the ISA spec for that".
>
Maybe it is all a question of optimisation, and of which kind of workload you want to support on your CPU?
If you just want to handle completely separate workloads/tasks on each CPU core, then the synchronisation only has to be done at the OS level. For the few semaphores managed at that level, even the GCC inline atomic functions may be sufficient (Linux would use RCU, but then they know what they are doing).
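To make the "GCC inline atomic functions" case concrete, here is a minimal sketch of a spinlock built on the GCC `__atomic` builtins, guarding a counter shared by two threads. The names (`spin_lock`, `spin_unlock`, `counter_worker`, `run_counter_demo`) are made up for illustration, and a real OS would not busy-wait this naively.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* All names here are hypothetical, chosen for illustration only. */
static char lock_flag;        /* 0 = free, nonzero = held */
static long shared_counter;   /* protected by lock_flag */

static void spin_lock(void)
{
    /* Acquire: accesses after taking the lock cannot move before it. */
    while (__atomic_test_and_set(&lock_flag, __ATOMIC_ACQUIRE))
        ;   /* busy-wait; a real kernel would relax the CPU or sleep */
}

static void spin_unlock(void)
{
    /* Release: accesses made while holding the lock become visible
       before the lock is seen as free. */
    __atomic_clear(&lock_flag, __ATOMIC_RELEASE);
}

static void *counter_worker(void *arg)
{
    long n = *(long *)arg;
    for (long i = 0; i < n; i++) {
        spin_lock();
        shared_counter++;
        spin_unlock();
    }
    return NULL;
}

/* Runs two threads that each increment the counter per_thread times
   and returns the final value. */
long run_counter_demo(long per_thread)
{
    pthread_t a, b;
    int rc;

    shared_counter = 0;
    rc = pthread_create(&a, NULL, counter_worker, &per_thread);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, counter_worker, &per_thread);
    assert(rc == 0);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return shared_counter;
}
```

The same source works on both kinds of machine: the compiler emits whatever barriers or ordered instructions the target memory model requires for the acquire and release orderings.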
If you want to support cooperative workloads/tasks (i.e. threads spread over multiple CPU cores), then you should choose a CPU which supports that correctly, one where you can have data coherency without stopping or delaying the other CPU cores.
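A rough sketch of what such cooperation looks like in code, using C11 `<stdatomic.h>` rather than anything specific to one CPU: one thread publishes a value, the other waits for it. The function and variable names are hypothetical. As far as I know, on a TSO machine like x86 the release store and acquire load compile to ordinary moves, whereas on a weakly ordered ARM they need barrier or load-acquire/store-release instructions, which is the cost being argued about in this thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* All names here are hypothetical, chosen for illustration only. */
static int payload;        /* plain (non-atomic) data */
static atomic_int ready;   /* flag that publishes payload */

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;
    /* Release: the write to payload is ordered before the flag store. */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

static void *consumer(void *arg)
{
    int *out = arg;
    /* Acquire: once the flag is seen set, payload is guaranteed visible. */
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;   /* spin until the producer publishes */
    *out = payload;
    return NULL;
}

/* Starts the consumer first, then the producer, and returns the value
   the consumer observed. */
int run_message_passing_demo(void)
{
    pthread_t p, c;
    int seen = 0;
    int rc;

    payload = 0;
    atomic_store(&ready, 0);
    rc = pthread_create(&c, NULL, consumer, &seen);
    assert(rc == 0);
    rc = pthread_create(&p, NULL, producer, NULL);
    assert(rc == 0);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```

Neither thread stops the other core here: the consumer only spins on its own load, and the ordering is enforced by the two atomic operations alone.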
After all, there are quite a few "memory managed"/interpreted/JIT languages which cannot run in parallel on different cores; if you are using those, you may not need inter-core or inter-processor causality memory rules...