By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), July 15, 2015 1:55 pm
Room: Moderated Discussions
⚛ (0xe2.0x9a.0x9b.delete@this.gmail.com) on July 15, 2015 12:54 pm wrote:
>
> One only needs the simple LOCK CAS (aka CMPXCHG) instruction in SMP with shared memory:
>
> LOCK CAS reg, mem
That's like saying "your instruction set only needs to be Turing complete".
That is "true", but it's true only in the weakest sense. It's a theoretical argument that is completely worthless in the real world.
CAS is kind of the equivalent of Turing completeness. Yes, you can in theory get everything done with CAS. But in reality you cannot do things like lockless algorithms with CAS alone (you need proper memory ordering semantics for those), and you cannot do certain things efficiently.
CMPXCHG is better than CAS, but it too isn't a replacement for all the other operations you need in practice. The same way you want a rich and complete instruction set (rather than "Turing completeness"), you want a fairly rich set of ordered atomic accesses.
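To make "ordered atomic accesses" concrete, here is a minimal sketch (not from the original post) of a release store paired with an acquire load, with no CAS involved. The function name `publish_and_read` and the `data`/`ready` variables are hypothetical, chosen only for illustration. Rust is used only because its standard library spells out the ordering on every access.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::thread;

/// Hypothetical helper: one thread publishes `value`, another reads it.
/// Correctness comes from the Release/Acquire pair on `ready`, not from CAS.
pub fn publish_and_read(value: u64) -> u64 {
    let data = AtomicU64::new(0);
    let ready = AtomicBool::new(false);

    thread::scope(|s| {
        // Writer: store the payload, then set the flag with Release so the
        // payload store cannot be reordered after the flag store.
        s.spawn(|| {
            data.store(value, Ordering::Relaxed);
            ready.store(true, Ordering::Release);
        });

        // Reader: spin until the flag is seen with Acquire; after that the
        // payload written before the Release store is guaranteed visible.
        let reader = s.spawn(|| {
            while !ready.load(Ordering::Acquire) {
                std::hint::spin_loop();
            }
            data.load(Ordering::Relaxed)
        });

        reader.join().unwrap()
    })
}
```

If both accesses to `ready` were `Relaxed`, the reader could be allowed to observe the flag without observing the payload, which is the sort of thing ordering semantics exist to rule out.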
Just as an example: the new atomics in ARM64 are a good addition, and allow you to do things like updating counters without having to do the ldrex/strex loop. Sure, you *could* do atomic counters with a lock, but that's horrible in practice.
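As a rough illustration of the counter case (a sketch added here, not code from the post), compare a single atomic add with a CAS retry loop. The names `inc_fetch_add`, `inc_cas_loop` and `count_with_threads` are hypothetical. On x86 the single add typically compiles to a `lock xadd`, and on ARM64 with the newer atomics to one add instruction rather than a load-exclusive/store-exclusive loop, though the exact code depends on compiler and target.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// Increment with one atomic read-modify-write operation.
pub fn inc_fetch_add(counter: &AtomicU64) {
    counter.fetch_add(1, Ordering::Relaxed);
}

/// Increment using only compare-and-swap: read, try to swap, retry on failure.
pub fn inc_cas_loop(counter: &AtomicU64) {
    let mut seen = counter.load(Ordering::Relaxed);
    loop {
        match counter.compare_exchange_weak(
            seen,
            seen + 1,
            Ordering::Relaxed,
            Ordering::Relaxed,
        ) {
            Ok(_) => break,
            // Another thread won the race (or the weak CAS failed spuriously):
            // retry with the value that was actually observed.
            Err(actual) => seen = actual,
        }
    }
}

/// Hypothetical driver: `threads` threads each bump both counters `iters` times.
pub fn count_with_threads(threads: u64, iters: u64) -> (u64, u64) {
    let a = AtomicU64::new(0);
    let b = AtomicU64::new(0);
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..iters {
                    inc_fetch_add(&a);
                    inc_cas_loop(&b);
                }
            });
        }
    });
    (a.load(Ordering::Relaxed), b.load(Ordering::Relaxed))
}
```

Both versions end up with the same count. The difference is that the CAS loop can retry an unbounded number of times under contention, while the single add always completes in one operation.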
Linus