By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), December 5, 2014 12:58 pm
Room: Moderated Discussions
Konrad Schwarz (no.spam.delete@this.no.spam) on December 4, 2014 11:08 pm wrote:
>
> Since it is being accessed in a multi-threaded way, via multiple access paths,
> generally it needs its own mutex -- otherwise, reference counting would not
> be required to be atomic and a lock of a higher-level object would suffice
The problem with reference counts is that you often need to take them *before* you take the lock that protects the object data.
The thing is, you have two different cases:
- object *reference*
- object data
and they have completely different locking.
Object data locking is generally per-object. Well, unless scalability isn't a concern, in which case you may have some bigger external lock (extreme case: one single global lock).
But object *referencing* is mostly about finding the object (and removing/freeing it). Is it on a hash chain? Is it in a tree? Linked list? When the reference count goes down to zero, it's not the object data that you need to protect (the object is not used by anything else, so there's nothing to protect!), it's the ways to find the object you need to protect.
And the lock for the lookup operation cannot be in the object, because - by definition - you don't know what the object is! You're trying to look it up, after all.
So generally you have a lock that protects the lookup operation some way, and the reference count needs to be atomic with respect to that lock.
And yes, that lock may well be sufficient, and now you're back to non-atomic reference counts. But you usually don't have just one way to look things up: you might have pointers from other objects (and that pointer is protected by the object locking of the other object), but there may be multiple such objects that point to this (which is why you have a reference count in the first place!).
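To make the lookup side concrete, here is a minimal sketch (names and structure invented for illustration, not from the post): the list is protected by a lookup lock, the refcount is a C11 atomic, and the last `obj_put()` must retake the lookup lock to unlink - and re-check the count, since a concurrent lookup may have resurrected the object in the meantime.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct obj {
    atomic_int refcount;
    struct obj *next;   /* linkage in the lookup structure */
    int key;
};

static pthread_mutex_t lookup_lock = PTHREAD_MUTEX_INITIALIZER;
static struct obj *obj_list;

/* Insert a new object; the caller holds the initial reference. */
struct obj *obj_new(int key)
{
    struct obj *o = malloc(sizeof(*o));
    atomic_init(&o->refcount, 1);
    o->key = key;
    pthread_mutex_lock(&lookup_lock);
    o->next = obj_list;
    obj_list = o;
    pthread_mutex_unlock(&lookup_lock);
    return o;
}

/* Find an object and bump its refcount under the lookup lock, so it
 * cannot be freed between "found it" and "referenced it". */
struct obj *obj_get(int key)
{
    pthread_mutex_lock(&lookup_lock);
    for (struct obj *o = obj_list; o; o = o->next) {
        if (o->key == key) {
            atomic_fetch_add(&o->refcount, 1);
            pthread_mutex_unlock(&lookup_lock);
            return o;
        }
    }
    pthread_mutex_unlock(&lookup_lock);
    return NULL;
}

void obj_put(struct obj *o)
{
    if (atomic_fetch_sub(&o->refcount, 1) != 1)
        return;                     /* fast path: not the last reference */
    /* Last reference: unlink under the lookup lock, but re-check the
     * count first - a concurrent obj_get() may have resurrected the
     * object between our decrement and our taking the lock. */
    pthread_mutex_lock(&lookup_lock);
    if (atomic_load(&o->refcount) != 0) {
        pthread_mutex_unlock(&lookup_lock);
        return;
    }
    for (struct obj **p = &obj_list; *p; p = &(*p)->next) {
        if (*p == o) { *p = o->next; break; }
    }
    pthread_mutex_unlock(&lookup_lock);
    free(o);
}
```

The point of the sketch is exactly the interplay the post describes: the increment only ever happens under the lookup lock (or while already holding a reference), but the decrement happens outside it, so the count has to be atomic with respect to that lock.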
See what happens? There is no longer one single lock for lookup. Imagine walking a graph of objects, where objects have pointers to each other. Each pointer implies a reference to an object, but as you walk the graph, you have to release the lock from the source object, so you have to take a new reference to the object you are now entering.
And in order to avoid deadlocks, you cannot in the general case take the lock of the new object first - you have to release the lock on the source object, because otherwise (in a complex graph), how do you avoid a simple ABBA deadlock?
So atomic reference counts fix that. They work because when you move from object A to object B, you can do this:
(a) you have a reference count to A, and you can lock A
(b) once object A is locked, the pointer from A to B is stable, and you know you have a reference to B (because of that pointer from A to B)
(c) but you cannot take the object lock for B (ABBA deadlock) while holding the lock on A
(d) increment the atomic reference count on B
(e) now you can drop the lock on A (you're "exiting" A)
(f) your reference count means that B cannot go away from under you despite unlocking A, so now you can lock B.
See? Atomic reference counts make this kind of situation possible. Yes, you want to avoid the overhead if at all possible (for example, maybe you have a strict ordering of objects, so you know you can walk from A to B, and never walk from B to A, so there is no ABBA deadlock, and you can just lock B while still holding the lock on A).
But if you don't have some kind of forced ordering, and if you have multiple ways to reach an object (and again - why have reference counts in the first place if that isn't true!) then atomic reference counts really are the simple and sane answer.
If you think atomic refcounts are unnecessary, that's a big flag that you don't actually understand the complexities of locking.
Trust me, concurrency is hard. There's a reason all the examples of "look how easy it is to parallelize things" tend to use simple arrays and don't ever have allocations or freeing of the objects.
People who think that the future is highly parallel are invariably completely unaware of just how hard concurrency really is. They've seen Linpack, they've seen all those wonderful examples of sorting an array in parallel, they've seen all these things that have absolutely no actual real complexity - and often very limited real usefulness.
Linus