By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), December 10, 2014 11:56 am
Room: Moderated Discussions
Konrad Schwarz (no.spam.delete@this.no.spam) on December 10, 2014 6:23 am wrote:
>
> And this really clinches the argument: to use reference counts, you need acyclic (directed) graphs.
Note that for cyclic graphs, you'd still often use atomics, just not generally reference counts (or quite possibly in addition to reference counts - with the reference count being used to track usage separate from the internal graph usage).
Think of the usual trivial traversal model: walk the graph, do something to each entry, and set a flag that you've done it (exactly because of cycles). Now, think about what happens in threaded environments. Yeah, that "do something and set a flag" is suddenly very much racy.
And for exactly the same reason as for reference counts, using an external lock is problematic. So you tend to want to have lockless models - ie atomics. Instead of an atomic increment, it might be an atomic compare-and-exchange ("set state to 'in progress', but only if nobody else already did so).
(Combine with "use sequence number instead of simple flag", in order to avoid things like "have to do a second traversal just to clear the flag again to do the next round")
And to make things more interesting still, it's not usually a "one or the other" situation, it's often a combination of things. I already mentioned using reference counts in addition to that whole "have I handled this already" flag/sequence count. But you then generally have other models too, including straight traditional locking on a higher level (maybe you want to traverse the graph concurrently, but you do not want to have multiple different traversals active at the same time).
And you can do various other fancy tricks. In Linux, we introduced this notion of a "reflock" that is both a reference count and a lock, and has very specific semantic guarantees (reference counts can be incremented atomically without taking the lock, but only if nobody else holds the lock etc).
And yes, you can often hide this from the user. A garbage collector is obviously doing exactly that kind of graph traversal (and often uses sequence counts not just to avoid that dual traversal, but for more fundamental reasons: to avoid a certain class of races you may well want to delay actual freeing, and use the generational data of the sequence count to free things only after they haven't been reachable for two or more cycles).
Of course, once you get into more complex situations, you do end up often effectively wanting LL/SC rather than just a simple unconditional atomic RMW instruction. So I'm absolutely not against the LL/SC model, I just think that you do want both, and I think having ARM64 add atomic rmw operations is actually a good thing.
Linus
>
> And this really clinches the argument: to use reference counts, you need acyclic (directed) graphs.
Note that for cyclic graphs, you'd still often use atomics, just not generally reference counts (or quite possibly in addition to reference counts - with the reference count being used to track usage separate from the internal graph usage).
Think of the usual trivial traversal model: walk the graph, do something to each entry, and set a flag that you've done it (exactly because of cycles). Now, think about what happens in threaded environments. Yeah, that "do something and set a flag" is suddenly very much racy.
And for exactly the same reason as for reference counts, using an external lock is problematic. So you tend to want to have lockless models - ie atomics. Instead of an atomic increment, it might be an atomic compare-and-exchange ("set state to 'in progress', but only if nobody else already did so).
(Combine with "use sequence number instead of simple flag", in order to avoid things like "have to do a second traversal just to clear the flag again to do the next round")
And to make things more interesting still, it's not usually a "one or the other" situation, it's often a combination of things. I already mentioned using reference counts in addition to that whole "have I handled this already" flag/sequence count. But you then generally have other models too, including straight traditional locking on a higher level (maybe you want to traverse the graph concurrently, but you do not want to have multiple different traversals active at the same time).
And you can do various other fancy tricks. In Linux, we introduced this notion of a "reflock" that is both a reference count and a lock, and has very specific semantic guarantees (reference counts can be incremented atomically without taking the lock, but only if nobody else holds the lock etc).
And yes, you can often hide this from the user. A garbage collector is obviously doing exactly that kind of graph traversal (and often uses sequence counts not just to avoid that dual traversal, but for more fundamental reasons: to avoid a certain class of races you may well want to delay actual freeing, and use the generational data of the sequence count to free things only after they haven't been reachable for two or more cycles).
Of course, once you get into more complex situations, you do end up often effectively wanting LL/SC rather than just a simple unconditional atomic RMW instruction. So I'm absolutely not against the LL/SC model, I just think that you do want both, and I think having ARM64 add atomic rmw operations is actually a good thing.
Linus