By: David W (david.delete@this.wragg.org), December 8, 2014 11:36 pm
Room: Moderated Discussions
Linus Torvalds (torvalds.delete@this.linux-foundation.org) on December 5, 2014 12:58 pm wrote:
> The problem with reference counts is that you often need to take
> them *before* you take the lock that protects the object data.
>
> The thing is, you have two different cases:
>
> - object *reference*
>
> - object data
>
> and they have completely different locking.
This is certainly true in multithreaded C/C++ code.
In languages with automatic GC (e.g. anything based on the JVM), the question of object reference locking goes away: If a thread holds an object reference, it knows it can safely dereference it at any time. So in my experience, although concurrent programming with a GC is still hard, it is much less hard than concurrent programming without a GC.
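To make the point concrete, here is a minimal Java sketch (class names are my own, just for illustration): several threads capture a reference to a shared object and dereference it freely, with no reference counting or hazard pointers, because the GC keeps the object alive as long as any thread can reach it. Only the object *data* still needs synchronization, here via an AtomicLong.

```java
import java.util.concurrent.atomic.AtomicLong;

// The *reference* to this object needs no locking under GC: any thread
// holding it may dereference it at any time. Only the *data* needs
// synchronization (here, an AtomicLong).
class SharedCounter {
    private final AtomicLong value = new AtomicLong();

    long increment() { return value.incrementAndGet(); }
    long get()       { return value.get(); }
}

public class GcRefDemo {
    public static void main(String[] args) throws InterruptedException {
        SharedCounter counter = new SharedCounter();
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            // Each lambda captures the reference; no take/release of a
            // refcount is needed before touching the object.
            threads[i] = new Thread(() -> {
                for (int n = 0; n < 1000; n++) counter.increment();
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        System.out.println(counter.get()); // 4 threads x 1000 = 4000
    }
}
```

The two-level locking Linus describes collapses to one level: the "object reference" case is handled entirely by the collector.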
Of course, there is no free lunch: if you want your multithreaded GC-reliant program to scale, you need a scalable GC. Concurrent GCs are very complicated and introduce overheads. The higher the level of concurrency in the GC (and between the GC and the app code), the higher the overheads (both through read/write barriers in the app code and higher costs in the GC itself). So attempts to write highly scalable apps in GCed languages can lead to contortions to reduce pressure on the GC. You can even find articles about ways to write high-performance concurrent Java apps by storing data outside of the GCed heap.
Some techniques to manage object reference locking in non-GC languages resemble GC techniques. Reference counting is obviously a form of GC. RCU can also be seen as a kind of GC, extending object lifetimes until it is known that no references remain (and furthermore using immutability to avoid object data locking).
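The RCU analogy can be sketched in Java itself (class name `RcuLikeList` is mine, purely illustrative): readers follow a single reference to an immutable snapshot and never lock; a writer copies, modifies, and atomically publishes a new snapshot. The GC plays the role of RCU's grace period, reclaiming an old snapshot only once no reader can still reach it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// RCU-flavoured copy-on-write list. Readers are lock-free; writers
// publish a fresh immutable version via compare-and-set. Old versions
// stay valid for any reader still holding them, and the GC reclaims
// them when the last such reference disappears.
class RcuLikeList<T> {
    private final AtomicReference<List<T>> snapshot =
        new AtomicReference<>(Collections.emptyList());

    // Read side: no lock, no refcount. The snapshot is immutable, so
    // immutability also removes the need for object *data* locking.
    List<T> read() {
        return snapshot.get();
    }

    // Update side: copy, modify, publish. The CAS loop retries if a
    // concurrent writer published first.
    void add(T item) {
        while (true) {
            List<T> old = snapshot.get();
            List<T> next = new ArrayList<>(old);
            next.add(item);
            if (snapshot.compareAndSet(old, Collections.unmodifiableList(next))) {
                return;
            }
        }
    }
}
```

In C, the hard part of this pattern is knowing when the old version may be freed; that is exactly the lifetime question that both RCU's grace periods and a tracing GC answer.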