By: S. Rao (sonny.delete@this.burdell.org), October 23, 2006 3:40 pm
Room: Moderated Discussions
Tzvetan Mikov (tzvetanmi@yahoo.com) on 10/22/06 wrote:
---------------------------
>S. Rao (sonny@burdell.org) on 10/21/06 wrote:
>---------------------------
>>>You are right. If the compiler can determine that the object is not globally visible,
>>>it will probably eliminate the allocation completely by replacing it with a stack
>>>allocation, so there actually won't be any allocation.
>>
>>Well, it doesn't even need to go that far, it just needs
>>to add a barrier after a constructor and before any
>>reference to the newly initialized object escapes
>>the local scope (if there isn't an implicit barrier already)
>>
>
>But that is exactly the problem I am complaining about: this barrier
>has to be added for practically every allocation, even if most don't need it.
I guess I'm somewhat confused. Are you sure that this
barrier is always added in current implementations,
or is it a conjecture that it is always added because
there isn't a better way to guarantee consistency?
All I'm saying is that the implementation might have
enough information to be smart about when to add it.
I'm not sure what has actually been implemented,
but I'd be surprised if this weren't optimized somehow.
Thinking about it more, since Java constructors are all
chained, we would end up with several barriers for
every object creation. That seems like it would be
incredibly slow, which contradicts your assertion below that
object creation must be as fast as a stack allocation.
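For reference, in Java the "implicit barrier" for this case comes from final-field semantics (JLS 17.5): a thread that obtains a reference to an object after its constructor finishes is guaranteed to see the initialized values of its final fields. The sketch below uses hypothetical classes (`FinalPublication`, `Base`, `Derived`) to show where the freeze points fall when constructors chain. Whether a JVM emits a StoreStore-style barrier at the end of every constructor in the chain or coalesces them into one after the outermost constructor is an implementation detail; the JSR-133 Cookbook describes the requirement as ordering the final-field stores before the object's reference is published.

```java
public class FinalPublication {
    // Hypothetical two-level hierarchy; both levels set final fields.
    static class Base {
        final int x;                 // frozen when Base(int) exits (JLS 17.5.1)
        Base(int x) {
            this.x = x;
            // A JVM may order the stores so far here (e.g. a StoreStore barrier)...
        }
    }

    static class Derived extends Base {
        final int y;                 // frozen when Derived(int, int) exits
        Derived(int x, int y) {
            super(x);                // chained constructor call
            this.y = y;
            // ...and/or here. 'this' has not escaped in between, so one barrier
            // after the outermost constructor would be enough to order both
            // final-field stores before the publishing store.
        }
    }

    // Deliberately plain (non-volatile): a racy publication point.
    static Derived shared;

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> shared = new Derived(1, 2));
        writer.start();
        writer.join(); // join() itself gives happens-before; without it, any
                       // thread that saw a non-null 'shared' would still be
                       // guaranteed x == 1 and y == 2 by final-field semantics.
        Derived d = shared;
        System.out.println(d.x + d.y); // 3
    }
}
```

The join is only there to keep the demo deterministic; the final-field guarantee is what matters for a reader that races with the writer.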
>>If it's locally scoped and the creating thread interacts
>>with the object before its reference escapes, then the
>>barrier isn't needed at all (mostly).
>
>I don't really see how. Having the local thread interact with an object does nothing
>for the visibility of this object from other threads. When does a write become globally
>visible: after 100 instructions? We still end up needing a memory barrier in
>all cases where the compiler cannot prove that an object is used only locally (which is practically all cases).
Well, having the local thread interact with the object
means that proper ordering is forced, since that thread's
code must see everything in a consistent state.
The creating thread's loads will always get the correct
data without an explicit barrier (maybe not on Alpha,
but everywhere else). If that happens before the creating
thread allows the reference to escape, I think we're okay
to proceed without adding an explicit barrier.
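The "used only locally" case is what escape analysis looks for. In the hypothetical method below, the `Vec` object never leaves `lengthSquared`, so only the creating thread ever reads it, and its own reads see its own writes in program order without any fence. A JIT that proves the object does not escape (HotSpot's C2 performs escape analysis, controlled by `-XX:+DoEscapeAnalysis`) may scalar-replace it, so there is neither a heap allocation nor a publication barrier. Whether that actually happens at a given call site is up to the JIT; this only illustrates the eligible pattern.

```java
public class LocalOnly {
    // Hypothetical value class; nothing about it is special to the JVM.
    static final class Vec {
        final double x, y;
        Vec(double x, double y) { this.x = x; this.y = y; }
        double dot(Vec o) { return x * o.x + y * o.y; }
    }

    // The Vec is never stored in a field or returned; the only method it is
    // passed to, dot(), is a small method on a final class that a JIT can
    // inline. Only the creating thread ever touches it.
    static double lengthSquared(double x, double y) {
        Vec v = new Vec(x, y);
        return v.dot(v);
    }

    public static void main(String[] args) {
        System.out.println(lengthSquared(3.0, 4.0)); // 25.0
    }
}
```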
>>Even further, object creation is normally an expensive
>>process anyway; the barrier might be cheap enough that
>>it amounts to noise in the time it takes
>>to fully initialize an object. So who knows, maybe
>>they just always do it. If I can get ahold of a JVM
>>guy, maybe I can find out what they do on PPC.
>
>Again, I disagree. Object creation is supposed to be an extremely cheap operation,
>ideally comparable in cost to literally pushing the object onto the stack. In a
>single threaded environment with a compacting GC this is an attainable goal.
I've never heard this before. I've always thought
object allocation was an expensive proposition. How
could it possibly be as cheap as a simple stack allocation,
and why would Java have primitive types if that were true?
(Maybe a big can of worms here :-))
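To make the "as cheap as pushing onto the stack" claim concrete: with a compacting or copying collector the free space is one contiguous region, so allocation can be a bounds check plus a pointer increment. The sketch below models that over a plain `long[]`; `BumpArena`, `allocate`, and `reset` are hypothetical names, and `reset` stands in for a GC cycle that leaves the region empty. Real JVMs do this per thread (HotSpot calls these TLABs) and also initialize an object header, which this sketch omits.

```java
public class BumpArena {
    private final long[] heap; // the contiguous free region
    private int top = 0;       // next free word, like a stack pointer

    BumpArena(int words) { heap = new long[words]; }

    /** Returns the word offset of the new block, or -1 if the arena is full. */
    int allocate(int words) {
        if (words <= 0) throw new IllegalArgumentException("words must be positive");
        if (heap.length - top < words) return -1; // a real VM would collect here
        int addr = top;
        top += words;              // the whole "allocation" is this increment
        return addr;
    }

    void store(int addr, long value) { heap[addr] = value; }
    long load(int addr) { return heap[addr]; }

    /** Stand-in for a GC cycle after which no live objects remain in this region. */
    void reset() { top = 0; }

    int used() { return top; }

    public static void main(String[] args) {
        BumpArena arena = new BumpArena(8);
        int a = arena.allocate(3); // 0
        int b = arena.allocate(4); // 3
        int c = arena.allocate(2); // -1: only 1 word left
        System.out.println(a + " " + b + " " + c + " used=" + arena.used());
        arena.reset();
        System.out.println(arena.allocate(2)); // 0
    }
}
```

The fast path has no free-list search at all, which is why the cost of a barrier per allocation would be proportionally large.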
>I have never seen a good quantitative explanation of how expensive (and when) a memory
>barrier really is in practice. Is it more expensive on a ccNUMA system, for example?
>Does it matter at all? Perhaps I am making all this noise for nothing.
Well, of course, the answer is that it depends. I can talk about
Power5 because that's what I'm most familiar with.
On Power5, the lightweight lwsync instruction doesn't go out on
the fabric, whereas the full, heavyweight sync does. For this
application (ordering the constructor's stores before the store
that publishes the reference), lwsync would be sufficient.
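In Java terms, ordering the constructor's stores before the publishing store only needs release semantics on the writer and acquire semantics on the reader, not a full two-way fence. The sketch below (Java 9+) uses `VarHandle.setRelease`/`getAcquire` on a hypothetical `Holder.ref` field. On POWER, HotSpot is generally understood to implement a release store as `lwsync` followed by the store, while a full fence such as `VarHandle.fullFence()` needs `sync`; treat that mapping as an assumption about one JVM, not something the Java specification promises.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class ReleasePublish {
    // Hypothetical payload with plain (non-final, non-volatile) fields.
    static final class Payload {
        int a, b;
        Payload(int a, int b) { this.a = a; this.b = b; }
    }

    static final class Holder {
        Payload ref; // plain field, accessed only through REF below

        private static final VarHandle REF;
        static {
            try {
                REF = MethodHandles.lookup()
                        .findVarHandle(Holder.class, "ref", Payload.class);
            } catch (ReflectiveOperationException e) {
                throw new ExceptionInInitializerError(e);
            }
        }

        // Writer: release store. Earlier stores (the constructor's writes to
        // a and b) cannot be reordered after it; no full fence is implied.
        void publish(Payload p) { REF.setRelease(this, p); }

        // Reader: acquire load. Later loads (p.a, p.b) cannot move before it.
        Payload read() { return (Payload) REF.getAcquire(this); }
    }

    public static void main(String[] args) throws InterruptedException {
        Holder h = new Holder();
        Thread writer = new Thread(() -> h.publish(new Payload(1, 2)));
        writer.start();
        Payload p;
        while ((p = h.read()) == null) {
            Thread.onSpinWait(); // spin until the writer publishes
        }
        writer.join();
        System.out.println(p.a + p.b); // 3
    }
}
```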