By: S. Rao (sonny.delete@this.burdell.org), October 26, 2006 2:51 pm
Room: Moderated Discussions
Rob Thorpe (rthorpe@realworldtech.com) on 10/26/06 wrote:
---------------------------
>Linus Torvalds (torvalds@osdl.org) on 10/26/06 wrote:
>---------------------------
>>Tzvetan Mikov (tzvetanmi@yahoo.com) on 10/26/06 wrote:
>>>
>>>I think the conclusion of that discussion was that weak
>>>consistency ends up being worse than both total store
>>>ordering and release consistency, with release consistency
>>>most preferable among all.
>>
>>Largely, yes.
>
>For this specific application I'd agree.
>
>>>This issue here seems somewhat orthogonal - that it is
>>>preferable to have an implicit barrier between dependent
>>>reads. I wonder is the implicit barrier cheaper than an
>>>explicit one, and why ?
>>
>>There's a non-technical but very important reason why
>>implicit barriers are better than explicit ones (and one
>>reason why I actually think that in practice the
>>x86 memory ordering model would tend to be better than
>>even a full release consistency model, even if the latter
>>is better in theory).
>>
>>The reason: explicit barriers make it easy to punt the
>>problem entirely.
>>
>>If you have an implicit barrier, it's there all the
>>time, and the CPU microarchitecture needs to seriously
>>make sure that it works well. You cannot cop out and say
>>"memory barriers are expensive", because they are all over.
>>
>>In other words, there's an important psychological
>>reason why x86 does barriers so well: the CPU designers
>>were forced to (often against their will) to make sure they
>>worked better. The end result: x86 does locking pretty much
>>faster than any other architecture. Screw the whole "in
>>theory" part - this is a simple and fairly undeniable
>>fact.
>
>It's a fact that x86 does lock fast. But that does not mean that having implicit
>barriers is better than explicit barriers. x86 does it fast because there is a
>huge economic imperative in that direction. x86 simply has more attention
>lavished on it than other architectures.
But he's not saying that it's better in any technical
sense... he's saying it's "better" from the perspective
that the CPU designers can't cop out and make
barriers slow, and that this benefits everyone.
And if you've ever talked to some of these HW guys,
you'd know they would like nothing better than to make
something they see as "rare" slower than molasses, even
though the frequency of its occurrence doesn't accurately
represent the true performance cost.
They only think in terms of how often they see the thing:
   So these stupid barrier guys, how often do they show
   up in the traces?
   Only once every few million instructions at the very
   most, you say?
   Well, hot-diggity damn, I can just let this guy take
   a few hundred cycles, no problem... after all, it'd only
   be a few percent more cycles, right? And my life just
   got a hell of a lot easier...
I think this attitude of "oh, it's only a 1% boost to
put in tons of extra effort to make this 'rare' thing
faster" is what he's arguing against, not any kind of
technical issue.
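
For anyone who wants the "implicit barrier between dependent
reads" part made concrete, here's a rough userspace sketch of the
publish/consume pattern being talked about. This is just my own
illustration, not anybody's actual code: wmb_publish() and
read_depends() are made-up names (the Linux kernel's rough
equivalents would be smp_wmb() and smp_read_barrier_depends()).
The point is only that on x86 the reader side pays essentially
nothing, because the ordering between the pointer load and the
dependent load through it is implicit, while a weakly ordered
machine has to spend a real barrier instruction there.

    #include <stddef.h>

    struct msg {
        int payload;
    };

    /* In this sketch both "barriers" are plain compiler barriers,
     * which is roughly what they cost on x86: stores aren't
     * reordered with other stores, and dependent loads stay
     * ordered, so the hardware gives you the ordering for free
     * (the "implicit" barrier). On a weakly ordered machine,
     * read_depends() would have to be a real barrier instruction,
     * which is the "explicit" cost the quoted question asks about. */
    #define wmb_publish()   __asm__ __volatile__("" ::: "memory")
    #define read_depends()  __asm__ __volatile__("" ::: "memory")

    static struct msg slot;
    static struct msg * volatile shared = NULL;  /* published pointer */

    void writer(void)
    {
        slot.payload = 42;   /* 1: initialize the object             */
        wmb_publish();       /* 2: order the init before the publish */
        shared = &slot;      /* 3: publish the pointer               */
    }

    int reader(void)
    {
        struct msg *p = shared;  /* 1: load the pointer              */
        if (!p)
            return -1;
        read_depends();          /* 2: free on x86, costly elsewhere */
        return p->payload;       /* 3: dependent read through it     */
    }
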