By: Rob Thorpe (rthorpe.delete@this.realworldtech.com), October 27, 2006 1:26 am
Room: Moderated Discussions
S. Rao (sonny@burdell.org) on 10/26/06 wrote:
---------------------------
>Rob Thorpe (rthorpe@realworldtech.com) on 10/26/06 wrote:
>---------------------------
>>Linus Torvalds (torvalds@osdl.org) on 10/26/06 wrote:
>>---------------------------
>>>Tzvetan Mikov (tzvetanmi@yahoo.com) on 10/26/06 wrote:
>>>>
>>>>I think the conclusion of that discussion was that weak
>>>>consistency ends up being worse than both total store
>>>>ordering and release consistency, with release consistency
>>>>most preferable among all.
>>>
>>>Largely, yes.
>>
>>For this specific application I'd agree.
>>
>>>>This issue here seems somewhat orthogonal - that it is
>>>>preferable to have an implicit barrier between dependent
>>>>reads. I wonder: is the implicit barrier cheaper than an
>>>>explicit one, and why?
>>>
>>>There's a non-technical but very important reason why
>>>implicit barriers are better than explicit ones (and one
>>>reason why I actually think that in practice the
>>>x86 memory ordering model would tend to be better than
>>>even a full release consistency model, even if the latter
>>>is better in theory).
>>>
>>>The reason: explicit barriers make it easy to punt the
>>>problem entirely.
>>>
>>>If you have an implicit barrier, it's there all the
>>>time, and the CPU microarchitecture needs to seriously
>>>make sure that it works well. You cannot cop out and say
>>>"memory barriers are expensive", because they are all over.
>>>
>>>In other words, there's an important psychological
>>>reason why x86 does barriers so well: the CPU designers
>>>were forced (often against their will) to make sure they
>>>worked better. The end result: x86 does locking pretty much
>>>faster than any other architecture. Screw the whole "in
>>>theory" part - this is a simple and fairly undeniable
>>>fact.
>>
>>It's a fact that x86 does locking fast. But that does not mean that
>>having implicit barriers is better than explicit barriers. x86 does it
>>fast because there is a huge economic imperative in that direction. x86
>>simply has more attention lavished on it than other architectures.
>
>But he's not saying that it's better in any technical
>sense.. he's saying it's "better" from the perspective
>that the CPU designers can't cop out and make
>barriers slow, so this benefits everyone.
>
>And if you've ever talked to some of these HW guys,
>you'd know they would like nothing better than to make
>something they see as "rare" slower than molasses, even
>though the frequency of its occurrence doesn't accurately
>represent the true performance costs.
>
>They only think in terms of "how often do we see this thing":
>
>So these stupid barrier guys, how often do they show
>up in the traces?
>
>Only once every few million instructions at the very
>most, you say?
>
>Well, hot-diggity damn, I can just let this guy take
>a few hundred cycles, no problem... after all it'd only be a
>few percent more cycles, right? And my life just got a
>hell of a lot easier..
Well, however you look at it, it is a technical issue.
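To make the locking claim concrete: on x86, every LOCK-prefixed
read-modify-write is also a full memory barrier, so a lock acquire pays
for its ordering inside the one atomic instruction it already needs,
with no separate fence. A minimal sketch of that shape in C11 atomics
(the type and function names here are mine, not from any real kernel):

#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_bool locked; } spinlock_t;

static void spin_lock(spinlock_t *l)
{
    /* Spin until we swap false -> true. On x86 the exchange compiles
     * to XCHG, which is implicitly locked and therefore also a full
     * barrier: nothing in the critical section can move above it. */
    while (atomic_exchange_explicit(&l->locked, true,
                                    memory_order_acquire))
        ;  /* busy-wait; a real lock would back off or pause */
}

static void spin_unlock(spinlock_t *l)
{
    /* Release ordering: everything before this store stays before it. */
    atomic_store_explicit(&l->locked, false, memory_order_release);
}

The designer cannot make that exchange slow without making every lock
slow, which is Linus's point about implicit barriers in miniature.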
I completely agree with the outlook of the hardware guys: if something is rare, then there's no reason not to let it be slow. Of course, that judgment should change when usage changes, which is really what we're talking about here.
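To put rough numbers on the caricature above: a barrier that stalls
for, say, 200 cycles once every million instructions, on a machine
averaging about an instruction per cycle, costs roughly 200 cycles per
1,000,000, i.e. about 0.02%, which is genuinely negligible, so the
shrug is rational. But if lock-heavy software issues a barrier every
few hundred instructions, the same 200-cycle stall costs tens of
percent. The "rare" was a property of yesterday's workloads, not of
the instruction.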
Everyone treats adding extra complexity as though it has no cost. What people have forgotten is that the more common tasks are made slower, and microprocessors made hotter, to service the needs of a few odd capabilities created long ago. This is not a good situation, though given the history it was inevitable.
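Coming back to Tzvetan's original question about dependent reads: the
classic case is one CPU publishing a structure through a pointer while
another loads the pointer and dereferences it. On x86 (and nearly
everywhere else) the address dependency between the two loads orders
them for free; that is the implicit barrier. Alpha is the one
exception, which is why Linux carries smp_read_barrier_depends(), a
real barrier there and a no-op everywhere else. A rough kernel-style C
sketch (the struct and function names are invented for illustration;
the barrier macros are the real Linux ones):

struct foo { int value; };
struct foo *global_ptr;

void publisher(struct foo *p)
{
        p->value = 42;
        smp_wmb();               /* keep the init before the publish */
        global_ptr = p;          /* publish the pointer */
}

int reader(void)
{
        struct foo *p = global_ptr;
        /* The next load's address comes from p, so the data
         * dependency alone keeps it ordered after the load of
         * global_ptr on x86, SPARC, POWER and the rest; only Alpha
         * may reorder it, so only there is this a real barrier. */
        smp_read_barrier_depends();
        return p->value;
}

The asymmetry is exactly the one under discussion: the explicit macro
is free where the hardware gives the implicit guarantee, and costs a
full memory barrier precisely where the designers punted.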