By: rwessel (robertwessel.delete@this.yahoo.com), October 27, 2006 3:21 pm
Room: Moderated Discussions
Linus Torvalds (torvalds@osdl.org) on 10/27/06 wrote:
---------------------------
>Rob Thorpe (rthorpe@realworldtech.com) on 10/27/06 wrote:
>>
>>You can't not have special cases. Everything a
>>microprocessor does is a special case tailored to some
>>set of particular applications.
>
>My point, which you seem to totally miss, over and over
>again, is that certain design approaches make this worse.
>
>For example, there is the approach that x86 chose (never
>mind the reasons):
>
>- memory ordering is totally an internal issue, and
>memory barriers aren't needed, because we will just
>make the uarch do the right thing
>
>is fundamentally more robust than the approach
>most RISC vendors took.
>
>In other words, while the two are "equally good" in theory
>(and, in fact, release consistency might be better in
>theory), in practice it is better to just
>not even aim for the theory.
>
>Why is that so hard for people to understand? You should
>avoid special cases in the first place, not optimize for
>them.
>
>So the fact is, you very much can "not have special
>cases" in most settings. Yes, you'll inevitably need them
>for something, but if you can avoid them, you're a
>lot better off.
>
>The x86 memory ordering is one such thing. It didn't have
>memory barriers, and instead of adding them, they just
>accepted the fact that they aren't needed, if you just
>see the instruction stream as largely linear, and hide the
>fact that the actual execution may not be from all other
>levels..
>
>See?
>
>The extreme way of not doing this right is to
>expose all the quirks very publicly, and make every single
>special case be an architectural feature, visible to all.
>So you'd have register renaming done in an architecturally
>visible manner, you'd have memory ordering done visibly,
>you'd have IEEE special cases done by software.
>
>That's "obviously better", because it allows the hardware
>engineers to punt the hard stuff, right?
>
>WRONG.
>
>And that's what I'm saying. Certain mental models
>are just wrong. The "let special cases be handled by others"
>is such a mental model. It may work, but it results
>in crap in the end.
I largely agree with you on the memory ordering issue. There are just too many cases where the relaxed models are a royal PITA if the explicit barriers are expensive (expensive == too costly to put before 1/3 of all loads and stores). But I do think it would be useful to offer both (relatively) fast strongly ordered loads and stores *and* some even faster relaxed-order loads and stores (which would, hopefully, imply that the explicit barriers would be reasonably fast too).
Of course convincing a compiler to make use of both is going to be a bit of a trick.
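To make the "both kinds of access" idea concrete, here is a rough sketch in C11 atomics, which let the programmer (rather than the compiler) pick the ordering per access. The names `publish`, `try_consume`, `count_event`, and `events_seen` are made up for illustration, and how cheaply each ordering maps onto real instructions depends entirely on the target architecture.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;          /* ordinary data, published via the flag below */
static atomic_bool ready;    /* zero-initialized: "not ready" */
static atomic_ulong events;  /* statistics counter, ordering irrelevant */

/* Ordered store: the write to payload must be visible before the flag. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Ordered load: if the flag is seen, the payload write is seen too. */
bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}

/* Relaxed accesses: atomic, but no ordering against other memory. */
void count_event(void)
{
    atomic_fetch_add_explicit(&events, 1, memory_order_relaxed);
}

unsigned long events_seen(void)
{
    return atomic_load_explicit(&events, memory_order_relaxed);
}
```

On a strongly ordered machine the release/acquire pair typically costs nothing beyond the plain store and load; on a relaxed machine it is where the barrier cost shows up, which is exactly why the price of those barriers matters so much.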
As a general concept, performance surprises are a bad idea. But I don't think that all cases are reasonably avoidable. We've discussed unaligned loads and stores before. IMO, all architectures should provide unaligned loads and stores (even if several times slower than an aligned memory access), up to the point where the access crosses a page boundary. At that point you end up adding a huge amount of complexity for a really quite rare case. Can it be done? Obviously. Would having page-crossing unaligned accesses handled in hardware improve my life as a programmer? Sure. But it's something I could easily live without (unlike, say, a total lack of “real” unaligned memory access support, which is a huge PITA).
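For reference, a small C sketch of the two cases being distinguished here: a portable unaligned load, and a test for whether an access straddles a page. The helper names and the 4096-byte page size are assumptions for illustration only; real page sizes vary by architecture and configuration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed page size, for illustration only. */
#define PAGE_SIZE_ASSUMED 4096u

/* Portable unaligned 32-bit load. On hardware with real unaligned
 * support, compilers usually turn this into a single load. */
uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* True if the len bytes starting at addr touch two different pages.
 * Assumes len > 0. */
bool crosses_page(uintptr_t addr, size_t len)
{
    return (addr / PAGE_SIZE_ASSUMED) !=
           ((addr + len - 1) / PAGE_SIZE_ASSUMED);
}
```

With these assumptions, a 4-byte access at offset 4094 within a page crosses into the next one, while the same access at offset 4092 does not. Only a few start offsets out of every 4096 can cross for an access that small, which is the sense in which the page-crossing case is rare.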
I guess what I'm saying is that I can live with rare slow cases, so long as they're really rare. ;-)