By: Linus Torvalds (torvalds.delete@this.osdl.org), October 26, 2006 8:57 am
Room: Moderated Discussions
Tzvetan Mikov (tzvetanmi@yahoo.com) on 10/26/06 wrote:
>
>I think the conclusion of that discussion was that weak
>consistency ends up being worse than both total store
>ordering and release consistency, with release consistency
>most preferable among all.
Largely, yes.
>This issue here seems somewhat orthogonal - that it is
>preferable to have an implicit barrier between dependent
>reads. I wonder is the implicit barrier cheaper than an
>explicit one, and why ?
There's a non-technical but very important reason why
implicit barriers are better than explicit ones (and one
reason why I actually think that in practice the
x86 memory ordering model would tend to be better than
even a full release consistency model, even if the latter
is better in theory).
The reason: explicit barriers make it easy to punt the
problem entirely.
If you have an implicit barrier, it's there all the
time, and the CPU microarchitecture needs to seriously
make sure that it works well. You cannot cop out and say
"memory barriers are expensive", because they are all over.
In other words, there's an important psychological
reason why x86 does barriers so well: the CPU designers
were forced (often against their will) to make sure they
worked better. The end result: x86 does locking pretty much
faster than any other architecture. Screw the whole "in
theory" part - this is a simple and fairly undeniable
fact.
In theory, theory matters. In practice, having technical
limitations that actually force you to tackle certain
problems head on will often help you.
Having hard rules and limitations sounds like a problem
only when you want to make the "theoretically best machine".
In practice, they give you a framework to work within, and
force you to concentrate on actual implementation, which
in the end is a hell of a lot more important than all the
theoretical "best possible ISA" arguments that have ever
been argued.
This is my "Just Do It!" argument. The reason x86 does
well is because people just had to bite the bullet and do
it, rather than be involved with all the self-masturbatory
"let's try to come up with an instruction set and a compiler
that can take performance to the next level" crap that so
many other projects have done.
This is why I like "three decades of cruft". A lot of people
think that cruft is bad. It's not true. Cruft is a necessary
evil that goes along with "real life experience", and the
cruft part is much smaller than the experience and "we just
have to be compatible" part.
And btw, it's not just hardware. The reason I did Linux
and cared about things like POSIX and BSD and SysV and
general UNIX compatibility was exactly that I believe that
this is true in technology in general. I have seen too many
academic OS projects (and other software projects) that
were totally destroyed by the fact that they meant
to do things "from scratch" and "finally do things the
right way".
So there are huge advantages with "implicit
barriers". Screw theory - implicit barriers force the
microarchitecture to just make barriers fast, because
there is no choice to punt the thing and do them wrong.
Linus