By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), July 13, 2015 11:05 am
Room: Moderated Discussions
anon (anon.delete@this.anon.com) on July 12, 2015 7:52 pm wrote:
>
> Interestingly, Intel for a time was very hesitant to commit to their current memory consistency model
> by formalizing it in their ISA documents. IIRC this finally did happen around Core2(?) timeframe.
Yes. For the longest time the official Intel memory ordering rules were that loads could be re-ordered wrt other loads. It may never have actually happened in the hardware, but those were the official documented semantics. That matters, because lots of read-heavy lockless algorithms would then need a read barrier in the critical path.
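To make that concrete, here is a minimal sketch of my own (not from the post) of the kind of read-heavy lockless pattern in question, written with C11 atomics; the names (data, ready, reader) are made up for illustration:

/* Receive side of a classic message-passing pattern. */
#include <stdatomic.h>
#include <stdbool.h>

static int data;              /* payload, written by the producer first */
static atomic_bool ready;     /* flag, set by the producer after data   */

int reader(void)
{
    /* Two loads in the critical path: first the flag, then the payload.
     * The acquire load keeps the data load after the flag load.  On x86,
     * where loads are not reordered with other loads, this is a plain MOV;
     * under the old "loads may pass loads" documentation an explicit read
     * barrier would have been needed between the two loads instead. */
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;                     /* spin until the producer has published */
    return data;
}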
The whole "stores are only delayed, they don't go ahead of earlier loads" was also not exactly well-documented. That one matters because it makes each store a release, which in turn means that unlocking a region can be done without extra synchronization. And this is where there were apparently some bugs in very early Pentium Pro cores.
> Presumably their cores were not implementing weaker consistency, but I'm sure there would have been questions
> over the issue of whether this stronger consistency would end up costing them significant performance.
I suspect that was a big issue, but also the whole worry about "have we actually really always followed the 'loads are ordered' rule?". From my understanding of talking to some core Intel engineers (Andy Glew), the x86 memory ordering rules were not really well documented even internally, and very few people really knew them. Everybody in the industry (and this includes people inside Intel) thought that the x86 memory ordering was very ad-hoc.
So I think that one reason they decided to document the rules better - and make them stricter - was simply that making these rules more explicit also really clarified what the rules were, and made it much easier for Intel to actually validate their memory subsystem.
Because validation really is very important when you do hardware. Intel (along with everybody else) has historically had the situation, several times, where they come out with a new CPU, and then in testing things do not work. Imagine the joy of a validation engineer when the bug report reads "Windows often blue-screens after 15-25 minutes of running winbench continuously".
Seriously, just imagine being a hardware developer and validation person, facing that kind of bug report. And I guarantee you that it has happened, many many times.
This is the kind of situation that most of the RISC vendors never even had. Look at Unix releases: they would generally follow hardware releases. Same goes with ARM in the embedded space. Compatibility simply wasn't as big of a deal, because 90% of the code that was run on a new CPU was basically validated for that CPU, rather than legacy code. In that kind of environment, under-specifying the behavior of your architecture isn't as big of an issue, and it gives the hardware designers much more flexibility.
Just look at how the weak memory subsystem people treat my "bugs happen" argument in just this thread. To them, that's not an issue - you should fix any bugs that show up in new microarchitectures. That's not how x86 has ever worked. If a new microarchitecture didn't work with old programs, it was the CPU that was considered buggy.
So for Intel, the stronger rules really help in a very fundamental way. Yes, as a software developer, I prefer a more reliable platform. But for a hardware developer, the stricter rules also make it much more likely that they will be able to validate a lot of subtle stuff ahead of time, without having to worry over-much about what crap code people throw at them.
And I believe that is a big deal. Validation is important. Being able to make big changes, and having a test-suite that doesn't just test "do we follow our weak-ass rules", but actually tests "do we have something that is very unlikely to break any external code", is a life-saver.
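As a sketch of what that second kind of test looks like (my illustration, not Intel's actual methodology): the message-passing litmus shape below checks an ordering assumption that external code relies on, rather than a rule a weak model would promise. A real harness would pin the instruction order with inline asm; the relaxed C11 atomics here assume the compiler keeps program order:

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int data, flag;
static atomic_int bad;                  /* counts forbidden outcomes */

static void *writer(void *arg)
{
    (void)arg;
    atomic_store_explicit(&data, 1, memory_order_relaxed);
    atomic_store_explicit(&flag, 1, memory_order_relaxed); /* stores stay ordered on x86 */
    return NULL;
}

static void *reader(void *arg)
{
    (void)arg;
    int r_flag = atomic_load_explicit(&flag, memory_order_relaxed);
    int r_data = atomic_load_explicit(&data, memory_order_relaxed); /* loads stay ordered on x86 */
    if (r_flag == 1 && r_data == 0)
        atomic_fetch_add(&bad, 1);      /* forbidden under the documented x86 model */
    return NULL;
}

int main(void)
{
    for (int i = 0; i < 10000; i++) {
        pthread_t a, b;
        atomic_store(&data, 0);
        atomic_store(&flag, 0);
        pthread_create(&a, NULL, writer, NULL);
        pthread_create(&b, NULL, reader, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
    }
    printf("forbidden outcomes seen: %d\n", atomic_load(&bad));
    return 0;
}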
So yes, I do think that Intel thought hard about whether they could tighten their memory ordering documentation. But I also believe that one major issue was simply that tightening the rules was actually good not just for their customers, but for their own internal design flow. So they wanted to make sure not to paint themselves into a corner, but at the same time I suspect they actually wanted the straitjacket of very strictly specified memory ordering.
And I think that's a sign of good engineering. Accepting that stricter rules are often actually a better thing, because they cause fewer surprises, and allow you to concentrate on the big issues because you can trust your rules to protect you rather than hamper you.
Linus