Separate the OoO-ness from speculative-ness

Article: PhysX87: Software Deficiency
By: ?, July 20, 2010 6:19 am
Room: Moderated Discussions
(To Linus: For God's sake, could you stop using those vulgarisms! Or are you inserting them into your texts on purpose to make your arguments appear stronger? I don't think it's working.)

It seems to me you two don't quite know what you are talking about. My point is that it is *not* about in-order vs. out-of-order, but rather about which OoO features the CPU exposes in the instruction set and in the documentation.

A non-speculative OoO CPU will always beat an in-order CPU, assuming the CPUs are otherwise identical (same number of execution units, same cache size, same memory bandwidth, etc.). NOTE that this theoretical OoO CPU I am mentioning here is *not* doing any *speculation* (except maybe the kinds which are also present in the in-order CPU). Once you have speculation, there necessarily exist some cases in which the CPU's prediction engine will make a wrong decision.

Even mere pipelining entails a small dose of speculative execution !!!

I will repeat it one more time, because it seems to me that you are completely missing this: Even mere pipelining entails a small dose of speculation !!!

What Ian Ollmann is criticizing is *not* the out-of-order-ness of the CPU but rather its speculative-ness. When he wrote "...something with a simpler execution engine, typically an in order machine [...] is far easier to take to close to peak performance...", he was simply wrong. The truth is that an assembly-language programmer will be able to take an OoO CPU closer to peak performance (measured in IPC) than an in-order CPU. The reason is quite simple, but you are both missing it: a non-speculative OoO CPU is designed to exploit certain *runtime-only* information when deciding which instruction(s) can be executed. A non-speculative in-order CPU, on the other hand, is incapable of taking that runtime-only information into account.

The point is that the asm programmer does *not* have the runtime-only info at his/her disposal - which is quite obvious since the programmer is not running the program, the CPU is. The OoO CPU has knowledge (=truths) concerning certain pieces of the code that the programmer does not have.
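To make the runtime-only point concrete, here is a toy cycle model in Python. It is only a sketch under loud assumptions: one instruction issued per cycle, an unbounded scheduling window, every register written exactly once, and no branches at all (so nothing in it is speculative). The names `Instr`, `run` and `make_program` are made up for this illustration and do not describe any real CPU.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    dst: str        # register written (assumed to be written only once)
    srcs: tuple     # registers read
    latency: int    # cycles from issue until the result is available (>= 1)


def run(program, in_order, width=1):
    """Return the cycle at which the last result becomes available."""
    # Results of not-yet-issued instructions are "never ready" until issue.
    ready_at = {ins.dst: float("inf") for ins in program}
    pending = list(program)
    cycle = 0
    finish = 0
    while pending:
        issued = 0
        for ins in list(pending):          # oldest instruction first
            if issued == width:
                break
            # Registers with no producer in the program are inputs, ready at 0.
            if all(ready_at.get(s, 0) <= cycle for s in ins.srcs):
                ready_at[ins.dst] = cycle + ins.latency
                finish = max(finish, ready_at[ins.dst])
                pending.remove(ins)
                issued += 1
            elif in_order:
                break                      # in-order: cannot look past a stalled instruction
        cycle += 1
    return finish


def make_program(load_latency):
    """Hypothetical sequence; the load latency is only known at run time."""
    return [
        Instr("load", "r1", ("p",), load_latency),   # cache hit or miss
        Instr("use", "r2", ("r1",), 1),              # depends on the load
        Instr("mul_a", "r3", ("x",), 3),             # independent work
        Instr("mul_b", "r4", ("y",), 3),             # independent work
        Instr("add", "r5", ("r3", "r4"), 1),
    ]
```

With an assumed hit latency of 3 cycles, `run(make_program(3), in_order=True)` gives 9 and the OoO variant gives 6; with an assumed miss latency of 20 cycles the results are 26 and 21. The programmer could hand-reorder the in-order sequence for one of those latencies, but which latency actually occurs is exactly the information that exists only at run time.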

The bad and confusing thing is that, in the x86 world, speculation and OoO are so mixed up with each other that one has no idea when the CPU is only speculating and when it is only executing out of order. It's so sad: the informal language we use to describe e.g. a Nehalem makes little distinction between those two things. But they *are* two completely different/separate/orthogonal things.

L1/L2/Ln cache obviously falls within the speculative category. Of course, if the cache has multiple ports thus allowing e.g. two parallel reads per clock, then this parallelism can be used by an OoO engine (if it happens to know for a fact that it can do two independent reads). But aside from these parallelisms, which are fully optional, the core idea of a cache is the speculative-ness - in other words, it is impossible to design a cache which would not entail a speculative element.

On the other hand, I am claiming here that it is possible to design an OoO CPU which does not entail any speculative element whatsoever. More precisely, such a CPU would appear never to make any misprediction: in cases in which it does not know what to do, it simply waits for the results and never guesses them. Of course, this pure OoO CPU would be noticeably slower when executing "typical" code (e.g. Firefox, gzip, etc.) than the same CPU with speculation.
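The trade-off in the last sentence can be put into back-of-the-envelope arithmetic. The Python sketch below is only an illustration: the function names and all numbers are assumptions, and the model ignores everything except cycles lost at branches.

```python
def stall_cycles_without_speculation(branches: int, resolve_latency: int) -> int:
    """The CPU waits at every branch until the outcome is known; it is never wrong."""
    return branches * resolve_latency


def stall_cycles_with_speculation(mispredicted: int, flush_penalty: int) -> int:
    """Only wrong guesses cost cycles; correct guesses are free in this model."""
    return mispredicted * flush_penalty
```

For an assumed 1000 branches with a 10-cycle resolve latency, waiting costs 10000 cycles. If 50 of those branches are mispredicted at an assumed 15-cycle flush penalty, speculating costs 750 cycles, which is why the speculative CPU wins on typical code. If all 1000 were mispredicted, speculating would cost 15000 cycles, which is the kind of case where the prediction engine hurts.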

(See Linus, I didn't use any vulgarism to make my argument. So behave yourself.)

Linus Torvalds on 7/19/10 wrote:
>Ian Ollmann on 7/19/10 wrote:
>>I personally find the OoO engine to be a bit of a hindrance to performance. It's
>>great for the massively overwhelming amount of code out there written with a C/C++
>>compiler with not too much thought to performance. (read: 99.99%+ of code) There,
>>you might get 10% of peak instead of 3% of peak. However, when you are trying to
>>get the last drop out of the architecture and are willing to go to assembly, something
>>with a simpler execution engine, typically an in order machine with non-destructive
>>ISA, is far easier to take to close to peak performance.
>Umm. That may be true, because it's clearly easier to tune
>something that is simpler and more deterministic. However,
>you are making a silly argument for two reasons:
>- the kind of code you talk about is such a small fraction
>of the code any real CPU runs that it's not really worth
>even bringing up as an argument.
>- even for the kind of code you talk about, that elusive
>"peak performance" is likely higher with OoO!
>So your argument that you can get 'closer to peak' is kind
>of pointless, if 'closer to peak' is still slower on
>an in-order part.
>In fact, your argument is the exact same argument that
>people used to use against caches. The caches make it much
>harder to get "peak" performance, because suddenly the
>actual performance depends a lot on things like data
>structure layouts etc.
>So without caches, you can tune your algorithm at the
>assembly level, and you can pat yourself on the back and
>tell yourself "this cannot be done faster". With caches,
>that suddenly becomes much harder, because the fastest
>code sequence often depends on which parts were still in
>cache when the function was called etc etc.
>So cacheless CPU's can be tuned much better. But who the
>f*ck cares whether you can get to 80% of peak, or 100% of
>peak, when the "100% peak of non-cached" value is way way
>less than "80% peak of something with a cache".
>The exact same is true of OoO. Your "it's harder to
>tune" argument is as invalid today as it was 30 years ago,
>despite being totally true.
>The thing is, when you have non-uniform timing, OoO helps.
>And there are basically zero loads that don't have
>that. You have differently sized caches, you have memory
>subsystems that are not the same speed etc etc.
>That theoretical "peak" thing only exists on a single
>machine with a very particular setup. If you change any
>of the details, your optimized routine may no longer be
>optimal, because now the optimal scheduling is different.
>And that's where OoO helps. It helps with things like
>different L1 cache latencies in different implementations.
>It helps with having different L1 (or L2) sizes. It helps
>with slightly different branch predictors etc.
>Without OoO, you can never have something robust.
>You end up having to tune your code for one particular
>machine, and five years from now when that machine no
>longer exists, you're basically screwed.
>In other words, your argument is pure and utter garbage.
>It's "true" only in a trivial and totally idiotic way.
>Because if the code is so important that it is worth tuning
>for at an assembly level, it's so important that it will
>run on machines for longer than one generation or on many
>different versions of a similar machine.
>At which point you'll want OoO. Really. Because in-order
>will screw you up.