SSE vs x87

Article: PhysX87: Software Deficiency
By: David Kanter (, July 7, 2010 1:55 pm
Room: Moderated Discussions
Joel Hruska ( on 7/7/10 wrote:
>I'm wondering if you had a chance to read over the whitepaper Intel recently published
>comparing CPU and GPU performance:
>Their work seems to agree with some of your findings, but >it's in relation to CUDA, not PhysX.
>My first question is whether or not you see your >investigation reinforcing Intel's
>own paper, or if they're two different topics and >computing areas.

I think they are related but different. I focused on why we see a performance gap between CPU and GPU in a specific case (e.g. PhysX).

I haven't read Intel's paper yet, but I believe their central point is that many of the CPU vs. GPU comparisons are not apples to apples (e.g. one is more optimized than the other, as is the case with PhysX). Moreover, I think they are arguing that if you do apples to apples comparisons, the performance gaps are not nearly as big.

I think my work is roughly aligned with some of what they are doing and seems congruent. Certainly I wouldn't consider PhysX an apples to apples comparison. But I don't have the ability to really examine what the performance gap for PhysX really is. Perhaps I can get to that sooner or later.

>My second question is this: In this article, you discuss x87 vs. SSE/SSE2, but
>don't mention SSE3, SSE4.1, or SSE4.2. It was always my >understanding that SSE2
>was the most important of Intel's SIMDs, but I never knew >if that was because it
>came out at the same time as the P4, and had to be used >for any kind of decent performance.

SSE1 was mostly for 4x32b (single precision), which was great for some things. But if you wanted 64 bits (double precision), you were stuck. And x87 was 80 bit precision.

So SSE1 couldn't totally replace x87. However, SSE2 makes x87 totally obsolete since it does 2x64b. That's why it's a big deal.

SSE3 and onwards were really more of refinements that added some missing instructions to SSE1 and SSE2 and some special purpose ones as well (e.g. string instructions in SSE4.2). But SSE1+SSE2 are really the key.

>How important are SSE3/4.1/4.2, and will AVX actually >improve SSE2 performance
>by handling instructions more efficiently?

AVX extends the vector length from 128 bits to 256-bits. So in theory, you could do 8x32b or 4x64b operations per cycle; basically double what is possible with SSE. In practice, the improvement probably isn't 2X and is probably closer to 1.5X...but that's due to the hardware that will run AVX, and not really a function of the instructions.

Comparing x87 to AVX, the improvement is probably something like 2-4X.

>Thanks much.

Sure thing. You should try and link up the original piece at HH : )

< Previous Post in ThreadNext Post in Thread >
TopicPosted ByDate
A bit off baseJohn Mann2010/07/07 06:04 AM
  A bit off baseDavid Kanter2010/07/07 10:28 AM
    SSE vs x87Joel Hruska2010/07/07 11:53 AM
      SSE vs x87Michael S2010/07/07 12:07 PM
        SSE vs x87hobold2010/07/08 04:12 AM
      SSE vs x87David Kanter2010/07/07 01:55 PM
        SSE vs x87Andi Kleen2010/07/08 01:43 AM
          80 bit FPRicardo B2010/07/08 06:35 AM
            80 bit FPDavid Kanter2010/07/08 10:14 AM
              80 bit FPKevin G2010/07/08 01:12 PM
                80 bit FPIan Ollmann2010/07/18 11:49 PM
                  80 bit FPDavid Kanter2010/07/19 10:33 AM
                    80 bit FPAnil Maliyekkel2010/07/19 03:49 PM
                      80 bit FPrwessel2010/07/19 04:41 PM
                    80 bit FPMatt Waldhauer2010/07/21 10:11 AM
            80 bit FPEmil Briggs2010/07/22 08:06 AM
    A bit off baseJohn Mann2010/07/08 10:06 AM
      A bit off baseDavid Kanter2010/07/08 10:27 AM
        A bit off baseIan Ameline2010/07/09 09:10 AM
          A bit off baseMichael S2010/07/10 01:13 PM
            A bit off baseIan Ameline2010/07/11 06:51 AM
  A bit off baseDavid Kanter2010/07/07 08:46 PM
    A bit off baseAnon2010/07/07 11:47 PM
      A bit off baseanon2010/07/08 01:15 AM
        A bit off baseGabriele Svelto2010/07/08 03:11 AM
          Physics engine historyPeter Clare2010/07/08 03:49 AM
            Physics engine historyNull Pointer Exception2010/07/08 05:07 AM
              Physics engine historyRalf2010/07/08 02:09 PM
                Physics engine historyDavid Kanter2010/07/08 03:16 PM
                  Physics engine historysJ2010/07/08 10:36 PM
                    Physics engine historyGabriele Svelto2010/07/08 11:59 PM
                      Physics engine historysJ2010/07/13 05:35 AM
                    Physics engine historyDavid Kanter2010/07/09 08:25 AM
                      Physics engine historysJ2010/07/13 05:49 AM
                      Physics engine historyfvdbergh2010/07/13 06:27 AM
    A bit off baseJohn Mann2010/07/08 10:11 AM
      A bit off baseDavid Kanter2010/07/08 10:31 AM
        150 GFLOP/s measured?anon2010/07/08 06:10 PM
          150 GFLOP/s measured?David Kanter2010/07/08 06:53 PM
            150 GFLOP/s measured?Aaron Spink2010/07/08 08:05 PM
              150 GFLOP/s measured?anon2010/07/08 08:31 PM
                150 GFLOP/s measured?Aaron Spink2010/07/08 09:43 PM
                  150 GFLOP/s measured?David Kanter2010/07/08 10:27 PM
                    150 GFLOP/s measured?Ian Ollmann2010/07/19 12:14 AM
                      150 GFLOP/s measured?anon2010/07/19 05:39 AM
                        150 GFLOP/s measured?hobold2010/07/19 06:26 AM
                          Philosophy for achieving peakDavid Kanter2010/07/19 10:49 AM
                      150 GFLOP/s measured?Linus Torvalds2010/07/19 06:36 AM
                        150 GFLOP/s measured?Richard Cownie2010/07/19 07:42 AM
                          150 GFLOP/s measured?Aaron Spink2010/07/19 07:56 AM
                            150 GFLOP/s measured?hobold2010/07/19 08:30 AM
                              150 GFLOP/s measured?Groo2010/07/19 01:31 PM
                                150 GFLOP/s measured?hobold2010/07/19 03:17 PM
                                  150 GFLOP/s measured?Groo2010/07/19 05:18 PM
                              150 GFLOP/s measured?Anon2010/07/19 05:18 PM
                            150 GFLOP/s measured?Mark Roulo2010/07/19 10:47 AM
                              150 GFLOP/s measured?slacker2010/07/19 11:55 AM
                                150 GFLOP/s measured?Mark Roulo2010/07/19 12:00 PM
                              150 GFLOP/s measured?anonymous422010/07/25 11:31 AM
                            150 GFLOP/s measured?Richard Cownie2010/07/19 11:41 AM
                              150 GFLOP/s measured?Linus Torvalds2010/07/19 01:57 PM
                                150 GFLOP/s measured?Richard Cownie2010/07/19 03:10 PM
                                150 GFLOP/s measured?Richard Cownie2010/07/19 03:10 PM
                                  150 GFLOP/s measured?hobold2010/07/19 03:25 PM
                                  150 GFLOP/s measured?Linus Torvalds2010/07/19 03:31 PM
                                    150 GFLOP/s measured?Richard Cownie2010/07/20 05:04 AM
                                150 GFLOP/s measured?jrl2010/07/20 12:18 AM
                            150 GFLOP/s measured?anonymous422010/07/25 11:00 AM
                              150 GFLOP/s measured?David Kanter2010/07/25 11:52 AM
                          150 GFLOP/s measured?Anon2010/07/19 05:15 PM
                            150 GFLOP/s measured?Linus Torvalds2010/07/19 06:27 PM
                              150 GFLOP/s measured?Anon2010/07/19 08:54 PM
                                150 GFLOP/s measured?anon2010/07/19 10:45 PM
                        150 GFLOP/s measured?hobold2010/07/19 08:14 AM
                          150 GFLOP/s measured?Linus Torvalds2010/07/19 10:56 AM
                            150 GFLOP/s measured?a reader2010/07/21 07:16 PM
                              150 GFLOP/s measured?Linus Torvalds2010/07/21 08:05 PM
                                150 GFLOP/s measured?anon2010/07/22 01:09 AM
                                  150 GFLOP/s measured?a reader2010/07/22 06:53 PM
                                    150 GFLOP/s measured?gallier22010/07/23 04:58 AM
                                      150 GFLOP/s measured?a reader2010/07/25 07:35 AM
                                        150 GFLOP/s measured?David Kanter2010/07/25 10:49 AM
                                          150 GFLOP/s measured?a reader2010/07/26 06:03 PM
                                            150 GFLOP/s measured?Michael S2010/07/28 12:38 AM
                                              150 GFLOP/s measured?Gabriele Svelto2010/07/28 12:44 AM
                                    150 GFLOP/s measured?anon2010/07/23 03:55 PM
                                      150 GFLOP/s measured?slacker2010/07/23 11:48 PM
                                        150 GFLOP/s measured?anon2010/07/24 01:36 AM
                                    150 GFLOP/s measured?Vincent Diepeveen2010/07/27 04:37 PM
                                      150 GFLOP/s measured??2010/07/27 10:42 PM
                                        150 GFLOP/s measured?slacker2010/07/28 04:55 AM
                                      Intel's clock rate projectionsAM2010/07/28 01:03 AM
                                        nostalgia ain't what it used to besomeone2010/07/28 04:38 AM
                                          Intel's clock rate projectionsAM2010/07/28 09:12 PM
                        Separate the OoO-ness from speculative-ness?2010/07/20 06:19 AM
                          Separate the OoO-ness from speculative-nessMark Christiansen2010/07/20 01:26 PM
                          Separate the OoO-ness from speculative-nessslacker2010/07/20 05:04 PM
                            Separate the OoO-ness from speculative-nessMatt Sayler2010/07/20 05:10 PM
                              Separate the OoO-ness from speculative-nessslacker2010/07/20 08:37 PM
                                Separate the OoO-ness from speculative-ness?2010/07/20 10:51 PM
                                  Separate the OoO-ness from speculative-nessanon2010/07/21 01:16 AM
                                    Separate the OoO-ness from speculative-ness?2010/07/21 06:05 AM
                                      Software conventionsPaul A. Clayton2010/07/21 07:52 AM
                                        Software conventions?2010/07/22 04:43 AM
                                      SpeculationDavid Kanter2010/07/21 09:32 AM
                                        Pipelining affects the ISA?2010/07/22 09:58 PM
                                          Pipelining affects the ISA?2010/07/22 10:14 PM
                                          Pipelining affects the ISArwessel2010/07/22 11:03 PM
                                            Pipelining affects the ISA?2010/07/23 04:50 AM
                                            Pipelining affects the ISA?2010/07/23 05:10 AM
                                              Pipelining affects the ISAThiago Kurovski2010/07/23 01:59 PM
                                                Pipelining affects the ISAanon2010/07/24 06:35 AM
                                                  Pipelining affects the ISAThiago Kurovski2010/07/24 10:12 AM
                                          Pipelining affects the ISAGabriele Svelto2010/07/26 01:50 AM
                                            Pipelining affects the ISAIlleglWpns2010/07/26 04:14 AM
                                              Pipelining affects the ISAMichael S2010/07/26 02:33 PM
                                      Separate the OoO-ness from speculative-nessanon2010/07/21 04:53 PM
                                        Separate the OoO-ness from speculative-ness?2010/07/22 03:15 AM
                                          Separate the OoO-ness from speculative-nessanon2010/07/22 03:27 AM
                                      Separate the OoO-ness from speculative-nessslacker2010/07/21 06:45 PM
                                        Separate the OoO-ness from speculative-nessanon2010/07/22 12:57 AM
                                        Separate the OoO-ness from speculative-ness?2010/07/22 04:26 AM
                                          Separate the OoO-ness from speculative-nessDan Downs2010/07/22 07:14 AM
                                          Confusing and not very useful definitionDavid Kanter2010/07/22 11:41 AM
                                            Confusing and not very useful definition?2010/07/22 09:58 PM
                                              Confusing and not very useful definitionUngo2010/07/24 11:06 AM
                                                Confusing and not very useful definition?2010/07/25 09:23 PM
                            Separate the OoO-ness from speculative-nesssomeone2010/07/20 07:02 PM
                              Separate the OoO-ness from speculative-nessThiago Kurovski2010/07/21 03:13 PM
            You are just quoting SINGLE precision flops? OMG what planet do you live? Vincent Diepeveen2010/07/19 09:26 AM
              The prior poster was talking about SP (NT)David Kanter2010/07/19 10:34 AM
                All FFT's need double precisionVincent Diepeveen2010/07/19 01:02 PM
                  All FFT's need double precisionDavid Kanter2010/07/19 01:09 PM
                    All FFT's need double precisionVincent Diepeveen2010/07/19 03:06 PM
                  All FFT's need double precision - notMichael S2010/07/20 12:16 AM
                    All FFT's need double precision - notUngo2010/07/20 11:04 PM
                      All FFT's need double precision - notMichael S2010/07/21 01:35 PM
                      All FFT's need double precision - notEduardoS2010/07/21 01:52 PM
                        All FFT's need double precision - notAnon2010/07/21 04:23 PM
                          All FFT's need double precision - notRicardo B2010/07/26 06:46 AM
                        I'm on a boat!anon2010/07/22 10:42 AM
                        All FFT's need double precision - notVincent Diepeveen2010/07/24 10:39 PM
                          All FFT's need double precision - notslacker2010/07/25 02:27 AM
                            All FFT's need double precision - notRicardo B2010/07/26 06:40 AM
                          All FFT's need double precision - notEduardoS2010/07/25 07:37 AM
                            All FFT's need double precision - notMichael S2010/07/25 09:43 AM
                    All FFT's need double precision - notVincent Diepeveen2010/07/24 10:19 PM
      A bit off baseEduardoS2010/07/08 03:08 PM
        A bit off baseGroo2010/07/08 05:11 PM
          A bit off basejohn mann2010/07/08 05:58 PM
            All right...let's cool it...David Kanter2010/07/08 06:54 PM
    A bit off baseVincent Diepeveen2010/07/19 02:36 PM
Reply to this Topic
Body: No Text
How do you spell avocado?