150 GFLOP/s measured?

Article: PhysX87: Software Deficiency
By: Vincent Diepeveen (diep.delete@this.xs4all.nl), July 27, 2010 5:37 pm
Room: Moderated Discussions
a reader (a@b.c) on 7/22/10 wrote:
---------------------------
>anon (anon@anon.com) on 7/22/10 wrote:
>---------------------------
>>Linus Torvalds (torvalds@linux-foundation.org) on 7/21/10 wrote:
>>---------------------------
>>>a reader (a@b.c) on 7/21/10 wrote:
>>>>
>>>>odd things with store buffer replay? like what?
>>>>
>>>>are you sure it's not just L1D too small?
>>>
>
>>>I think there were a few other cases of nasty replays too,
>>>and they really end up depending very subtly about just
>>>which cycle the different micro-ops got scheduled in. When
>>>code works well, the P4 runs like a greased bat out of
>>>hell, and then some very non-obvious things can make the
>>>almost identical code take replay traps all the time and
>>>just come to a crawling halt.
>>
>>What made the P4 run so fast? Was it the high clock and ALU frequency?
>>
>>I wonder if we'll see a return of the trace cache to improve the apparent efficiency
>>and/or wideness of the x86 fetch and decoder? Preferably with the normal L1I still
>>intact. I guess the loop detector is basically that, and probably will expand to handle more cases.
>>
>
>p4 was designed for good micro-benchmark performance.
>gHz matter there.
>
>why do you want trace cache back? it worked in some
>academic cases. but it was probably a mistake to use
>it in production design.
>
>it's not easy to pin point why p4 perform so badly on real code.

In general the design sucked ass.

You want to do a shift-right instruction?

Boom 7 cycles penalty.

You wanted a CMOV type instruction (how ELSE to replace a branch?)?

Boom also BIG penalty.
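
For readers who don't write this kind of code daily, here is a minimal sketch of what "replacing a branch" looks like in C. The function names are made up for illustration. Whether a compiler actually emits CMOV for the ternary form depends on the compiler, flags and target, so treat that as an assumption, not a guarantee.

```c
#include <stdint.h>

/* Branchy version: the CPU has to predict which way the 'if' goes. */
static int32_t max_branchy(int32_t a, int32_t b)
{
    if (a > b)
        return a;
    return b;
}

/* Select version: compilers often (not always) turn this into CMOV on x86. */
static int32_t max_select(int32_t a, int32_t b)
{
    return (a > b) ? a : b;
}

/* Mask version: no branch and no CMOV needed, only ALU operations.
 * (a > b) is 0 or 1, so the negation gives 0 or all-ones. */
static int32_t max_mask(int32_t a, int32_t b)
{
    int32_t mask = -(int32_t)(a > b);
    return (a & mask) | (b & ~mask);
}
```

All three return the same value; they differ only in what the hardware has to do to get there.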

All sorts of worst cases.

Just give me a chip with 512 Pentium 3s at 1GHz at a cheap price.

Almost no communication between the cores, or none at all. No synchronisation between caches. Just a *nearly* embarrassingly parallel chip.

Each P3 gets just its own memory space. Then I can run 128 tasks embarrassingly parallel.

Instead we got a P4 chip, wasting transistors.

High clocked, sure. And according to Intel's announcement at the time, by 2010 we would see a 10GHz P4.

Well, so much for history.

Let's bury it all :)

Core2 is a winning design. Kicks butt.

All I want is a Core2 chip with an integer multiplication unit that can issue a multiplication every cycle; right now the throughput cost is 3.75 cycles per multiplication.

That is 64 x 64 bit unsigned multiplication giving a 128-bit unsigned result split over 2 registers.

Alternatively, a SIMD instruction doing exactly this at a throughput of 1 per cycle.
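
To make the operation concrete, here is a rough sketch in C of the 64 x 64 -> 128 bit unsigned multiply. The type and function names are hypothetical. The first version assumes the GCC/Clang extension `unsigned __int128`, which on x86-64 normally compiles down to one widening MUL; the second is a portable schoolbook version on 32-bit halves, shown only to illustrate what the single instruction saves you from.

```c
#include <stdint.h>

/* Hypothetical helper type: the 128-bit product split over two 64-bit halves,
 * the way the hardware returns it in two registers. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128_parts;

/* Assumes a compiler with unsigned __int128 (GCC and Clang on 64-bit targets). */
static u128_parts mul_64x64(uint64_t a, uint64_t b)
{
    unsigned __int128 p = (unsigned __int128)a * b;
    u128_parts r;
    r.lo = (uint64_t)p;
    r.hi = (uint64_t)(p >> 64);
    return r;
}

/* Portable fallback: four 32 x 32 -> 64 partial products plus carry handling. */
static u128_parts mul_64x64_portable(uint64_t a, uint64_t b)
{
    uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
    uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo;   /* contributes to bits 0..63   */
    uint64_t p1 = a_lo * b_hi;   /* contributes to bits 32..95  */
    uint64_t p2 = a_hi * b_lo;   /* contributes to bits 32..95  */
    uint64_t p3 = a_hi * b_hi;   /* contributes to bits 64..127 */

    /* Sum of everything landing in bits 32..63; cannot overflow 64 bits. */
    uint64_t mid = (p0 >> 32) + (p1 & 0xFFFFFFFFu) + (p2 & 0xFFFFFFFFu);

    u128_parts r;
    r.lo = (p0 & 0xFFFFFFFFu) | (mid << 32);
    r.hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
    return r;
}
```

Bignum code in the style of GMP spends much of its time in an inner loop built around this one operation, which is why its throughput matters so much for that kind of software.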

AMD is kicking Intel silly here. Even on old CPUs. The same thing on AMD costs 2.25 cycles.

(source : Torbjorn Granlund, known to some from the GMP math library)

So this Intel problem of some instructions simply being really slow is neither new nor old; it's of all times.

The chip has been optimized too much for test suites, if I may say so.

Regrettably GMP is not in SPECint...

>but Linus' explanation about the replays seems to be
>quite on the mark. it's possible that intel added some
>early rule-of-thumb checks on the replay path, which
>gave too many false positives, that would lead to excessive
>and unpredictable replays, hence the bad performance.
