Bogus 100x speedups

Article: Computational Efficiency in Modern Processors
By: Vincent Diepeveen (diep.delete@this.xs4all.nl), November 10, 2009 8:26 am
Room: Moderated Discussions
David Kanter (dkanter@realworldtech.com) on 11/10/09 wrote:
---------------------------
>Here's a few great examples on the N-body problem:
>
>http://www.cise.ufl.edu/~jgao/p_final.pdf
>
>They are claiming a very impressive 100X speed up. Look at the hardware they are
>comparing against though. I think they are also using the O(N^2) algorithm.
>
>http://markjstock.org/research/AIAA-2008-608-552.pdf
>
>This is a better paper where they actually are using an intelligent algorithm (better
>than O(N^2)) on the GPU. They see ~220 GFLOP/s for N-body on an 8800 GTX...which
>is supposedly 17X better than a 2-core 2.4GHz Opteron.
>
>Just a quick reference check here:
>220 GFLOP/s / 17 = 13 GFLOP/s for the Opteron
>13 GFLOP/s / 2 cores = 6.5 GFLOP/s per core
>6.5 GFLOP/s / 2.4 GHz = 2.7 FLOP/cycle per core
>
>Here's the catch. An Opteron can execute 8 FLOP/cycle, so the SW isn't really tuned that well.
>
>More problematic, this is on a small problem size (0.5M particles). When the problem
>size scales up, the algorithm starts to backfire and the speedup drops to 9.6X.
>
>
>Here's some real work on this area that touches on these issues directly:
>
>http://www.lanl.gov/conferences/lacss/2009/presentations/vuduc.pdf
>
>They only consider FMM which is O(N), instead of O(N^2) or O(N log N). That's

O(N log N) algorithms are very difficult to get working on a GPU. I studied an FFT there for big-number multiplication.

A CPU has a great cache, and you can do a lot within that cache. A GPU, by contrast, is constantly running outside its cache; a few kilobytes per core is very little for caching.

Even so, on CPUs it's interesting to compare the speed of the Woltman (MIT) code, which uses an FFT (actually a DWT) to multiply and prove primes, with whatever people have shown so far on GPUs.

It's really hard to get something like that to work on GPUs, because this code profits a lot from double-precision calculations. So basically one needs a transform that, combined with the Chinese remainder theorem, works well in single precision. That's entirely possible. Yet I don't really see an easy solution for staying within the caches with such huge instruction-level parallelism, which is much easier on the CPU. On the CPU the bottleneck is really the RAM. Nehalem has DDR3 now, and so do the latest AMDs, which is a step forward.

So GPUs can be a lot faster there than CPUs, yet not by a factor of 100 or so. If you achieve a factor of 2, I bet everyone will already start buying GPUs :)

On paper a factor of 5+ is possible, by my calculation.
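To make the "transform plus Chinese remainder theorem" idea concrete, here is a rough Python sketch of big-number multiplication using a number theoretic transform (NTT) modulo two small primes, with the results combined by CRT. This is not the Woltman code (which is floating point) and not GPU code; the two primes, the limb size `BASE`, and all function names are my own illustrative assumptions. Python integers also hide the real difficulty, which is that the modular products need wider intermediates than 32 bits on actual hardware.

```python
# Illustrative assumptions: two NTT-friendly primes below 2**31,
# both of which have 3 as a primitive root.
PRIMES = (998244353, 469762049)
ROOT = 3
BASE = 10**4  # limb size; keeps convolution sums far below PRIMES[0] * PRIMES[1]


def ntt(a, p, invert=False):
    """In-place iterative radix-2 transform of list a modulo prime p."""
    n = len(a)
    # Bit-reversal permutation.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # Butterfly passes, doubling the block length each time.
    length = 2
    while length <= n:
        w_len = pow(ROOT, (p - 1) // length, p)
        if invert:
            w_len = pow(w_len, p - 2, p)  # modular inverse via Fermat
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + half):
                u = a[k]
                v = a[k + half] * w % p
                a[k] = (u + v) % p
                a[k + half] = (u - v) % p
                w = w * w_len % p
        length <<= 1
    if invert:
        n_inv = pow(n, p - 2, p)
        for i in range(n):
            a[i] = a[i] * n_inv % p


def convolve_mod(x, y, p):
    """Cyclic-free convolution of x and y with coefficients reduced mod p."""
    n = 1
    while n < len(x) + len(y) - 1:
        n <<= 1
    fx = x + [0] * (n - len(x))
    fy = y + [0] * (n - len(y))
    ntt(fx, p)
    ntt(fy, p)
    fz = [u * v % p for u, v in zip(fx, fy)]
    ntt(fz, p, invert=True)
    return fz


def crt_pair(r1, r2):
    """Recover the value below p1*p2 that is r1 mod p1 and r2 mod p2."""
    p1, p2 = PRIMES
    t = (r2 - r1) * pow(p1, p2 - 2, p2) % p2
    return r1 + p1 * t


def to_limbs(n):
    """Split a non-negative integer into base-BASE limbs, least significant first."""
    limbs = []
    while n:
        n, r = divmod(n, BASE)
        limbs.append(r)
    return limbs or [0]


def multiply(a, b):
    """Multiply two non-negative integers via two modular transforms plus CRT."""
    x, y = to_limbs(a), to_limbs(b)
    c1 = convolve_mod(x, y, PRIMES[0])
    c2 = convolve_mod(x, y, PRIMES[1])
    coeffs = [crt_pair(r1, r2) for r1, r2 in zip(c1, c2)]
    # Carry propagation is left to Python's big integers here.
    return sum(c * BASE**i for i, c in enumerate(coeffs))
```

Calling `multiply(3**200, 7**150)` should agree with Python's own `3**200 * 7**150`. Note that the sketch says nothing about the cache problem: the butterfly passes still stride across the whole array, which is exactly the part that is hard to keep inside a few kilobytes per core.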

>quite reasonable since you should be using the most efficient algorithm. Although
>I suppose you could consider a CPU using the O(N) vs. a GPU using whatever their favorite algorithm is.
>
>See page 56: 2 socket Nehalem ~ 2 GPUs.
>
>And note the sensitivity of Nehalem to tuning - 25X speed up from tuning.
>
>I think that pretty clearly shows how specious these comparisons can be. The algorithm
>matters, tuning matters, hardware matters and most importantly - don't believe what
>you read until you can see the fine print.
>
>David
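
The quoted reference check can be re-run in a few lines of Python. The inputs are the figures quoted above (220 GFLOP/s, 17X, 2 cores, 2.4GHz, 8 FLOP/cycle peak); the variable names are just illustrative, and the last line only restates the "not tuned that well" point as a fraction of peak.

```python
# Figures taken from the quoted post; names are illustrative.
gpu_gflops = 220.0          # reported GPU throughput, GFLOP/s
claimed_speedup = 17.0      # GPU vs. 2-core Opteron
cores = 2
clock_ghz = 2.4
peak_flop_per_cycle = 8.0   # per-core peak as stated in the quote

cpu_gflops = gpu_gflops / claimed_speedup       # about 12.9 GFLOP/s
per_core_gflops = cpu_gflops / cores            # about 6.5 GFLOP/s
flop_per_cycle = per_core_gflops / clock_ghz    # about 2.7 FLOP/cycle
fraction_of_peak = flop_per_cycle / peak_flop_per_cycle

print(f"CPU total:        {cpu_gflops:.1f} GFLOP/s")
print(f"Per core:         {per_core_gflops:.1f} GFLOP/s")
print(f"Per core, cycle:  {flop_per_cycle:.1f} FLOP/cycle")
print(f"Fraction of peak: {fraction_of_peak:.0%}")
```

So the CPU baseline in that comparison runs at roughly a third of the stated per-core peak.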