By: Vincent Diepeveen (diep.delete@this.xs4all.nl), November 10, 2009 8:02 am
Room: Moderated Discussions
David Kanter (dkanter@realworldtech.com) on 11/9/09 wrote:
---------------------------
>none (none@none.com) on 11/9/09 wrote:
>---------------------------
>>This article:
>>http://www.ddj.com/cpp/207200659
>>
>>has a link to this:
>>http://www.ks.uiuc.edu/Research/vmd/publications/siam2008vmdcuda.pdf
>>
>>At least that is something not coming directly out of nVidia
>>marketing department.
>
>I'm still a little skeptical, since the software stack for the GPU and CPU aren't
>disclosed - although they mention what hardware is used.
>
>Also, UIUC is being funded to do this stuff and is evangelizing CUDA, so it's in
>their best interest to make the GPUs look good.
>
>I'm not saying the numbers are wrong, but I don't know what they represent. You want to know things like:
Well, it's easy to quote someone here:
"If you can't speedup those scientific codes by factor 1000, then you really suck as an algorithmic expert"
Dieter Buerssner
Any claim of more than a factor of 10 for any application at this moment is far from reality, of course.
For any form of calculation you can do, the GPUs might achieve a factor of 5-10 in speed. That's not just bandwidth; it's mainly because GPUs at this point are simply not faster than that.
It is obvious, however, that a low-level optimized GPGPU program has usually been written by a reasonably good programmer, whereas the x86 code is usually total crap.
Most don't even parallelize the software well, or at all in fact. I remember helping someone parallelize some quantum mechanics code in a tricky manner to go from 1 core to 4 cores (by tricky I mean that it really needed all 4 cores full-time scheduling the threads to get a good parallel speedup; so if one of the cores was busy with other tasks, it would not scale well).
That also means better cache usage, so you simply suffer less from the bandwidth problem to RAM.
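To give an idea of the kind of trick I mean, here is a minimal sketch (my own illustration, not the actual quantum mechanics code): worker threads that busy-wait on a flag instead of sleeping, so a tiny work unit can be handed out with near-zero latency. The moment another task steals one of the cores, a spinning worker stalls the whole team, which is exactly the scaling problem I described.

/* Minimal sketch (hypothetical, not the original code): busy-waiting workers
 * pinned conceptually to 4 cores, dispatching many small parallel regions. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define NTHREADS 4
#define N        (1 << 20)

static double data[N];
static atomic_int go[NTHREADS];    /* per-thread "work ready" flag    */
static atomic_int done[NTHREADS];  /* per-thread "work finished" flag */

static void do_chunk(int id)       /* stand-in for the real physics kernel */
{
    int lo = id * (N / NTHREADS), hi = lo + N / NTHREADS;
    for (int i = lo; i < hi; i++)
        data[i] = data[i] * 0.5 + 1.0;
}

static void *worker(void *arg)
{
    int id = (int)(long)arg;
    for (;;) {
        while (!atomic_load(&go[id]))   /* spin: no sleeping, no OS wakeup cost */
            ;
        atomic_store(&go[id], 0);
        do_chunk(id);
        atomic_store(&done[id], 1);
    }
    return NULL;
}

int main(void)
{
    pthread_t t[NTHREADS];
    for (long i = 1; i < NTHREADS; i++)        /* main thread acts as worker 0 */
        pthread_create(&t[i], NULL, worker, (void *)i);

    for (int step = 0; step < 1000; step++) {  /* many tiny parallel regions */
        for (int i = 1; i < NTHREADS; i++) atomic_store(&go[i], 1);
        do_chunk(0);
        for (int i = 1; i < NTHREADS; i++) {
            while (!atomic_load(&done[i])) ;   /* wait for the other workers */
            atomic_store(&done[i], 0);
        }
    }
    printf("data[0] = %f\n", data[0]);
    return 0;
}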
If your chip is good at X and bad at Y, then avoid Y. If the chip is good at Y and bad at X, then avoid X. That's not what they do with their software, however.
>1. What algorithms are used? Do they favor CPU or GPU?
Is it really interesting? A claim of a factor of 100+ is just not realistic for now.
Usually they also compare some old P4 CPU against the latest Tesla setup with 4 GPUs, at $15k a setup.
The correct comparison would be a 32-core AMD setup with 8 sockets, say older 2.2 GHz Opteron 8354 CPUs available cheaply second-hand on eBay, and 2 mainboards connected to each other, with a lot of RAM.
That's $10k and it draws about the same power as a 4-GPU Tesla setup.
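A rough back-of-the-envelope check (my own rough figures taken from the datasheets, assuming the Opteron 8354 does 4 DP / 8 SP FLOPs per core per cycle via SSE and a Tesla C1060 peaks at roughly 78 GFLOPS DP / 933 GFLOPS SP) shows why factor 100 is nonsense:

/* Back-of-the-envelope peak numbers (my own rough figures, not from any paper). */
#include <stdio.h>

int main(void)
{
    double cpu_dp = 8 * 4 * 2.2 * 4;   /* sockets * cores * GHz * DP FLOPs/cycle */
    double cpu_sp = 8 * 4 * 2.2 * 8;   /* sockets * cores * GHz * SP FLOPs/cycle */
    double gpu_dp = 4 * 78.0;          /* 4 GPUs * peak DP GFLOPS each */
    double gpu_sp = 4 * 933.0;         /* 4 GPUs * peak SP GFLOPS each */

    printf("CPU peak: %.0f GFLOPS DP, %.0f GFLOPS SP\n", cpu_dp, cpu_sp);
    printf("GPU peak: %.0f GFLOPS DP, %.0f GFLOPS SP\n", gpu_dp, gpu_sp);
    printf("ratio   : %.1fx DP, %.1fx SP\n", gpu_dp / cpu_dp, gpu_sp / cpu_sp);
    return 0;
}

On peak numbers that works out to roughly even in double precision and about 6-7x in single precision, which fits the factor 5-10 I mentioned, not factor 100.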
>2. What compiler/libraries?
>3. What HW (obviously)?
>4. How well tuned is the app on each stack?
>
>It's really important to know the details when making claims like this. A lot
>of papers that I've seen do a really poor job of this, but they tend to be miles ahead of NV's technical marketing.
>
>In contrast, when I see benchmarks from Intel or AMD on CPUs, they tend to tell
>you how the comparison is done - so you can find out if they are playing games.
>
>David