Article: Parallelism at HotPar 2010
By: Vincent Diepeveen (diep.delete@this.xs4all.nl), August 24, 2010 7:18 am
Room: Moderated Discussions
none (none@none.com) on 8/9/10 wrote:
---------------------------
>Richard (no@email.com) on 8/9/10 wrote:
>---------------------------
>[...]
>
>Thanks a lot for this, it looks less biased than what I read
>in nVidia-linked papers talked about in this thread.
Look, it's rather simple: if you port a problem to the GPU that you previously ran on CPUs, then by my calculations too the speedup is roughly 5x to 10x over a quad-core, purely theoretical, not yet practical.
So 6-core Intels bring that down to something like 3x to 7x, and Magny-Cours already hammers away a lot of that.
Yet that advantage
a) only holds for AMD, not for Nvidia.
Nvidia is SLOWER in most GPGPU-related tasks.
Now of course if you go searching, you'll find the applications that GPUs were designed for in the first place; those you can quickly run on a GPU. That's not the question.
For the many number-crunching projects that need neither lots of I/O nor lots of RAM, the question is whether you can write THOSE for the GPU.
Usually the answer, after a lot of puzzling, is YES, but it seldom gets more than this 5x to 10x speedup, looked at purely theoretically.
AMD's fastest card right now is a bit under 1000 euro and delivers nearly 5 Tflops single precision (4.6 Tflops at 750 MHz, though Sapphire clocks its version at nearly a GHz), versus Nvidia's Fermi at a tad over 1 Tflop.
All single precision.
These cards are soon nearly 300 watts at full load, the machine not yet counted; with that added it'll be 500 watts or so.
A 4-socket AMD box I have here eats 400 watts.
I assume a Magny-Cours machine is also in that range.
In full configuration it has 48 cores @ 2.2 GHz; I saw one for $9200 on eBay.
In single precision that is also a lot:
48 cores * 2.2 GHz * 4 flops * 2 SIMD units = 844.8 Gflops single precision.
So theory tells us the GPU is up to 5 times faster, yet it's 1000 euro versus nearly 10k euro.
So there is a price tag as well.
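To make that price/performance comparison concrete, here is a small sketch of the arithmetic above (the 4.6 Tflops, 844.8 Gflops, and price figures are this post's numbers, not vendor specs):

```python
# Rough theoretical peak comparison, using the figures from this post.

def peak_gflops(cores, ghz, flops_per_cycle):
    """Theoretical single-precision peak in Gflops."""
    return cores * ghz * flops_per_cycle

# 4-socket Magny-Cours box: 48 cores @ 2.2 GHz, 4 flops * 2 SIMD units/cycle
cpu_peak = peak_gflops(48, 2.2, 4 * 2)   # 844.8 Gflops

# AMD's top GPU card: ~4.6 Tflops single precision per the post
gpu_peak = 4600.0

print(f"CPU box peak : {cpu_peak:7.1f} Gflops at ~9200 euro")
print(f"GPU card peak: {gpu_peak:7.1f} Gflops at ~1000 euro")
print(f"GPU/CPU peak ratio: {gpu_peak / cpu_peak:.1f}x")
print(f"CPU euro per Gflop: {9200 / cpu_peak:.2f}")
print(f"GPU euro per Gflop: {1000 / gpu_peak:.2f}")
```

The theoretical ratio works out to about 5.4x in raw peak, and roughly a factor 50 in euros per Gflop, which is the gap being argued here.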
So Nvidia is basically already gone, if ECC is not your worry (which in HPC it is).
Power is no issue for governments or big companies doing crunching; for users at home, though, it is a big worry. And there is also a price tag to it.
400 watts is simply 400 euro a year here at home. It's 20 euro for big power users.
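As a quick sanity check on that figure (a sketch; the ~0.114 euro/kWh rate is back-derived from the post's "400 watts is 400 euro a year", not an actual quoted tariff):

```python
# Annual electricity cost of a machine running 24/7.
# The euro/kWh rate is inferred from "400 watts = 400 euro a year",
# not from a real tariff sheet.

HOURS_PER_YEAR = 24 * 365  # 8760

def annual_cost_eur(watts, eur_per_kwh):
    kwh_per_year = watts / 1000.0 * HOURS_PER_YEAR
    return kwh_per_year * eur_per_kwh

rate = 400.0 / (400 / 1000.0 * HOURS_PER_YEAR)   # ~0.114 euro/kWh

print(f"Implied rate     : {rate:.3f} euro/kWh")
print(f"400 W CPU box    : {annual_cost_eur(400, rate):5.0f} euro/year")
print(f"500 W GPU machine: {annual_cost_eur(500, rate):5.0f} euro/year")
```

So the ~500 watt GPU machine mentioned above costs on the order of 500 euro a year at that home rate, running day and night.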
Yet my office can't house too many power monsters; the airco can't remove THAT MUCH heat, and I don't intend to have it run day and night.
This is something where GPUs scale badly: their power envelope really is BIG.
Price-wise, however, objectively seen, if you calculate it on a yearly basis, GPUs kill away any other hardware, provided you really put in the big effort to make things run well. Simply getting things done on a GPU is a full-time job.
It's not the government but big COMPANIES that are using many AMD cards as we speak to get the crunching job done.
Amazingly, all the postings I see everywhere are about Nvidia, but realize Nvidia has 240 cores and AMD has 3200 stream cores. That's more like a factor of 10 or so.
So for the non-ECC business crunchers, Nvidia loses everywhere to AMD and will keep losing, as its design is trickier to use.
Now of course with many games that run on GPUs, OpenGL and DirectX simply don't parallelize THAT well, so they can't use the full potential of 3200 stream cores; otherwise Nvidia would of course lose big-time to AMD in every benchmark.
Yet in GPGPU you CAN use all those stream cores very nicely, if you put in the big effort.
Reports right now are that GPU programmers achieve 25%-30% of peak at Nvidia and 50% at AMD for the same application.
Realize that this is full-time work for BIG organisations, usually quantum-mechanics calculations and such...
Vincent