By: Vincent Diepeveen (diep.delete@this.xs4all.nl), April 16, 2011 2:10 am
Room: Moderated Discussions
EduardoS (no@spam.com) on 4/14/11 wrote:
---------------------------
>Moritz (better@not.tell) on 4/14/11 wrote:
>---------------------------
>>The model will not work for those chips because it was not adapted/solved for them.
>>Your remark makes little sense.
>>The interesting thing will be to compare the "model's" parameters for the AMD HD
>>5series to those of the 6series. That will tell us if the move from 5 to 4 wide units was a success.
>
>Neither chip I have mentioned have 4 wide units...
Besides that, the new 6000 series has 2.5x more multiplication resources than the 5000 series for the very crucial 32 x 32 bit multiplication.
For gpgpu that matters a lot.
Let me quote a few gpu programmers here.
"In games clocking a GPU higher is a huge advantage"
"In GAMES it's all about bandwidth to the RAM"
Most game programmers still have to learn the word 'algorithm'. In algorithms they are really at beginner level compared to the pros.
They manage just by raw gpu speed :)
In gpgpu however it's not about RAM bandwidth nor how high the thing has been clocked.
It's about having great caches and a good register file and units that can execute your instructions quickly.
To write optimal code for the gpu you can hardly do any reads from RAM; you can only really work out of the caches.
Nvidia does great in games and will keep doing great in games.
In terms of gpgpu AMD blows nvidia away, making nvidia just look silly, and it will always stay that way, because nvidia clocks its gpus higher than AMD does.
David's previous article was quite good, but this 'predicting' is utter crap of course.
He's comparing Fermi, the first Nvidia generation that has 32 x 32 bit multiplication (it was 24 bit multiplication before), against an old AMD generation that did have 32 bit multiplication, but only in, what was it, 1 unit out of 5.
Today's AMD generation, just like nvidia, has been optimized more for that.
So something is a factor 2.5 off in David's graphs.
For games you don't need to write an article about gpus; there are zillions of game sites testing the latest gpus.
For gpgpu, technical descriptions of gpus are more interesting. What's really missing is fundamental information.
Simple stuff is important to know.
x = a + b;
(stall)
y = x + c;
For example, from Volkov's experiments we know that Nvidia most likely needs 4 cycles before 'x' is available for use. So you would need 3 other, independent instructions to execute before 'y = x + c'.
How long does AMD need there?
I don't know, but for what I'm programming in OpenCL right now it's pretty crucial to know.
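To make the dependent-add question concrete, here is a minimal sketch in Python. It assumes a very simple model: one instruction issues per cycle, and a result becomes usable a fixed number of cycles after its producer issues. The function names and the model itself are illustrative assumptions, not measurements; the 4 is just the Nvidia figure attributed to Volkov above.

```python
def independent_instructions_needed(latency_cycles):
    """Independent instructions to issue between a producer and its consumer.

    Assumes one instruction issues per cycle and the producer's result is
    usable `latency_cycles` cycles after it issues (illustrative model only).
    """
    if latency_cycles < 1:
        raise ValueError("latency must be at least 1 cycle")
    return latency_cycles - 1


def stall_cycles(latency_cycles, independent_available):
    """Cycles the consumer still waits when only some filler work exists."""
    needed = independent_instructions_needed(latency_cycles)
    return max(needed - independent_available, 0)


if __name__ == "__main__":
    # 4 cycles: the figure quoted in the post for Nvidia.
    print(independent_instructions_needed(4))
    # Only 1 independent instruction available between producer and consumer.
    print(stall_cycles(4, 1))
```

Plugging in the unknown AMD latency, once someone has measured it, would tell you how much independent work each thread needs between 'x = a + b' and 'y = x + c'.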
The only question with respect to AMD is when OpenCL will support more than 1 gpu per card. Their flagships are not so well supported by OpenCL yet; when that works, it'll blow things away of course.
OpenCL for now seems to be a blessing for AMD's gpgpu programming.
It'll take off big time.
Here is another thing we don't know about gpgpu: how *reliable* are the results it calculates?
Many applications optimized for gpgpu nowadays are 32 bit integer based. GPUs simply dominate with 32 bit integers.
Yet when displaying graphics, if a pixel is wrong somewhere in the result, and especially if it is just 1 or a few palette colors off, the odds that you notice it are not so big. Very few humans can see the difference between colors that are close together. Inherently all graphics already get rounded off in all kinds of ways, and it moves fast over the screen.
As all calculations are parallel, in gpgpu you will not have deterministic behaviour.
Add to that that the gpus burn nearly 400 watts or so when calculating gpgpu, like 100 watts over spec, and must cool everything with 1 small lousy fan that makes a screaming noise.
How many errors do the gpus make in their calculations there, however?
So not RAM errors, but errors of the execution units because of the huge heat and power, combined with the complete luxury of so far having been able to do everything without any software out there that checks for correctness.
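One hedged way to start answering that question is to recompute a sample of the gpu's integer results on the cpu and count the mismatches. The sketch below does that for 32 x 32 bit multiplication (low 32 bits of the product). The names `mul32_reference` and `count_mismatches` are made up for illustration, and the "device results" in the demo are invented values with one bit flipped on purpose, not output from any real card.

```python
MASK32 = 0xFFFFFFFF


def mul32_reference(a, b):
    """Low 32 bits of an unsigned 32 x 32 bit product, computed on the cpu."""
    return (a * b) & MASK32


def count_mismatches(inputs, device_results):
    """Count results that differ from the cpu reference.

    `inputs` is a list of (a, b) pairs, `device_results` the values the
    device returned for them, in the same order.
    """
    if len(inputs) != len(device_results):
        raise ValueError("inputs and device_results must have equal length")
    return sum(
        1
        for (a, b), got in zip(inputs, device_results)
        if mul32_reference(a, b) != got
    )


if __name__ == "__main__":
    inputs = [(3, 5), (0xFFFFFFFF, 2), (65536, 65536)]
    # Invented results: the second value has one bit flipped deliberately.
    simulated = [15, 0xFFFFFFFE ^ 0x100, 0]
    print(count_mismatches(inputs, simulated))
```

Since integer arithmetic has exactly one right answer, any nonzero count here would point at the hardware or the driver rather than at rounding.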
The cheapest nvidia card to do serious gpgpu with is the Quadro 6000. The gamer cards are all lobotomized too much there.
That Quadro is 1200 dollars or so and has 448 cores @ 1.2 GHz or so, just like the Tesla.
AMD offers 3072 cores (PEs) @ 830 MHz at a much better price. Additionally the 6970, which I have here for example, is not lobotomized. I bought it for 318 euro.
In cpu versus cpu reviews, if AMD or Intel was 2% faster here or 3% there, it led to huge discussions.
In gpgpu we're looking at a whopping factor 3 difference in speed or so, at a price that is at least a factor 2 cheaper (Teslas not even counted yet).
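As a rough sanity check on those numbers, the sketch below multiplies core count by clock for both cards, using only the figures quoted above (448 cores @ 1.2 GHz, 3072 PEs @ 830 MHz). The function names are made up, and cores times clock is only a raw upper bound, assuming every lane does useful work every cycle; it says nothing about how well real code fills those lanes.

```python
def raw_lane_ghz(lanes, clock_ghz):
    """Raw upper bound: lanes times clock, in 'lane-GHz'."""
    return lanes * clock_ghz


def raw_peak_ratio(lanes_a, clock_a_ghz, lanes_b, clock_b_ghz):
    """How many times larger card A's raw bound is than card B's."""
    return raw_lane_ghz(lanes_a, clock_a_ghz) / raw_lane_ghz(lanes_b, clock_b_ghz)


if __name__ == "__main__":
    # Figures as quoted in the post, not independently verified.
    ratio = raw_peak_ratio(3072, 0.83, 448, 1.2)
    print(f"raw AMD/Nvidia ratio: {ratio:.2f}")  # about 4.74
```

The raw ratio comes out near 4.7, so the factor 3 or so quoted above already leaves room for lanes that sit idle in practice.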
Regards,
Vincent