By: Heikki Kultala (hkultala.delete@this.iki.NOSPAM.fi), April 20, 2011 10:16 am
Room: Moderated Discussions
Vincent Diepeveen (diep@xs4all.nl) on 4/20/11 wrote:
---------------------------
>Heikki Kultala (hkultala@iki.NOSPAM.fi) on 4/20/11 wrote:
>---------------------------
>>>It's simple. It's 3072 cores @ 0.88Ghz versus nvidia 448 cores @ 1.2Ghz. 3072 cores
>>>always win by factor 3-4 then or so.
>>>
>>>No discussions there.
>>
>>Wrong.
>>
>>It's 24 ATI cores per chip versus 28-32 nvidia cores per chip.
>>
>>And, it's 384 ATI SPMD lanes per chip versus 448-512 nvidia SPMD lanes per chip.
>>
>>VLIW ALU != core
>
>I wrote it in popular language, but that doesn't stop idiots like you.
Saying how things technically really are makes me an idiot?
>It's 3072 PE's versus 448 PE's.
Those are unit counts coming from marketing.
Those are FP ALU counts. What matters is how you feed data to them.
There are only 384 SPMD lanes per Cayman chip, so one chip can execute only 384 VLIW instructions per cycle (if the program counters are correctly aligned; if not, the worst case is 24 per chip).
But those VLIW instructions can actually include 5 operations in total, not 4 (though one of them has to be a branch, so only 4 are FP operations).
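To make the counting concrete, here is a minimal sketch of the arithmetic above. The constants are the figures from this post; the function names are made up for illustration, and the model is only as good as those figures.

```python
# Figures taken from the post; function names are hypothetical.
VLIW_FP_WIDTH = 4      # FP operations per VLIW instruction
LANES_PER_CHIP = 384   # SPMD lanes per Cayman chip
CORES_PER_CHIP = 24    # cores per chip


def fp_alus_per_chip(lanes=LANES_PER_CHIP, width=VLIW_FP_WIDTH):
    """FP ALU count per chip: each lane issues one VLIW bundle of `width` FP ops."""
    return lanes * width


def lanes_per_core(lanes=LANES_PER_CHIP, cores=CORES_PER_CHIP):
    """SPMD lanes that share one core (and one program counter)."""
    return lanes // cores


def vliw_instructions_per_cycle(aligned, lanes=LANES_PER_CHIP, cores=CORES_PER_CHIP):
    """VLIW instructions per cycle per chip.

    With aligned program counters every lane does useful work; in the
    worst case only one lane per core does.
    """
    return lanes if aligned else cores


if __name__ == "__main__":
    print(fp_alus_per_chip())                  # FP ALUs per chip
    print(lanes_per_core())                    # lanes per core
    print(vliw_instructions_per_cycle(True))   # best case
    print(vliw_instructions_per_cycle(False))  # worst case
```

With these figures one chip has 1536 FP ALUs, so the quoted 3072 would correspond to two chips. That last step is my inference from the arithmetic, not something stated in the quote.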
What makes this more complicated is how memory operations are handled.
If a memory operation is being handled, no ALU operations can execute at the same time on ATI. I'm not sure how this works on nvidia; it might be similar.
>What's going to be FASTER in well designed gpgpu codes?
>
>0.83Ghz * 3072 PE's is *always* going to annihilate in well designed codes a meager 448 PE's @ 1.2Ghz Tesla.
Did I say they would not? No.
I just told you that you are counting your cores incorrectly. And telling you that you are wrong seems to make me an idiot.
But if the code does not have any ILP, only parallelism between work items, then 3/4 of those ATI ALUs are idling. In practice this means badly optimized code.
We cannot ignore the actual SPMD lane count. A good way is to count the SPMD lanes first, and THEN in addition take into account that ATI can do 4-way FP VLIW on one SPMD lane, while nvidia does only a single floating point operation per lane.
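A rough sketch of that way of counting, under stated assumptions: it uses the 0.83 GHz and 1.2 GHz clocks from the quote, takes 768 ATI lanes as 3072 / 4, and ignores memory operations and the branch slot completely. The function name and the `ilp` parameter are hypothetical, introduced only for this example.

```python
def fp_ops_per_ns(lanes, fp_slots_per_lane, clock_ghz, ilp):
    """Estimated FP operations per nanosecond.

    lanes             -- SPMD lanes executing in parallel
    fp_slots_per_lane -- FP operations one lane can issue per cycle
    clock_ghz         -- clock in GHz (cycles per nanosecond)
    ilp               -- independent FP operations the code offers per work item
    """
    # A lane cannot fill more slots than the code has independent operations.
    used_slots = min(ilp, fp_slots_per_lane)
    return lanes * used_slots * clock_ghz


ATI_LANES = 3072 // 4   # 768 lanes, assuming 4 FP ALUs per lane
NVIDIA_LANES = 448      # one FP operation per lane

if __name__ == "__main__":
    for ilp in (1, 2, 4):
        ati = fp_ops_per_ns(ATI_LANES, 4, 0.83, ilp)
        nv = fp_ops_per_ns(NVIDIA_LANES, 1, 1.2, ilp)
        print(f"ILP {ilp}: ATI {ati:.1f}, nvidia {nv:.1f}, ratio {ati / nv:.2f}")
```

Under these assumptions the ATI advantage is large only when the code has enough ILP to fill the VLIW slots. With no ILP the two come out roughly even, which is the point of counting lanes first.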