gpgpu and core counts

Article: Predicting AMD and Nvidia GPU Performance
By: Heikki Kultala (hkultala.delete@this.iki.NOSPAM.fi), April 21, 2011 11:23 am
Room: Moderated Discussions
The main point of my original message was to comment on the incorrect use of the term "core". You seem to have read a lot between my lines that is not there.

Vincent Diepeveen (diep@xs4all.nl) on 4/20/11 wrote:
---------------------------
>Heikki Kultala (hkultala@iki.NOSPAM.fi) on 4/20/11 wrote:
>---------------------------
>>Vincent Diepeveen (diep@xs4all.nl) on 4/20/11 wrote:
>>---------------------------
>>>Heikki Kultala (hkultala@iki.NOSPAM.fi) on 4/20/11 wrote:
>>>---------------------------
>>>>>It's simple. It's 3072 cores @ 0.88Ghz versus nvidia 448 cores @ 1.2Ghz. 3072 cores
>>>>>always win by factor 3-4 then or so.
>>>>>
>>>>>No discussions there.
>>>>
>>>>Wrong.
>>>>
>>>>It's 24 ATI cores per chip versus 28-32 nvidia cores per chip.
>>>>
>>>>And, it's 384 ATI SPMD lanes per chip versus 448-512 nvidia SPMD lanes per chip.
>>>>
>>>>VLIW ALU != core
>>>
>>>I wrote it in popular language, but that doesn't stop idiots like you.
>>
>>Says how things technically really are makes me an idiot?
>
>Because first of all you say it wrong and also deliberately.
>It has 2 gpu's and therefore 48 COMPUTE UNITS. Not 24.

I assume everyone can do the multiplication by 2 themselves.
I just posted what one chip has.

>Secondly you want to refer to some lobotomized overclocked gamers card of nvidia,
>but let's just look at their topend gpgpu card, as we're discussing gpgpu here, not gamers.

I did not refer to anything that no one else had referred to before me.

I used the numbers "28-32" and "448-512" so that you can pick the model YOU want to compare to. (And I just said what the chip HAS, so I also used the full numbers, even though on most models some units are disabled.)
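
To make the counting concrete, here is a small Python sketch that derives the per-core and per-lane figures from the totals quoted in this thread. The helper name `breakdown` and the variable names are only illustrative; the inputs are the numbers already given above (24 cores and 384 SPMD lanes per Cayman chip, 3072 ALUs on the two-chip card, 28-32 cores and 448-512 lanes per nvidia chip).

```python
# Per-chip figures for Cayman as quoted in this thread.
CAYMAN_CORES_PER_CHIP = 24
CAYMAN_LANES_PER_CHIP = 384
# Marketing "core" (ALU) count for the two-chip card, also from this thread.
CARD_ALUS = 3072
CHIPS_PER_CARD = 2


def breakdown(cores, lanes, card_alus, chips):
    """Derive per-core and per-lane figures from the quoted totals (hypothetical helper)."""
    alus_per_chip = card_alus // chips
    return {
        "lanes_per_core": lanes // cores,
        "alus_per_chip": alus_per_chip,
        "alus_per_lane": alus_per_chip // lanes,
        "cores_per_card": cores * chips,
        "lanes_per_card": lanes * chips,
    }


cayman = breakdown(CAYMAN_CORES_PER_CHIP, CAYMAN_LANES_PER_CHIP, CARD_ALUS, CHIPS_PER_CARD)
for name, value in cayman.items():
    print(f"{name}: {value}")

# Same lanes-per-core check for the two nvidia configurations quoted above.
nvidia_lanes_per_core = [448 // 28, 512 // 32]
print("nvidia lanes per core:", nvidia_lanes_per_core)
```

Under these numbers each core has 16 SPMD lanes on both vendors, and the 3072 figure only appears once every lane is multiplied by its 4 VLIW ALUs and by the 2 chips on the card.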

>For gamers you can find dozens of others sites that are better and benchmarking
>every game already.

Now you are talking to the wrong person; game performance is not very interesting to me.

>>>It's 3072 PE's versus 448 PE's.
>>
>>Those are unit counts coming from marketting.
>>Those are FP ALU counts. How to feed data to those matters.
>>
>>There are only 384 SPMD lanes per Cayman chip,
>>so one chip can execute only 384 VLIW instructions per cycle (if the program counters
>>are correctly aligned, if not, the worst case is 24 / chip)
>
>there's 2 chips and you know this very well. You have to compare 48 compute units
>and 3072 PE's with the 448 ones of nvidia.

Yes, I know that.

>>But, those VLIW operations can actually include 5, not 4 operations total. (but
>>the one has to be branch, only 4 FP operations).
>>What makes this more complicated is how memory operations are handled.
>>If memory operation is being handled, no ALU operations can execute at same time
>>on ATI. I'm not sure how this goes on nvidia, might be similar.
>
>Actually, description David gave of how the RAM functions is wrong, but as i'm
>not a memory expert i'll not comment too much on it. I heard someone call it: "piece
>of crap description. Sure it can already load it while the alu's work.

OK, I was wrong on this one. I thought only one clause could start executing at the same time.

>>>What's going to be FASTER in well designed gpgpu codes?
>>>
>>>0.83Ghz * 3072 PE's is *always* going to annihilate in well designed codes a meager 448 PE's @ 1.2Ghz Tesla.
>>
>>Did I say they would not? No.
>
>But you try to scave off factor 2 of the performance the cayman delivers, as they
>come in 2 gpu's at 1 card for 500 and a little euro's.

No. I clearly said these are the numbers for a single chip, and I assumed you could do the multiplication by two to get the numbers when comparing the performance of the two-chip card. But it seems your brain cannot do a multiplication by 2 without raising a "someone is doing some incorrect comparison" interrupt.
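
For reference, the single-chip versus two-chip comparison can be written out as a rough sketch. It uses only the ALU counts and clocks quoted in this thread (3072 ALUs at 0.83 GHz for the card, 448 at 1.2 GHz for the Tesla) and treats "ALUs times clock" as a crude peak-rate proxy. That proxy is an assumption for illustration and says nothing about achieved performance.

```python
# Figures quoted in this thread; "ALU-GHz" is a crude peak proxy, not a benchmark.
CARD_ALUS = 3072        # two-chip card, marketing "core" count
CHIPS_PER_CARD = 2
CAYMAN_CLOCK_GHZ = 0.83
TESLA_ALUS = 448
TESLA_CLOCK_GHZ = 1.2


def alu_ghz(alus, clock_ghz):
    """Peak proxy: number of ALUs times clock, assuming every ALU is busy every cycle."""
    return alus * clock_ghz


tesla = alu_ghz(TESLA_ALUS, TESLA_CLOCK_GHZ)
cayman_chip = alu_ghz(CARD_ALUS // CHIPS_PER_CARD, CAYMAN_CLOCK_GHZ)
cayman_card = alu_ghz(CARD_ALUS, CAYMAN_CLOCK_GHZ)

chip_ratio = cayman_chip / tesla
card_ratio = cayman_card / tesla

print(f"one Cayman chip vs Tesla: {chip_ratio:.2f}x")
print(f"two-chip card vs Tesla:   {card_ratio:.2f}x")
```

With these inputs the single chip comes out at roughly 2.4x and the card at roughly 4.7x, so the per-chip and per-card figures differ by exactly the factor of two and nothing else.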

>>I just told you are counting your cores incorrectly. And telling you are wrong seems to make me an idiot.
>
>You're comparing in the wrong manner.
>
>You must compare the number of PE's with each other as that's what the effective
>speed is of both those gpu's that they'll deliver you if you write great code for it.
>If your argumentation is: "but my code isn't optimal" that's not relevant in this discussion, see what i wrote.

My code is quite good, but the average coder's code is not.

>>But if the code does not have any ILP, only parallelism between work items, then
>>3/4 or those ATI ALU's are idling. This practically means badly optimized code.
>
>We know from the well optimized codes that the better gpu coders get a higher %
>out of AMD than out of Nvidia cards pre-tesla. Tesla is a big big improvement there,
>yet even if you manage to get a slighly higher % out of it thanks to CUDA being
>a better low level language than opencl (think of adding carries fast in cuda -
>try that in opencl), that still is peanuts. If you manage to get 700 gflops out
>of that tesla, counted in a theoretic manner (counting multiply-add as 2 flops), then you're a hero.
>
>This whereas getting a practical 2 Tflop out of the 6990 is not even remote to hero status.

Maybe not hero status, but it might still require things like putting many "logical work items" into one actual work item.
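
A toy model of why that packing matters, assuming 4 FP slots per VLIW instruction as discussed above. The function names are hypothetical, and the model assumes that operations from different logical work items are independent and that the compiler packs them perfectly, which real code will not always achieve.

```python
import math

VLIW_WIDTH = 4  # FP slots per VLIW instruction, as stated in this thread


def slot_utilization(independent_ops, width=VLIW_WIDTH):
    """Fraction of FP slots filled when `independent_ops` can be bundled together (toy model)."""
    if independent_ops <= 0:
        return 0.0
    bundles = math.ceil(independent_ops / width)
    return independent_ops / (bundles * width)


def packed_utilization(ilp_per_item, items_per_work_item, width=VLIW_WIDTH):
    """Utilization when several logical work items share one actual work item.

    Assumes operations from different logical items are independent of each other.
    """
    return slot_utilization(ilp_per_item * items_per_work_item, width)


for items in (1, 2, 4):
    print(f"{items} logical item(s) per work item: {packed_utilization(1, items):.2f}")
```

With no ILP and one logical item per work item only a quarter of the slots are filled, which is the "3/4 of the ALUs idling" case. Packing four independent logical items into one work item fills all four slots in this model.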