By: Vincent Diepeveen (diep.delete@this.xs4all.nl), April 22, 2011 2:14 am
Room: Moderated Discussions
David Kanter (dkanter@realworldtech.com) on 4/21/11 wrote:
---------------------------
>EduardoS (no@spam.com) on 4/20/11 wrote:
>---------------------------
>>Heikki Kultala (hkultala@iki.NOSPAM.fi) on 4/20/11 wrote:
>>---------------------------
>>>Texture fetches consume the execution slots of the instruction word, ALU operations
>>>cannot be started at same cycle.
>>
>>No they don't, they are even on different clauses, the only memory-like operation
>>that consumes ALU slots is the LDS load/store, starting with Evergreen, on R700 it was executed on TMUs too.
>
>My understanding (http://www.realworldtech.com/page.cfm?ArticleID=RWT121410213827&p=7)
>is that on Cayman and Cypress, the ALUs are used for address calculations. So you
>cannot simultaneously execute ALU clauses and initiate a texture fetch. However,
>initiating a texture fetch is fairly quick - most of the time is spent waiting for
>data. While you are waiting for data, the ALUs are free for independent computations.
>
>David
Texture memory is not the advised way to get things done, as texture memory is not that fast (though faster than main memory).
In the first place you need to split up your software into wavefronts that realistically only calculate within the compute units and don't use any resources outside of them.
Please note your article is one of the few on the internet that describes the Cayman architecture to some extent.
The interesting thing to know now, obviously, is when AMD will have improved the OpenCL compiler enough to support this new architecture well.
Nvidia also seems to struggle to support OpenCL well. This is where OpenCL really seems like an interesting thing.
As for AMD GPUs, OpenCL is the only thing AMD will keep supporting for them, so there is not really a choice there.
At first sight OpenCL doesn't really seem like the perfect language yet. A big drawback for some will be that the current OpenCL 1.1 spec only guarantees 25% of the device RAM as the maximum size of a single memory object (CL_DEVICE_MAX_MEM_ALLOC_SIZE), which implies you may be able to allocate only 25% for one object, whereas several applications will of course want everything from this tiny amount of RAM.
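To put a number on that 25%: as far as I read the device-query table in the OpenCL 1.1 spec, the figure is the minimum an implementation has to report for CL_DEVICE_MAX_MEM_ALLOC_SIZE, namely max(1/4 of CL_DEVICE_GLOBAL_MEM_SIZE, 128*1024*1024) bytes. A driver may report more. A minimal sketch of that arithmetic in plain C (the function name min_guaranteed_alloc is made up for illustration, it is not an OpenCL call):

```c
#include <stdint.h>

/* Lower bound the OpenCL 1.1 spec puts on CL_DEVICE_MAX_MEM_ALLOC_SIZE:
 * max(global memory size / 4, 128 MiB).
 * min_guaranteed_alloc is a hypothetical helper, not part of OpenCL. */
static uint64_t min_guaranteed_alloc(uint64_t global_mem_bytes)
{
    const uint64_t floor_bytes = 128ull * 1024 * 1024; /* 128 MiB floor */
    uint64_t quarter = global_mem_bytes / 4;           /* 25% of device RAM */
    return quarter > floor_bytes ? quarter : floor_bytes;
}
```

So on a card with 2 GiB of RAM you can only count on 512 MiB for one buffer. On a real device you would read both values with clGetDeviceInfo and compare them, since the driver is free to allow larger objects.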
Yet it is a big step forward if you think about how the entire HPC world will be supporting OpenCL, and how the other manufacturers will in the end be forced to produce hardware that uses the manycore concept, as this is seemingly (to hardware laymen like me) the only concept that will give enough crunching power at a cheap price in the near future.
Of course the compiler quality will be very important then.
There is a lot to win there, for example having logic in the compiler to recognize when the programmer is trying to use the actual carry, caused for example by an overflow when adding 2 (unsigned) integers.
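To make the carry case concrete, this is roughly the pattern I mean, written in plain C since neither C nor OpenCL C exposes the carry flag directly. It is only a sketch: the names add_with_carry and add64 are made up for illustration, and whether a given compiler turns the comparison into a real add-with-carry instruction is exactly the open question.

```c
#include <stdint.h>

/* Adds two 32-bit unsigned integers, stores the wrapped sum in *sum
 * and returns the carry out (0 or 1). Hypothetical helper name. */
static uint32_t add_with_carry(uint32_t a, uint32_t b, uint32_t *sum)
{
    *sum = a + b;     /* unsigned overflow wraps modulo 2^32, well defined in C */
    return *sum < a;  /* the wrapped sum is below an operand only if a carry occurred */
}

/* 64-bit addition built from 32-bit halves, index 0 = low word.
 * Any carry out of the high word is discarded. */
static void add64(const uint32_t a[2], const uint32_t b[2], uint32_t r[2])
{
    uint32_t carry = add_with_carry(a[0], b[0], &r[0]);
    r[1] = a[1] + b[1] + carry;
}
```

A compiler that recognizes the `sum < a` test can use the hardware carry directly instead of emitting a separate compare, which matters a lot for multi-word integer arithmetic.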
Vincent