By: David Kanter (dkanter.delete@this.realworldtech.com), August 16, 2011 6:49 am
Room: Moderated Discussions
Mark Roulo (nothanks@xxx.com) on 8/16/11 wrote:
---------------------------
>David Kanter (dkanter@realworldtech.com) on 8/15/11 wrote:
>---------------------------
>>Mark Roulo (nothanks@xxx.com) on 8/15/11 wrote:
>>---------------------------
>>>David Kanter (dkanter@realworldtech.com) on 8/10/11 wrote:
>>>---------------------------
>>>>AFAIK, Intel has not exposed any shared memory to SW, which is required for OpenCL.
>>>>They could use the L3 cache for shared memory, but the performance seems like it
>>>>would be pretty awful due to high latency.
>>>
>>>Shared memory is a logical, not physical, concept in a cache-coherent system.
>>>The L1 would probably wind up being used for typical OpenCL codes (the mapping being one nVidia SM -> 1 x86 core).
>>
>>I was speaking of the Sandy Bridge GPU.
>
>Oh! Whoops.
>
>>>>I also wonder about numerical accuracy.
>>>
>>>Do we expect x86 numerics to be *WORSE* than GPU numerics?
>>
>>No, but I suspect the SNB GPU may have worse numerics than Nvidia/AMD GPUs.
>
>This would not surprise me, either.
>
>>Agreed. Although I think Matt Pharr and the other folks in ART are trying to give
>>you *similar* performance, without using intrinsics.
>
>Folks are trying. I've spent time in the last few years dealing with nVidia/CUDA
>claims. We've coded for nVidia/CUDA, ATI/OpenCL and evaluated RapidMind (now Intel
>Array Building Blocks), and Intel Threading Building Blocks on x86.
>
>None of them come anywhere close to their performance claims on our loads (which,
>I grant, are not single precision floating point ...).
>
>We couldn't even get ABB to beat ICC compiling *scalar* code with auto-vectorization
>turned on, although ABB beat GCC by a few tens of percent. RapidMind failed just
>as badly a few years ago when we evaluated them and asked *them* to write the RapidMind code for us.
>
>So ... I'm skeptical about OpenCL on x86 beating hand-coded intrinsics in general any time soon.
I remember our discussions : ) BTW - are you guys going to Hot Chips this year?
You might want to try ISPC. I'd be very curious to know how it stacks up against well-coded software (http://ispc.github.com/).
David