Article: Parallelism at HotPar 2010
By: Michael S (already5chosen.delete@this.yahoo.com), August 23, 2010 2:13 am
Room: Moderated Discussions
ajensen (@.) on 8/23/10 wrote:
---------------------------
>Anon (no@thanks.com) on 8/22/10 wrote:
>---------------------------
>>It does however look to me like Intel is playing the same game people seem incensed
>>at NVidia for doing here, unless I have missed something their base C implementation
>>is running single threaded on a single core, versus 4/8 SMD units for their OpenCL version..
>
>On page 14 they show the most interesting scenarios including handtuned C with
>MT and SSE, which is of course for this platform, shown to be faster than OpenCL.
>They don't show handtuned single thread C with SSE, which might be of some interest
>in academia, but in practice no one spends that much time for an implementation that will be slow vs. MT.
There are good reasons to prefer single-threaded SIMD over MT scalar. First, the change is more "local": other parts of the application are less affected. Second, until Nehalem you could often gain more from SIMD than from threading. Third, there are fewer performance surprises due to cache layout effects.
Fourth, sometimes other parts of your application can make effective use of the remaining cores. And finally, fifth, not everybody shares the mentality of "grab as many computing resources as you can, and other applications (or other users on a time-sharing machine) can go to hell".
>
>Also they don't show "naive-C-with-MT", but that is really an oxymoron.
>
>So IMO they don't claim OpenCL to be the silver bullet for execution speed, but
>close enough to best implementation. And that much more portable and fast to write.
>In time I'm sure compilers will beat humans anyway for execution speed. It is just
>a matter of complexity. When was the last time humans could beat computers for CPU layout?
IMHO, OpenCL came so close to hand-tuned SSE due to the big equalizer in the form of sqrt() in the inner loop.