Article: Parallelism at HotPar 2010
By: Steve Underwood (steveu.delete@this.coppice.org), August 23, 2010 2:35 am
Room: Moderated Discussions
Michael S (already5chosen@yahoo.com) on 8/23/10 wrote:
---------------------------
>ajensen (@.) on 8/23/10 wrote:
>---------------------------
>>Anon (no@thanks.com) on 8/22/10 wrote:
>>---------------------------
>>>It does, however, look to me like Intel is playing the same game people seem incensed
>>>at NVidia for doing here. Unless I have missed something, their base C implementation
>>>is running single-threaded on a single core, versus 4/8 SIMD units for their OpenCL version.
>>
>>On page 14 they show the most interesting scenarios, including hand-tuned C with
>>MT and SSE, which is, of course, shown to be faster than OpenCL on this platform.
>>They don't show hand-tuned single-threaded C with SSE, which might be of some interest
>>in academia, but in practice no one spends that much time on an implementation that will be slow vs. MT.
>
>There are good reasons to prefer single-threaded SIMD over MT scalar. First, the
>change is more "local": other parts of the application are less affected. Second,
>until Nehalem you could often gain more from SIMD than from threading. Third, there are fewer
>performance surprises due to effects of cache layout.
>Fourth, sometimes other parts of your application could effectively utilize the
>remaining cores. And finally, fifth, not everybody shares the mentality of "grab
>as many computing resources as you can, and other applications (or users on a time-sharing machine) can go to hell".
Very true. Applications like media manipulation in conferencing are crying out for more compute power. However, there are numerous separate data streams in these applications. Letting the cores process separate streams makes far more sense than trying to merge them all into a single compute resource. What I want from OpenCL on a CPU is for it to get the most out of the SIMD capabilities of a single core, in a portable manner. I have plenty of separate concurrent uses for lots of cores.
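As a rough illustration of that split, here is a minimal sketch in plain C with POSIX threads rather than OpenCL. Every name in it (stream_t, apply_gain, process_streams, MAX_STREAMS) is hypothetical, and the gain kernel is just a stand-in for real media processing. Each stream gets its own thread, and the per-stream kernel stays single-threaded and simple enough that a vectorising compiler can reasonably be expected to map it to SIMD.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_STREAMS 16  /* hypothetical upper bound for this sketch */

/* One independent media stream (hypothetical layout). */
typedef struct {
    float *samples;
    size_t count;
    float gain;
} stream_t;

/* Per-stream kernel: single-threaded, with a simple loop that a
 * vectorising compiler can turn into SIMD code on one core. */
static void apply_gain(float *restrict samples, size_t count, float gain)
{
    for (size_t i = 0; i < count; i++)
        samples[i] *= gain;
}

/* Thread entry point: each thread touches only its own stream. */
static void *stream_worker(void *arg)
{
    stream_t *s = arg;
    apply_gain(s->samples, s->count, s->gain);
    return NULL;
}

/* Run one thread per stream and wait for all of them.
 * Returns 0 on success, or the pthread_create error code. */
int process_streams(stream_t *streams, size_t n_streams)
{
    pthread_t threads[MAX_STREAMS];
    size_t started = 0;
    int rc = 0;

    assert(n_streams <= MAX_STREAMS);

    for (; started < n_streams; started++) {
        rc = pthread_create(&threads[started], NULL,
                            stream_worker, &streams[started]);
        if (rc != 0)
            break;
    }
    /* Join only the threads that were actually started. */
    for (size_t i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    return rc;
}
```

Build with something like `cc -O3 -pthread`. Whether the inner loop is actually vectorised depends on the compiler and flags, so treat that part as an assumption to verify rather than a guarantee.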
Steve