Article: Parallelism at HotPar 2010
By: hobold (hobold.delete@this.vectorizer.org), August 19, 2010 3:58 am
Room: Moderated Discussions
Steve Underwood (steveu@coppice.org) on 8/18/10 wrote:
---------------------------
[...]
>It took a lot of time for people to get the best out of hand shuffling
>things with SSSE3, and the next generation core made this complexity something that
>needs to be ripped out of the code. AAAHHHHH!
>
http://www.khronos.org/developers/library/2010_siggraph_bof_opencl/OpenCL-BOF-Intel-SIGGRAPH-Jul10.pdf
Granted, this approach will not give you the equivalent of fully hand-tuned performance, but it can come reasonably close. Better yet, it abstracts away the gazillion existing SSE variants and the upcoming AVX and Larrabee ISA variations. I wonder if that will end up being a more significant advantage than the abstraction from completely different hardware types by different vendors.
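To make the abstraction argument concrete, here is a minimal sketch in plain C of the width-agnostic style an OpenCL kernel encourages. It is not OpenCL API code: `saxpy_item` and `run_saxpy` are hypothetical names invented for illustration, and the loop in `run_saxpy` merely stands in for whatever dispatch a real runtime would perform. The point is that the kernel body describes a single element and never mentions a vector width or a shuffle, so in principle the same source could be mapped onto SSE, AVX, or wider lanes without being rewritten.

```c
#include <stddef.h>

/* Hypothetical per-work-item body, in the spirit of an OpenCL kernel:
 * it computes one output element and says nothing about vector width. */
static void saxpy_item(size_t gid, float a,
                       const float *restrict x,
                       const float *restrict y,
                       float *restrict out)
{
    out[gid] = a * x[gid] + y[gid];
}

/* Stand-in for the runtime's dispatch over the index range (an assumption,
 * not an OpenCL call). A vectorizing compiler or runtime may pack several
 * consecutive gid values into one SIMD register of whatever width the
 * target offers; this scalar loop is only the reference behavior. */
void run_saxpy(size_t n, float a,
               const float *restrict x,
               const float *restrict y,
               float *restrict out)
{
    for (size_t gid = 0; gid < n; ++gid)
        saxpy_item(gid, a, x, y, out);
}
```

Contrast this with hand-written SSSE3 code, where the 128-bit register width and the specific shuffle instructions are baked into the source and have to be revisited for each new ISA generation.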