By: hobold (hobold.delete@this.vectorizer.org), November 1, 2020 8:22 am
Room: Moderated Discussions
Jukka Larja (roskakori2006.delete@this.gmail.com) on October 31, 2020 8:14 am wrote:
[...]
> I'm not sure what std::simd would include, but at least on a quick glance I don't see GCC's extensions
> would matter at all for us. It's not like writing a wrapper of our own (for (S)SSE(2/3) and Neon)
> was all that difficult or took a lot of time.
Template metaprogramming is not rocket surgery. But it's not trivial either, and you have to keep validating that new compiler versions don't break performance.
I was thinking of the contrast illustrated here, for example, between a plain formula on the one hand and a mess of nested intrinsics on the other:
https://nullprogram.com/blog/2015/07/10/
The GCC extension allows writing a plain formula, too, when targeting a SIMD backend.
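To make that concrete, here is a minimal sketch of the two styles side by side. This is my own example, not the blog's code; v4sf and fma4 are names I made up for illustration:

```c
#include <xmmintrin.h>   /* SSE intrinsics, x86 only */

/* GCC/clang vector extension: four packed floats. */
typedef float v4sf __attribute__((vector_size(16)));

/* Plain-formula style: ordinary operators apply element-wise, and the
   compiler emits SSE, AVX, or NEON depending on the target. */
static v4sf fma4(v4sf a, v4sf x, v4sf y) {
    return a * x + y;
}

/* Intrinsics style: the same computation as nested calls,
   tied to one specific instruction set. */
static __m128 fma4_sse(__m128 a, __m128 x, __m128 y) {
    return _mm_add_ps(_mm_mul_ps(a, x), y);
}
```

The first version also stays readable when the formula grows; the second has to be rewritten for every new instruction set.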
> The problem is that we don't have much code that
> could make use of such abstraction (actually, we mostly use four wide float vectors as direct
> replacement of three wide float vectors. We can't even make use of the last item).
>
I think the terminology "vector" for SIMD parallelism misleads people at large into thinking in terms of vector math. There is some overlap, but as you noticed, the mathy vectors generally aren't too useful for extracting SIMD parallelism.
Usually the problem needs to be "rotated by 90 degrees": in this case you'd have one SIMD vector with elements x1, x2, x3, x4, another with elements y1, y2, y3, y4, and so on, effectively working with four mathy 3-vectors inside three SIMD 4-vectors.
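A sketch of that rotation, again using the GCC vector extension (the function name is mine):

```c
typedef float v4sf __attribute__((vector_size(16)));

/* Four dot products at once: lane i of each SIMD vector holds one
   coordinate of the i-th mathy 3-vector. */
static v4sf dot3x4(v4sf ax, v4sf ay, v4sf az,
                   v4sf bx, v4sf by, v4sf bz) {
    return ax * bx + ay * by + az * bz;  /* one result per lane */
}
```

No lane is wasted, and the same code scales to 8-wide or 16-wide SIMD by changing the typedef.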
> I'm sure there are some places where rethinking data structures or algorithms could
> allow making use of SIMD, but if we were to go through the trouble, we'd consider
> using GPU first. After that consideration, there's practically nothing left.
>
Plus, GPU programming interfaces don't mislead you as much, because they don't really expose their SIMD width to the programmer.
BTW, I found it useful to think about rearranging data structures not in terms of vectors, but in terms of cache friendliness. Locality, yes, but also things like keeping invariant data separate from changing data (to save write-back bandwidth).
That kind of data-centric programming enables optimizations which also help ordinary scalar code, and it is a large step towards SIMD, where the programming model forces locality by treating a number of subsequent data items as one indivisible unit.
Optimizing for caches "naturally" tends to favor a structure-of-arrays format.
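As an illustration of what I mean (hypothetical particle fields, nothing from your codebase):

```c
/* Array-of-structures: fields interleaved. A pass that updates only
   positions still drags mass through the cache and dirties whole lines. */
struct particle_aos { float x, y, z, mass; };

/* Structure-of-arrays: each field contiguous. Invariant data (mass)
   lives in its own array and is never written back, and the x/y/z
   arrays load straight into SIMD registers. */
struct particles_soa {
    float x[1024], y[1024], z[1024];
    float mass[1024];
};
```

The scalar version of a position-update loop already runs faster on the second layout; the SIMD version falls out almost for free.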