By: Eric Fink (eric.delete.delete@this.this.anon.com), June 3, 2022 12:23 am
Room: Moderated Discussions
Jan Wassenberg (jan.wassenberg.delete@this.gmail.com) on June 2, 2022 11:10 pm wrote:
> hm, latency when moving from the vector unit to the scalar would be a matter of the implementation. One could
> also say that's a problem for NEON (given its lack of pmovmskb, and most workarounds being slow).
In my (admittedly very limited) experience with NEON, the lack of pmovmskb has been much less painful than I originally expected, because NEON has fast horizontal reductions (min/max/sum). I have a small library for geometry processing (things like primitive intersection/relation tests etc.) and I was able to quickly reformulate all the relevant algorithms, at times achieving significant reductions in the number of instructions needed compared to SSE with pmovmskb.
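As an illustration of the kind of reformulation meant here (a generic example, not code from the library mentioned above): finding the first byte in a 16-byte block equal to some value. On SSE2 the usual route is pmovmskb (`_mm_movemask_epi8`) plus a bit scan; on AArch64 NEON one can instead put each lane's index into matching lanes, a sentinel into the others, and take the horizontal minimum with `vminvq_u8`. The function name `first_match` and the sentinel value 16 are my own choices for this sketch, and a scalar fallback is included so it compiles anywhere.

```c
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#elif defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Returns the index (0..15) of the first byte in block[0..15] equal to c,
   or 16 if no byte matches. (Hypothetical helper for illustration.) */
unsigned first_match(const uint8_t block[16], uint8_t c) {
#if defined(__aarch64__)
    static const uint8_t lane_index[16] = {0, 1, 2,  3,  4,  5,  6,  7,
                                           8, 9, 10, 11, 12, 13, 14, 15};
    /* All-ones in lanes equal to c, zero elsewhere. */
    uint8x16_t eq = vceqq_u8(vld1q_u8(block), vdupq_n_u8(c));
    /* Matching lanes keep their index; non-matching lanes get the sentinel 16. */
    uint8x16_t cand = vbslq_u8(eq, vld1q_u8(lane_index), vdupq_n_u8(16));
    /* Horizontal min replaces movemask + bit scan. */
    return vminvq_u8(cand);
#elif defined(__SSE2__)
    __m128i eq = _mm_cmpeq_epi8(_mm_loadu_si128((const __m128i *)block),
                                _mm_set1_epi8((char)c));
    /* pmovmskb: one bit per byte lane, then count trailing zeros. */
    unsigned mask = (unsigned)_mm_movemask_epi8(eq);
    return mask ? (unsigned)__builtin_ctz(mask) : 16u;
#else
    /* Portable scalar fallback. */
    for (unsigned i = 0; i < 16; i++)
        if (block[i] == c) return i;
    return 16u;
#endif
}
```

For a plain "does any lane match?" test, the NEON equivalent is simply `vmaxvq_u8(eq) != 0`, which is often all the original movemask was used for.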
> It's not immediately
> clear to me why RVV would be "woefully inadequate" for a properly modern hash table (i.e. 'vertical', with batched
> lookups of independent keys, rather than the horizontal model used by absl and Folly).
But isn't that again an example of optimising for throughput rather than latency? Batching hash table lookups is hardly an option for many applications.
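To make the throughput/latency point concrete, here is a minimal scalar sketch of the structure of a "vertical" batched lookup: a hypothetical toy linear-probing table (not absl's or Folly's design), where BATCH independent keys advance through their probe sequences in lockstep, so that each key would map to one SIMD lane in a gather-based vectorized version. The table layout, hash constant, and the reservation of key 0 as "empty" are all assumptions for illustration. Note that it only helps when the caller already has BATCH independent keys in hand; a single lookup still walks its probe chain serially.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy table: linear probing, key 0 reserved as "empty",
   so lookups of key 0 are not supported. */
#define TABLE_BITS 6
#define TABLE_SIZE (1u << TABLE_BITS)
#define EMPTY_KEY  0u
#define BATCH      8

typedef struct {
    uint32_t keys[TABLE_SIZE];
    uint32_t vals[TABLE_SIZE];
} Table;

/* Multiplicative hash; the top TABLE_BITS bits select the home slot. */
static uint32_t home_slot(uint32_t key) {
    return (uint32_t)(key * 0x9E3779B1u) >> (32 - TABLE_BITS);
}

/* Inserts or updates key; returns 0 if the table is full. */
int table_insert(Table *t, uint32_t key, uint32_t val) {
    uint32_t s = home_slot(key);
    for (size_t n = 0; n < TABLE_SIZE; n++) {
        if (t->keys[s] == EMPTY_KEY || t->keys[s] == key) {
            t->keys[s] = key;
            t->vals[s] = val;
            return 1;
        }
        s = (s + 1) & (TABLE_SIZE - 1);
    }
    return 0;
}

/* Vertical lookup: every key in the batch takes one probe step per outer
   iteration; the inner loop is the "lane" dimension. */
void lookup_batch(const Table *t, const uint32_t keys[BATCH],
                  uint32_t out[BATCH], int found[BATCH]) {
    uint32_t slot[BATCH];
    int done[BATCH];
    for (int i = 0; i < BATCH; i++) {
        slot[i] = home_slot(keys[i]);
        done[i] = 0;
        found[i] = 0;
        out[i] = 0;
    }
    for (size_t probe = 0; probe < TABLE_SIZE; probe++) {
        int active = 0;
        for (int i = 0; i < BATCH; i++) {
            if (done[i]) continue;
            uint32_t k = t->keys[slot[i]];
            if (k == keys[i]) {            /* hit */
                out[i] = t->vals[slot[i]];
                found[i] = 1;
                done[i] = 1;
            } else if (k == EMPTY_KEY) {   /* miss: reached an empty slot */
                done[i] = 1;
            } else {                       /* collision: advance this lane */
                slot[i] = (slot[i] + 1) & (TABLE_SIZE - 1);
                active = 1;
            }
        }
        if (!active) break;                /* all lanes finished */
    }
}
```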
> Looking at Lemire's simdjson as an example of text processing, it seems those ops would also work on
> RVV, no? https://github.com/simdjson/simdjson/blob/master/src/icelake/dom_parser_implementation.cpp
You are right, text processing is probably a bad example, since text sequences are usually longish. What I primarily have in mind is leveraging SIMD to make certain basic discrete operations faster, rather than converting the entire logic to be data-parallel.