By: Jan Wassenberg (jan.wassenberg.delete@this.gmail.com), June 2, 2022 11:10 pm
Room: Moderated Discussions
Eric Fink (eric.delete@this.anon.com) on June 2, 2022 12:25 am wrote:
> But it seems woefully inadequate for low-latency data-parallel operations where
> you work with very short data and mix SIMD with "regular" instructions.
> Examples include using SIMD to accelerate certain data structures (e.g.
> tree lookup or modern hash table implementations), geometry processing, text processing etc.
Hm, the latency of moving data from the vector unit to the scalar unit would be a matter of the implementation. One could also say that's a problem for NEON, given its lack of pmovmskb and that most workarounds are slow. It's not immediately clear to me why RVV would be "woefully inadequate" for a properly modern hash table (i.e. 'vertical', with batched lookups of independent keys, rather than the horizontal model used by absl and Folly).
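To make the 'vertical' model concrete, here is a minimal scalar sketch, assuming a hypothetical linear-probing table with 64 slots and key 0 reserved as "empty" (`Table`, `home_slot`, `insert`, `lookup_batch`, `BATCH` are all made-up names for illustration, not absl, Folly or Highway code). Each round performs one probe for every still-unresolved key, so the inner loop over lanes has no cross-lane dependencies. A SIMD version could plausibly map that loop to a gather, a compare and masked updates, with no horizontal movemask needed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical table: 64 slots, linear probing, key 0 means "empty",
   so stored keys must be nonzero. */
#define LOG2_CAPACITY 6
#define CAPACITY (1u << LOG2_CAPACITY)
#define EMPTY_KEY 0u
#define BATCH 8 /* independent keys looked up together ("lanes") */

typedef struct {
    uint32_t keys[CAPACITY];
    uint32_t values[CAPACITY];
} Table;

/* Knuth-style multiplicative hash; the top bits select the home slot. */
static uint32_t home_slot(uint32_t key) {
    return (uint32_t)(key * 2654435761u) >> (32 - LOG2_CAPACITY);
}

/* Ordinary scalar insert; assumes the table never fills up. */
static void insert(Table *t, uint32_t key, uint32_t value) {
    uint32_t s = home_slot(key);
    while (t->keys[s] != EMPTY_KEY && t->keys[s] != key)
        s = (s + 1) & (CAPACITY - 1);
    t->keys[s] = key;
    t->values[s] = value;
}

/* Vertical batched lookup: each round does one probe per unresolved key.
   Lanes never depend on each other within a round. */
static void lookup_batch(const Table *t, const uint32_t keys[BATCH],
                         uint32_t out[BATCH], bool found[BATCH]) {
    uint32_t slot[BATCH];
    bool active[BATCH];
    for (int i = 0; i < BATCH; ++i) {
        slot[i] = home_slot(keys[i]);
        active[i] = true;
        found[i] = false;
        out[i] = 0;
    }
    for (uint32_t probe = 0; probe < CAPACITY; ++probe) {
        bool any_active = false;
        for (int i = 0; i < BATCH; ++i) {  /* one "vector op" across lanes */
            if (!active[i]) continue;
            uint32_t k = t->keys[slot[i]];   /* would be a gather */
            if (k == keys[i]) {              /* hit */
                found[i] = true;
                out[i] = t->values[slot[i]];
                active[i] = false;
            } else if (k == EMPTY_KEY) {     /* miss: probe chain ended */
                active[i] = false;
            } else {                         /* keep probing this lane */
                slot[i] = (slot[i] + 1) & (CAPACITY - 1);
                any_active = true;
            }
        }
        if (!any_active) break;
    }
}
```

The design choice is that lanes stay busy independently: a lane that hits early simply goes inactive (a masked-off lane in a vector version) while the others keep probing.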
As an example of text processing, consider Lemire's simdjson: it seems those ops would also work on RVV, no? https://github.com/simdjson/simdjson/blob/master/src/icelake/dom_parser_implementation.cpp
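For reference, the core of simdjson's first stage is classifying each byte of a 64-byte block and packing the results into a 64-bit bitmask. Below is a scalar model of that step only, not simdjson's code: it ignores the quote and escape tracking simdjson uses so that characters inside strings are not treated as structural, and the function names are hypothetical. In the AVX-512 kernel, byte compares write directly into 64-bit mask registers; on RVV, vector compares (e.g. `vmseq`) likewise write a mask register with one bit per element, which is why the same structure seems portable.

```c
#include <stddef.h>
#include <stdint.h>

/* JSON's six structural characters. */
static int is_structural(unsigned char c) {
    return c == '{' || c == '}' || c == '[' || c == ']' || c == ':' || c == ',';
}

/* JSON whitespace: space, tab, line feed, carriage return. */
static int is_json_space(unsigned char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/* Scalar model of one block (at most 64 bytes are examined): bit i of the
   result is set when p[i] is structural. A SIMD kernel performs all the
   per-byte tests at once and yields the same bitmask. */
static uint64_t structural_mask(const char *p, size_t n) {
    uint64_t mask = 0;
    for (size_t i = 0; i < n && i < 64; ++i)
        mask |= (uint64_t)is_structural((unsigned char)p[i]) << i;
    return mask;
}

/* Same idea for whitespace. */
static uint64_t whitespace_mask(const char *p, size_t n) {
    uint64_t mask = 0;
    for (size_t i = 0; i < n && i < 64; ++i)
        mask |= (uint64_t)is_json_space((unsigned char)p[i]) << i;
    return mask;
}
```

For `{"a":[1,2]}` the structural bits are at positions 0, 4, 5, 7, 9 and 10, i.e. a mask of `0x6B1`.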
That said, one other annoyance we ran into is the lack of a native IEEE-754 round; it has to be emulated with 5 instructions.
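The post doesn't list the 5 instructions, so here is only an illustration of the general idea, assuming the round meant is round-to-nearest with ties to even: a scalar C sketch of the well-known "add 2^52" trick, which steps through extract sign, absolute value, range check, rounding add/subtract and sign restore. It is not necessarily the sequence used on RVV, and `round_nearest_even`, `bits_of` and `from_bits` are hypothetical names. It assumes IEEE-754 binary64 arithmetic in the default rounding mode.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret helpers: copy the 64 bits of a double to/from an integer. */
static uint64_t bits_of(double x) {
    uint64_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

static double from_bits(uint64_t u) {
    double x;
    memcpy(&x, &u, sizeof x);
    return x;
}

/* Rounds x to the nearest integer, ties to even, without a dedicated round
   instruction. Assumes binary64 arithmetic in the default
   round-to-nearest-even mode. */
static double round_nearest_even(double x) {
    const uint64_t sign_mask = 0x8000000000000000ULL;
    const double two52 = 4503599627370496.0; /* 2^52 */

    uint64_t sign = bits_of(x) & sign_mask;          /* 1. extract sign */
    double ax = from_bits(bits_of(x) & ~sign_mask);  /* 2. |x| */

    /* 3. |x| >= 2^52 (including infinity) is already integral, and NaN fails
          the compare, so all of these are returned unchanged. */
    if (!(ax < two52)) return x;

    /* 4. Adding 2^52 leaves no mantissa bits for the fraction, so the
          hardware rounds it away (ties to even); subtracting 2^52 recovers
          the rounded magnitude. volatile forces t to be stored as a double,
          guarding against excess precision such as x87's. */
    volatile double t = ax + two52;
    double r = t - two52;

    /* 5. Reapply the sign, so e.g. -0.3 becomes -0.0 rather than +0.0. */
    return from_bits(bits_of(r) | sign);
}
```

Ties go to the even neighbor: 2.5 rounds to 2.0 and 3.5 to 4.0.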