By: Eric Fink (eric.delete@this.anon.com), June 2, 2022 12:25 am
Room: Moderated Discussions
Jan Wassenberg (jan.wassenberg.delete@this.gmail.com) on May 31, 2022 10:00 pm wrote:
> Heikki Kultala (heikki.kultal.a.delete@this.gmail.com) on May 31, 2022 8:59 am wrote:
> > What do you mean by claiming that RVV is designed for scaling down?
> > To me, it seems overly complex. Way too much state related to the vector configuration.
> First, RVV allows the implementation to choose the vector length and requires apps to cope with
> that. That saves a good deal of area compared to 512-bit. (This is also true of SVE/SVE2.)
>
> Second, there are some features that benefit simple single-issue architectures: in particular
> LMUL>1 and chaining so that multiple execution units can still be kept busy.
>
> Third, the vector ISA is simpler (fewer instructions) than SVE and especially
> SVE2. I do wish RVV had more 2-arg permute/swizzle instructions, though.
A bit off-topic, but I am curious about your opinion. From my amateur perspective, RVV seems to target a subset of HPC workloads and is mostly concerned with enabling high throughput (on appropriate hardware). But it seems woefully inadequate for low-latency data-parallel operations, where you work with very short data and mix SIMD with "regular" scalar instructions. Examples include using SIMD to accelerate certain data structures (e.g. tree lookup or modern hash table implementations), geometry processing, text processing, etc.
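
To make the hash table case concrete, here is a minimal sketch of the kind of operation I have in mind: compare a group of 8 one-byte control tags against a probe tag in a single step, then drop straight back into scalar code to pick a slot. It is written as portable C using 64-bit integer tricks (SWAR) rather than any real vector ISA, so it only illustrates the access pattern. The names `match_tag8` and `first_match` and the 8-byte group size are my own assumptions for illustration, not taken from any particular library.

```c
#include <stdint.h>

/* Returns a mask in which bit 7 of byte i is set iff ctrl[i] == tag.
   Illustrative helper; the name and the 8-slot group size are assumptions. */
static uint64_t match_tag8(const uint8_t ctrl[8], uint8_t tag) {
    const uint64_t low7 = 0x7F7F7F7F7F7F7F7FULL;
    uint64_t group = 0;
    /* Assemble the group byte by byte so the result does not depend on
       endianness; compilers typically turn this into one 64-bit load. */
    for (int i = 0; i < 8; i++) {
        group |= (uint64_t)ctrl[i] << (8 * i);
    }
    /* XOR with the broadcast tag: matching bytes become zero. */
    uint64_t x = group ^ (0x0101010101010101ULL * tag);
    /* Exact zero-byte detection: adding 0x7F to the low 7 bits of each byte
       cannot carry into the next byte, so there are no false positives. */
    return ~(((x & low7) + low7) | x | low7);
}

/* Scalar follow-up: index of the first matching slot, or -1 if none. */
static int first_match(const uint8_t ctrl[8], uint8_t tag) {
    uint64_t m = match_tag8(ctrl, tag);
    for (int i = 0; i < 8; i++) {
        if (m & (0x80ULL << (8 * i))) {
            return i;
        }
    }
    return -1;
}
```

The point is that the data-parallel part is a handful of instructions on a fixed, tiny amount of data, and its result is consumed by scalar code right away, so latency and the cost of moving between the vector and scalar sides matter much more than peak throughput.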