By: Andrey (andrey.semashev.delete@this.gmail.com), September 23, 2021 10:20 am
Room: Moderated Discussions
Michael S (already5chosen.delete@this.yahoo.com) on September 23, 2021 6:11 am wrote:
> -.- (blarg.delete@this.mailinator.com) on September 22, 2021 5:56 pm wrote:
> > Andrey (andrey.semashev.delete@this.gmail.com) on September 21, 2021 5:25 pm wrote:
> > > I would think, such an implementation is unrealistic, unless the underlying memory transfer unit (e.g.
> > > a cache line) is a multiple of 48 (for 384-bit vectors) or 80 (for 640-bit), or alignment is entirely
> > > irrelevant (which is unrealistic in its own right). There is no sense in designing actual hardware
> > > where the vector size does not work well with other subsystems, the memory subsystem in particular.
> >
> > Realistic hardware or not, SVE code must support such configurations to be fully spec compliant.
>
> Is there a spec for "vector length agnostic" code generation?
> If there is, does it require good performance or merely correctness?
>
> > You can, of course, check the vector length up front, and refuse to run on widths that aren't a power
> > of 2 (or perhaps choose not to bother with alignment in such cases, or fall back to something else),
> > though that does go against the mantra of SVE being arbitrarily "scalable" as defined by ARM.
> >
> > Considering the examples ARM presents for SVE, methinks that alignment
> > is simply not considered to be of much importance by SVE's designers.
>
> Thinking about it, odd vector widths are probably in the ISA for the benefit of cores similar to Apple's
> of a few years ago, i.e. physical registers and EUs are 128 bits wide and there are 3 SIMD pipes. On such
> a core, 384-bit SVE can be implemented mostly in the decoder, with minimal effect on the rest of the core.
I would expect the native vector size to still be reported as 128 bits in this case. The hardware would simply exploit its internal parallelism by executing 3 loop iterations at a time.
A power-of-two vector size is simply too convenient when all your memory unit sizes and alignments (cache line size, page size, integer/FP number sizes and alignments) are powers of two. An odd vector size could be useful in some special-purpose hardware, e.g. GPUs, but not in a general-purpose extension such as SVE.