By: dmcq (dmcq.delete@this.fano.co.uk), September 23, 2021 7:53 am
Room: Moderated Discussions
Michael S (already5chosen.delete@this.yahoo.com) on September 23, 2021 6:11 am wrote:
> -.- (blarg.delete@this.mailinator.com) on September 22, 2021 5:56 pm wrote:
> > Andrey (andrey.semashev.delete@this.gmail.com) on September 21, 2021 5:25 pm wrote:
> > > I would think, such an implementation is unrealistic, unless the underlying memory transfer unit (e.g.
> > > a cache line) is a multiple of 48 (for 384-bit vectors) or 80 (for 640-bit), or alignment is entirely
> > > irrelevant (which is unrealistic in its own right). There is no sense in designing an actual hardware
> > > where the vector size does not work well with other subsystems, memory subsystem in particular.
> >
> > Realistic hardware or not, SVE code must support such configurations to be fully spec compliant.
>
> Is there a spec for "vector length agnostic" code generation?
> If there is, does it require good performance or merely correctness?
>
> > You can, of course, check the vector length up front, and refuse to run on widths that aren't a power
> > of 2 (or perhaps choose not to bother with alignment in such cases, or fall back to something else),
> > though that does go against the mantra of SVE being arbitrarily "scalable" as defined by ARM.
> >
> > Considering the examples ARM presents for SVE, methinks that alignment
> > is simply not considered to be of much importance by SVE's designers.
>
> Thinking about it, odd vector width is probably here in ISA for a benefit of cores, similar to those of Apple
> of few years ago, i.e. physically registers and EUs are 128-bit wide and there are 3 SIMD pipes. On such
> core, 384-bit SVE can be implemented mostly at decoder, with minimal effect on the rest of the core.
>
I'd guess those odd lengths will be of more use with the streaming-mode SVE that Arm announced in "Introducing the Scalable Matrix Extension for the Armv9-A Architecture".
The streaming version of the SVE registers could be quite a bit longer, and the operations could be done a cache line at a time, for instance. With normal SVE there probably wouldn't be much gain in cutting the length down to fit the problem, but with streaming SVE it definitely would make a difference.
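
To put rough numbers on the alignment concern quoted above, here is a small Python sketch. It is only an illustration, not anything taken from the SVE spec: it assumes a 64-byte cache line and back-to-back vector accesses starting at a line-aligned address, and the function names are made up for the example.

```python
from math import gcd


def is_power_of_two(n: int) -> bool:
    """Up-front check a program could use to reject odd vector widths."""
    return n > 0 and (n & (n - 1)) == 0


def line_crossings(vec_bytes: int, line_bytes: int = 64) -> tuple[int, int]:
    """Count contiguous vector accesses that cross a cache-line boundary.

    Walks one full period (the least common multiple of the vector size
    and the line size) starting at a line-aligned address, and returns
    (accesses_crossing_a_boundary, total_accesses_in_period).
    The 64-byte default line size is an assumption, not an SVE property.
    """
    period = vec_bytes * line_bytes // gcd(vec_bytes, line_bytes)
    total = period // vec_bytes
    crossing = 0
    for i in range(total):
        first_line = (i * vec_bytes) // line_bytes
        last_line = (i * vec_bytes + vec_bytes - 1) // line_bytes
        if first_line != last_line:
            crossing += 1
    return crossing, total


if __name__ == "__main__":
    for bits in (256, 384, 512):
        vec_bytes = bits // 8
        crossing, total = line_crossings(vec_bytes)
        print(f"{bits}-bit: power of two = {is_power_of_two(vec_bytes)}, "
              f"{crossing} of {total} accesses cross a line")
```

Under those assumptions, a 384-bit (48-byte) vector ends up crossing a line boundary on half of its accesses, while 256-bit and 512-bit vectors never do. Whether that matters in practice depends on how a given core handles split accesses.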