By: Michael S (already5chosen.delete@this.yahoo.com), September 23, 2021 6:11 am
Room: Moderated Discussions
-.- (blarg.delete@this.mailinator.com) on September 22, 2021 5:56 pm wrote:
> Andrey (andrey.semashev.delete@this.gmail.com) on September 21, 2021 5:25 pm wrote:
> > I would think, such an implementation is unrealistic, unless the underlying memory transfer unit (e.g.
> > a cache line) is a multiple of 48 (for 384-bit vectors) or 80 (for 640-bit), or alignment is entirely
> > irrelevant (which is unrealistic in its own right). There is no sense in designing an actual hardware
> > where the vector size does not work well with other subsystems, memory subsystem in particular.
>
> Realistic hardware or not, SVE code must support such configurations to be fully spec compliant.
Is there a spec for "vector length agnostic" code generation?
If there is, does it require good performance or merely correctness?
> You can, of course, check the vector length up front, and refuse to run on widths that aren't a power
> of 2 (or perhaps choose not to bother with alignment in such cases, or fall back to something else),
> though that does go against the mantra of SVE being arbitrarily "scalable" as defined by ARM.
>
> Considering the examples ARM presents for SVE, methinks that alignment
> is simply not considered to be of much importance by SVE's designers.
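To put numbers on the alignment concern, here is a rough sketch in plain C (no SVE intrinsics, so it runs anywhere). The helper names vl_is_pow2 and count_line_splits are mine, and the 64-byte cache line in the usage note is an assumption; on real hardware the vector length in bytes would come from something like svcntb() in <arm_sve.h>. It shows the up-front power-of-2 check and counts how many back-to-back vector accesses straddle a cache line boundary.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if vl_bytes is a nonzero power of two.
 * Hypothetical helper for the "check the vector length up front" idea. */
static bool vl_is_pow2(size_t vl_bytes)
{
    return vl_bytes != 0 && (vl_bytes & (vl_bytes - 1)) == 0;
}

/* Count how many of the first n back-to-back vector accesses straddle a
 * cache-line boundary, assuming the first access starts line-aligned.
 * Returns 0 for degenerate sizes. */
static size_t count_line_splits(size_t vl_bytes, size_t line_bytes, size_t n)
{
    size_t splits = 0;

    if (vl_bytes == 0 || line_bytes == 0)
        return 0;

    for (size_t i = 0; i < n; i++) {
        size_t first_line = (i * vl_bytes) / line_bytes;
        size_t last_line  = (i * vl_bytes + vl_bytes - 1) / line_bytes;
        if (first_line != last_line)
            splits++;
    }
    return splits;
}
```

With 64-byte lines, 2 of every 4 consecutive 48-byte (384-bit) accesses cross a line and every 80-byte (640-bit) access does, while 16-, 32- and 64-byte vectors never do when the base is line-aligned.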
Thinking about it, the odd vector widths are probably in the ISA for the benefit of cores similar to Apple's of a few years ago, i.e. cores whose physical registers and EUs are 128 bits wide and that have 3 SIMD pipes. On such a core, 384-bit SVE can be implemented mostly in the decoder, with minimal effect on the rest of the core.
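A toy model of what I mean, in plain C. Everything here is hypothetical (the uop struct, crack_vector_op, issue_cycles); it is not how any real decoder is built, just the arithmetic: the decoder cracks one architectural vector op into 128-bit micro-ops and hands them to the SIMD pipes round-robin.

```c
#include <stddef.h>

#define CHUNK_BITS 128u  /* physical register / EU width assumed above */

/* Hypothetical micro-op descriptor: which 128-bit slice, on which pipe. */
struct uop {
    unsigned pipe;
    unsigned chunk;
};

/* Crack one vl_bits-wide vector op into 128-bit uops, assigned to pipes
 * round-robin. Returns the number of uops written, or 0 if vl_bits is not
 * a nonzero multiple of 128, n_pipes is 0, or out[] is too small. */
static size_t crack_vector_op(unsigned vl_bits, unsigned n_pipes,
                              struct uop *out, size_t cap)
{
    if (n_pipes == 0 || vl_bits == 0 || vl_bits % CHUNK_BITS != 0)
        return 0;

    size_t n = vl_bits / CHUNK_BITS;
    if (n > cap)
        return 0;

    for (size_t i = 0; i < n; i++) {
        out[i].chunk = (unsigned)i;
        out[i].pipe  = (unsigned)(i % n_pipes);
    }
    return n;
}

/* Issue cycles needed for one vector op if each pipe accepts one uop per
 * cycle: ceil(uops / n_pipes). Returns 0 for invalid inputs. */
static size_t issue_cycles(unsigned vl_bits, unsigned n_pipes)
{
    if (n_pipes == 0 || vl_bits == 0 || vl_bits % CHUNK_BITS != 0)
        return 0;

    size_t n = vl_bits / CHUNK_BITS;
    return (n + n_pipes - 1) / n_pipes;
}
```

In this model a 384-bit op on 3 pipes becomes 3 uops that issue in a single cycle with every pipe busy, whereas a 512-bit op becomes 4 uops and needs 2 cycles, leaving two pipe slots empty in the second one.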