Sorry, typos

By: Adrian, July 25, 2021 10:32 pm
Room: Moderated Discussions
Adrian on July 25, 2021 9:16 pm wrote:
> dmcq on July 25, 2021 5:36 pm wrote:
> > Introducing the Scalable Matrix Extension for the Armv9-A Architecture
> >
> > I noticed ARM have issued yet another of their planned future
> > architecture blogs, this one is called "Scalable
> > Matrix Extension". Same sort of idea as what Intel is implementing as far as I can see. Except there is
> > a strange aspect in that it requires a new mode for SVE called "Streaming mode" SVE. They talk about having
> > the new SME instructions and a significant subset of the existing SVE2 instructions. And they say that
> > one could have a longer length for the registers in streaming and non-streaming mode. As far as I can
> > make out in fact only very straightforward operations are included in streaming mode.
> >
> > I guess that instead of being 'RISC' instructions these would have a loop dealing with widths greater
> > than the hardware SVE register or memory or cache width. They've implemented something like this in
> > the Cortex-M Helium extension, but I'd have thought they could just rely on OoO for the larger application
> > processors. If it is so they can have larger tiles in their matrix multiplication, I'd have thought there
> > would be other tricks that could do the job without a new mode. However I can't see they would have
> > put in a new mode without it being very important to them. Am I missing something?
> You are probably right about the necessity of looping in certain cases.
> They explain clearly enough why a streaming SVE mode is needed, to be able to present to
> software an apparent vector register width that is larger than the width of the ALUs.
> This "streaming" mode is actually exactly how the traditional vector computers have operated. For
> example a Cray-1 had an apparent vector register width of 1024 bits, but the vector operations were
> computed by a 64-bit pipelined ALU, in multiple clock cycles, i.e. "looping", like you say.
> They also explain clearly enough why extra matrix instructions are useful. The throughput
> of a computational program operating on large data structures on a modern CPU is normally
> limited by the memory throughput, usually by the memory load throughput.
> For many problems you can easily reach the memory throughput on any CPU and
> there is nothing that can be done to improve the performance above that.
> However the problems that can be solved using matrix-matrix operations are an exception, because for large
> matrices the ratio between arithmetic operations and memory loads can become arbitrarily large, so increasing
> the number of arithmetic operations that can be done per memory load can increase the performance.
> The only way forward to greatly increase the number of arithmetic operations
> per memory load is to add matrix instructions, a.k.a. tensor instructions.
> The instructions added by Intel in Sapphire Rapids have very limited usefulness. Due to
> their low precision they can be used only for Machine Learning. The ARM extension appears
> to be much more general purpose, so it should also be useful for other applications.
> Unfortunately it is not clear how many years will pass until the
> introduction of ARM cores implementing this ISA extension.
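
To make the arithmetic-per-load argument in the quoted post concrete, here is a minimal Python sketch. It is only an idealized count, assuming that every input element is loaded from memory exactly once and that result stores are ignored; the function names are made up for illustration and do not come from any ARM or Intel documentation.

```python
def vector_add_intensity(n):
    """Arithmetic operations per loaded element for c[i] = a[i] + b[i]."""
    ops = n            # one addition per element
    loads = 2 * n      # one element of a and one of b per addition
    return ops / loads


def matmul_intensity(n):
    """Arithmetic operations per loaded element for an n x n matrix product.

    Idealized assumption: each element of both inputs is loaded only once.
    """
    ops = 2 * n ** 3   # n^3 multiplications plus n^3 additions
    loads = 2 * n ** 2 # two n x n input matrices
    return ops / loads


def outer_product_intensity(k):
    """Operations per loaded element when two length-k vectors update a k x k tile."""
    ops = 2 * k * k    # one multiply and one add per tile element
    loads = 2 * k      # the two input vectors
    return ops / loads


if __name__ == "__main__":
    for n in (8, 64, 512):
        print(n, vector_add_intensity(n), matmul_intensity(n), outer_product_intensity(n))
```

Under these assumptions the ratio for the vector addition stays at 0.5 regardless of size, while for the matrix product and for the tile update it grows linearly with the dimension, which is why larger tiles allow more arithmetic per memory load.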

Unfortunately, I pressed "Post" without rereading the message, and there is no way to edit it here.

There are a few typos, but the most important correction is that I meant to say that the Cray-1 had an apparent vector register width of 4096 bits (= 64 elements * 64 bits), not 1024 bits.
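
As a rough illustration of the "looping" described in the quoted post, the following Python sketch models a vector register that is wider than the ALU. It assumes a Cray-1-like register of 64 elements of 64 bits each and an ALU that accepts a fixed number of elements per clock cycle; pipeline start-up latency is ignored, and the names `register_width_bits`, `vector_add` and `alu_lanes` are hypothetical, chosen only for this example.

```python
ELEMENT_BITS = 64            # width of one vector element
ELEMENTS_PER_REGISTER = 64   # elements held by one vector register


def register_width_bits(elements=ELEMENTS_PER_REGISTER, element_bits=ELEMENT_BITS):
    """Apparent width of the vector register as seen by software."""
    return elements * element_bits


def vector_add(a, b, alu_lanes=1):
    """Add two vector registers with an ALU that takes alu_lanes elements per cycle.

    Returns the result vector and the number of cycles spent issuing elements
    (start-up latency of the pipeline is deliberately not modeled).
    """
    if len(a) != len(b):
        raise ValueError("vector registers must have the same length")
    result = []
    cycles = 0
    # One loop iteration corresponds to one clock cycle of the narrow ALU.
    for start in range(0, len(a), alu_lanes):
        stop = start + alu_lanes
        result.extend(x + y for x, y in zip(a[start:stop], b[start:stop]))
        cycles += 1
    return result, cycles


if __name__ == "__main__":
    a = list(range(ELEMENTS_PER_REGISTER))
    b = [1] * ELEMENTS_PER_REGISTER
    total, cycles = vector_add(a, b)
    print(register_width_bits(), cycles)
```

With one 64-bit element per cycle the single vector instruction takes 64 issue cycles to cover the 4096-bit register; a wider ALU (a larger `alu_lanes` in this sketch) would need proportionally fewer cycles while software still sees the same register width.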
