By: dmcq (dmcq.delete@this.fano.co.uk), July 25, 2021 5:36 pm
Room: Moderated Discussions
Introducing the Scalable Matrix Extension for the Armv9-A Architecture
I noticed ARM have issued yet another of their planned future architecture blogs; this one is called "Scalable Matrix Extension". It's the same sort of idea as what Intel is implementing, as far as I can see, except for one strange aspect: it requires a new mode for SVE called "Streaming mode" SVE. They talk about streaming mode having the new SME instructions and a significant subset of the existing SVE2 instructions. And they say that one could have a longer register length in streaming mode than in non-streaming mode. As far as I can make out, in fact only very straightforward operations are included in streaming mode.
I guess that instead of being 'RISC' instructions these would have a loop dealing with widths greater than the hardware SVE register or memory or cache width. They've implemented something like this in the Cortex-M Helium extension, but I'd have thought they could just rely on OoO for the larger application processors. If it is so they can have larger tiles in their matrix multiplication, I'd have thought there would be other tricks that could do the job without a new mode. However, I can't see that they would have put in a new mode without it being very important to them. Am I missing something?
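To make the "larger tiles" point concrete, here is a rough Python sketch of how I picture a tiled matrix multiply built from outer-product accumulations. It is only an illustration and not Arm's instruction semantics. The names `outer_product_accumulate` and `matmul_tiled` and the `VL` parameter (vector length in elements) are made up for this example.

```python
# Hypothetical sketch: C = A x B built from rank-1 (outer-product) updates
# accumulated into a square accumulator tile. VL is an assumed vector
# length in elements, not a value taken from any Arm document.

def outer_product_accumulate(tile, col, row):
    """tile[i][j] += col[i] * row[j] for every element of the tile."""
    for i, a in enumerate(col):
        for j, b in enumerate(row):
            tile[i][j] += a * b


def matmul_tiled(A, B, VL=4):
    """Multiply A (m x k) by B (k x n) one VL x VL output tile at a time."""
    m, k, n = len(A), len(B), len(B[0])
    C = [[0] * n for _ in range(m)]
    for i0 in range(0, m, VL):
        for j0 in range(0, n, VL):
            rows = min(VL, m - i0)   # edge tiles may be smaller than VL
            cols = min(VL, n - j0)
            tile = [[0] * cols for _ in range(rows)]
            for p in range(k):
                col = [A[i0 + i][p] for i in range(rows)]  # slice of column p of A
                row = B[p][j0:j0 + cols]                   # slice of row p of B
                outer_product_accumulate(tile, col, row)
            for i in range(rows):
                C[i0 + i][j0:j0 + cols] = tile[i]          # write the tile back
    return C
```

Each step of the inner loop loads two vectors of `VL` elements and does `VL * VL` multiply-accumulates, so in this sketch a longer vector length means more arithmetic per element loaded. That would be one reason to want a longer register length for this kind of work.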
Topic | Posted By | Date |
---|---|---|
ARM Scalable Matrix Extension | dmcq | 2021/07/25 05:36 PM |
ARM Scalable Matrix Extension | Adrian | 2021/07/25 09:16 PM |
Sorry, typos | Adrian | 2021/07/25 10:32 PM |
ARM SVE Streaming Mode | Adrian | 2021/07/26 12:21 AM |
ARM SVE Streaming Mode | dmcq | 2021/07/26 04:18 AM |
ARM SVE Streaming Mode | Adrian | 2021/07/26 04:45 AM |
ARM Scalable Matrix Extension | Michael S | 2021/07/26 02:53 AM |
ARM Scalable Matrix Extension | Adrian | 2021/07/26 03:41 AM |
Inner & outer product | Adrian | 2021/07/26 03:52 AM |
ARM Scalable Matrix Extension | Rayla | 2021/07/26 05:08 AM |
ARM Scalable Matrix Extension | dmcq | 2021/07/26 05:38 AM |
ARM Scalable Matrix Extension | Doug S | 2021/07/26 11:38 AM |
ARM Scalable Matrix Extension | Brett | 2021/07/26 01:54 PM |
ARM Scalable Matrix Extension | --- | 2021/07/26 05:48 PM |
ARM Scalable Matrix Extension | dmcq | 2021/07/27 02:39 AM |
ARM Scalable Matrix Extension | Anon | 2021/07/26 06:08 AM |
ARM Scalable Matrix Extension | lkcl | 2022/07/28 03:38 PM |
ARM Scalable Matrix Extension | dmcq | 2022/07/29 02:24 PM |
ARM Scalable Matrix Extension | lkcl | 2022/07/29 03:44 PM |