By: Adrian (a.delete@this.acm.org), September 30, 2022 1:55 am
Room: Moderated Discussions
noko (noko.delete@this.noko.com) on September 29, 2022 8:02 pm wrote:
> https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/arm-a-profile-architecture-2022
>
> I guess actual details won't be released until sometime next year. But some interesting points:
>
> - SME2 just a year after SME
> - Multi-vector instructions, I'm guessing this is basically RVV's vector groups?
> - ARM's 3rd try to kill ROP, this time with a shadow stack
> - A "Hybrid Vector Length Agnostic" programming model for SVE2, any clue what that means?
I have no idea what "multi-vector operations" means, but, for example, multiplying a matrix by several vectors simultaneously, in a single operation, might improve throughput compared with doing separate matrix-vector multiplications, because each matrix element would be loaded once and reused for every vector.
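To make the idea concrete, here is a minimal scalar C sketch of multiplying one matrix by `k` vectors at once. It is not an Arm instruction or Arm's definition of "multi-vector"; the function name `matmul_multi_vec` and the interleaved storage of the vectors are hypothetical choices for illustration. Each element of `A` is loaded once and reused for all `k` vectors, whereas `k` separate matrix-vector products would load it `k` times.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration: compute Y = A * X for k vectors at once.
 * A is m x n, row-major.
 * X holds k vectors of length n, interleaved: X[j*k + v] is element j of vector v.
 * Y is m x k, stored as Y[i*k + v] (row i of the result for vector v). */
void matmul_multi_vec(size_t m, size_t n, size_t k,
                      const float *A, const float *X, float *Y)
{
    assert(A != NULL && X != NULL && Y != NULL); /* caller must pass valid buffers */
    for (size_t i = 0; i < m; i++) {
        float *y = &Y[i * k];
        for (size_t v = 0; v < k; v++)
            y[v] = 0.0f;
        for (size_t j = 0; j < n; j++) {
            float a = A[i * n + j];          /* A[i][j] is loaded once ... */
            const float *x = &X[j * k];
            for (size_t v = 0; v < k; v++)   /* ... and reused for all k vectors */
                y[v] += a * x[v];
        }
    }
}
```

With the inner loop over `v` running across contiguous elements, a compiler can map it onto SIMD lanes, which is roughly where a dedicated multi-vector instruction could help further.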
On the other hand, "range prefetches" sound like a clearly desirable addition, if they allow control over the destination of the prefetch (e.g. registers, L1 cache or L2 cache) and/or if they also gather matrix rows/columns into a compact range. On CPUs with caches, and especially with multiple cache levels, it is always desirable to have a more predictable and robust way to ensure that matrix blocks are in the right cache level at the right time.
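Since Arm has not published details yet, the closest one can write today is a software approximation. The C sketch below assumes GCC or Clang (for `__builtin_prefetch`) and a 64-byte cache line; `prefetch_range` and `pack_block` are hypothetical names, not Arm's range-prefetch interface. It issues one prefetch hint per cache line of the next source row and gathers a strided block of a row-major matrix into a compact buffer, which is the kind of work a hardware range prefetch with gathering might take over.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE 64 /* assumed cache-line size in bytes */

/* Software emulation of a "range prefetch": one prefetch hint per cache
 * line covering [p, p + bytes). __builtin_prefetch is only a hint;
 * rw = 0 means read, locality = 3 means "keep in all cache levels". */
static void prefetch_range(const void *p, size_t bytes)
{
    const char *c = (const char *)p;
    for (size_t off = 0; off < bytes; off += CACHE_LINE)
        __builtin_prefetch(c + off, 0, 3);
}

/* Gather a rows x cols block of a row-major matrix with leading
 * dimension ld into a compact, contiguous buffer, prefetching the
 * next source row while the current one is copied. */
void pack_block(const float *A, size_t ld, size_t rows, size_t cols,
                float *packed)
{
    assert(cols <= ld);
    for (size_t r = 0; r < rows; r++) {
        const float *src = &A[r * ld];
        if (r + 1 < rows)
            prefetch_range(&A[(r + 1) * ld], cols * sizeof(float));
        for (size_t c = 0; c < cols; c++)
            packed[r * cols + c] = src[c];
    }
}
```

Note that `__builtin_prefetch` only accepts compile-time constant hints, so choosing the destination cache level at run time, as wished for above, is not possible with it.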