By: Jan Wassenberg (jan.wassenberg.delete@this.gmail.com), August 9, 2022 12:38 am
Room: Moderated Discussions
Interesting indeed, thanks for sharing. I like the SVE instruction set, but its implementation in the V1 has a major bottleneck: many instructions (in particular, predicate-modifying ones) can only run on the M0 pipeline, which limits them to a single instruction per cycle. Hopefully this will be fixed in future microarchitectures.
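To see why a single-pipe restriction hurts, here is a back-of-the-envelope throughput model. It is a hypothetical sketch, not a V1 simulator. It assumes the loop is limited only by issue ports, and it ignores latency, dependency chains and front-end width. M0-only instructions issue at one per cycle. All other vector instructions share `vec_per_cycle` pipes, which defaults to 2 to match the "2 IPC" figure for 256-bit SVE mentioned in this post. The function name and parameters are illustrative.

```python
def min_cycles_per_iter(m0_ops: int, vec_ops: int,
                        m0_per_cycle: float = 1.0,
                        vec_per_cycle: float = 2.0) -> float:
    """Lower bound on cycles per loop iteration from issue-port limits only.

    Hypothetical model: M0-only instructions (e.g. predicate-modifying ones)
    issue at m0_per_cycle; all other vector instructions share vec_per_cycle
    pipes. Latency, dependencies and front-end width are ignored.
    """
    m0_bound = m0_ops / m0_per_cycle    # cycles needed just to drain M0
    vec_bound = vec_ops / vec_per_cycle  # cycles needed by the vector pipes
    return max(m0_bound, vec_bound)     # the busier resource sets the pace


# 4 predicate ops + 4 vector ops: M0 is the limiter (4 cycles),
# even though the vector pipes alone would need only 2.
print(min_cycles_per_iter(m0_ops=4, vec_ops=4))  # 4.0
# With only one M0-only op, the vector pipes dominate instead.
print(min_cycles_per_iter(m0_ops=1, vec_ops=6))  # 3.0
```

In this model, adding predicate work to a loop is nearly free until the M0 count overtakes the vector-pipe bound. After that point, every additional M0-only instruction adds a full cycle.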
And 256-bit vectors are very welcome, but we only get 2 IPC (2x256 bit), whereas NEON can do 4x128 bit, so peak vector throughput per cycle is the same. The SVE advantage would currently seem to be limited to instructions that NEON lacks and that are not crippled by the M0 bottleneck. I suppose that includes SPLICE, predicated stores, and CNTP (in moderation, because it also runs only on M0).
It is therefore understandable that the data shows parity between V1's SVE and NEON except in a few cases. Nice to see the improvement over previous compilers, though.
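The IPC figures quoted here give the same peak vector width per cycle. This assumes every issued instruction is a full-width vector operation:

```latex
\begin{aligned}
\text{SVE on V1:}\quad  & 2~\tfrac{\text{instr}}{\text{cycle}} \times 256~\text{bit} = 512~\tfrac{\text{bit}}{\text{cycle}} \\
\text{NEON on V1:}\quad & 4~\tfrac{\text{instr}}{\text{cycle}} \times 128~\text{bit} = 512~\tfrac{\text{bit}}{\text{cycle}}
\end{aligned}
```

With raw width equal, any SVE gain must come from doing more useful work per instruction.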
Topic | Posted By | Date |
---|---|---|
Interesting ARM compiler data | --- | 2022/08/08 02:54 PM |
Interesting ARM compiler data | noko | 2022/08/08 09:30 PM |
V1 bottleneck | Jan Wassenberg | 2022/08/09 12:38 AM |
Interesting ARM compiler data | --- | 2022/08/09 10:15 AM |
Interesting ARM compiler data | noko | 2022/08/09 11:34 AM |
Interesting ARM compiler data | Jörn Engel | 2022/08/09 01:45 PM |
Interesting ARM compiler data | --- | 2022/08/09 01:49 PM |