V1 bottleneck

By: Jan Wassenberg (jan.wassenberg.delete@this.gmail.com), August 9, 2022 12:38 am
Room: Moderated Discussions
Interesting indeed, thanks for sharing. I like the SVE instruction set, but its implementation in the V1 has a major bottleneck: many instructions (in particular, predicate-modifying ones) can only run on the M0 pipeline, so at most one such instruction can issue per cycle. Hopefully this will be fixed in future uarchs.
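A back-of-envelope way to see why this matters: if a class of instructions can only issue on M0, a loop cannot run faster than one cycle per M0-only instruction, however many vector pipes sit idle. The sketch below is a simplified lower-bound model of my own, not taken from any Arm document. It assumes two 256-bit SVE instructions per cycle on the vector pipes (the 2 IPC figure from the next paragraph), treats M0 as a separate resource, and ignores latency, dependencies and the front end. The function name `min_cycles` is hypothetical.

```python
import math


def min_cycles(vector_ops: int, m0_only_ops: int, vector_ipc: int = 2) -> int:
    """Lower bound on cycles per loop iteration under a simple issue-port model.

    Assumptions (illustrative only, not from an Arm optimization guide):
    - up to `vector_ipc` ordinary SVE ops issue per cycle;
    - M0-only ops (e.g. many predicate-modifying ones) issue at most one per
      cycle and are counted as a separate resource;
    - latency, dependencies and front-end limits are ignored.
    """
    if vector_ops < 0 or m0_only_ops < 0 or vector_ipc <= 0:
        raise ValueError("counts must be non-negative and vector_ipc positive")
    vector_bound = math.ceil(vector_ops / vector_ipc)
    m0_bound = m0_only_ops  # at most one M0-only op per cycle
    return max(vector_bound, m0_bound)


# Hypothetical loop body: 4 ordinary SVE ops plus 3 predicate ops on M0.
print(min_cycles(4, 3))  # 3: the M0 bound dominates the vector bound of 2
```

Under this model, the same loop with only one M0-only op would be bound by the vector pipes instead (2 cycles), which is why keeping predicate manipulation out of the hot loop helps on V1.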

And 256-bit vectors are very welcome, but we only get 2 IPC, whereas NEON can issue 4x128-bit instructions per cycle; both come to 512 bits per cycle. The SVE advantage would therefore currently seem to be limited to instructions that NEON lacks and that are not crippled by the M0 bottleneck. I suppose that includes SPLICE, predicated stores, and CNTP (only occasionally, because it also runs on M0).
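To make those three instructions concrete, here is a scalar Python model of what they compute, based on my reading of the SVE architectural semantics rather than on any intrinsics API, and saying nothing about performance. It assumes 8 lanes (a 256-bit vector of 32-bit elements, as on V1). The function names are hypothetical, and `cntp` is simplified to count a single predicate, whereas the real CNTP also takes a governing predicate.

```python
from typing import List, Sequence

LANES = 8  # assumption: 256-bit vector of 32-bit elements


def cntp(pred: Sequence[bool]) -> int:
    """Simplified CNTP: count the active lanes of one predicate."""
    return sum(1 for p in pred if p)


def splice(pred: Sequence[bool], a: Sequence[int], b: Sequence[int]) -> List[int]:
    """SPLICE: take `a` from its first through last active lane (inclusive,
    including any inactive lanes in between), place that segment at the
    bottom of the result, then fill the rest with the lowest lanes of `b`."""
    active = [i for i, p in enumerate(pred) if p]
    segment = list(a[active[0]:active[-1] + 1]) if active else []
    return segment + list(b[:len(a) - len(segment)])


def predicated_store(pred: Sequence[bool], vec: Sequence[int],
                     mem: List[int], base: int) -> None:
    """Predicated contiguous store: write only active lanes; memory behind
    inactive lanes is left untouched (useful for loop tails)."""
    for i, (p, v) in enumerate(zip(pred, vec)):
        if p:
            mem[base + i] = v


pred = [False, True, True, False, True, False, False, False]
a = list(range(LANES))           # [0, 1, ..., 7]
b = list(range(10, 10 + LANES))  # [10, 11, ..., 17]
print(cntp(pred))          # 3
print(splice(pred, a, b))  # [1, 2, 3, 4, 10, 11, 12, 13]

# Loop tail: only 3 elements remain, so activate the first 3 lanes.
mem = [0] * LANES
tail = [i < 3 for i in range(LANES)]
predicated_store(tail, [9] * LANES, mem, 0)
print(mem)                 # [9, 9, 9, 0, 0, 0, 0, 0]
```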

Thus it is understandable that the data shows parity between V1's SVE and NEON except in a few cases. Nice to see the improvement vs. previous compilers, though.