ARMC6 - Arm or clang ?

By: Michael S (already5chosen.delete@this.yahoo.com), June 5, 2020 7:55 am
Room: Moderated Discussions
Dan Fay (daniel.fay.delete@this.gmail.com) on June 5, 2020 8:26 am wrote:
> Michael S (already5chosen.delete@this.yahoo.com) on June 5, 2020 5:49 am wrote:
> > Dan Fay (daniel.fay.delete@this.gmail.com) on June 2, 2020 9:19 am wrote:
> > > So this is what the ARM compiler is doing with an M4F (specific target is STM32F412ZG):
> > >
> > > "C++" code:
> > >
> > > float fptest(float val1, float val2, float val3) {
> > > float test = val3;
> > > test *= val1 + val2;
> > > return test;
> > > }
> > >
> > >
> > > Mbed Studio "Release" setting with ARMC6:
> > >
> > > 080073e0 :
> > > 80073e0: ee30 0a20 vadd.f32 s0, s0, s1
> > > 80073e4: ee20 0a01 vmul.f32 s0, s0, s2
> > > 80073e8: 4770 bx lr
> > >
> > >
> > > Mbed Studio "Develop" setting with ARMC6:
> > >
> >
> > Is the ARMC6 compiler based on Arm Inc./Keil's own compiler, or is
> > it clang, possibly with a different run-time library?
> >
>
> I think it's clang-based.
>
>

That explains why the EULA does not contain Arm's traditional restrictions.

My copy of clang (9.0.0) produces code identical to what you posted when targeting Cortex-M7, but something different for Cortex-M4.

00000000 : # with restrict
0: ed91 1a00 vldr s2, [r1]
4: ed91 2a01 vldr s4, [r1, #4]
8: ed91 3a02 vldr s6, [r1, #8]
c: ed91 4a03 vldr s8, [r1, #12]
10: ee21 1a00 vmul.f32 s2, s2, s0
14: ee22 2a00 vmul.f32 s4, s4, s0
18: ee23 3a00 vmul.f32 s6, s6, s0
1c: ee24 0a00 vmul.f32 s0, s8, s0
20: ee31 1a20 vadd.f32 s2, s2, s1
24: ee32 2a20 vadd.f32 s4, s4, s1
28: ee33 3a20 vadd.f32 s6, s6, s1
2c: ee30 0a20 vadd.f32 s0, s0, s1
30: ed80 1a00 vstr s2, [r0]
34: ed80 2a01 vstr s4, [r0, #4]
38: ed80 3a02 vstr s6, [r0, #8]
3c: ed80 0a03 vstr s0, [r0, #12]
40: 4770 bx lr

00000042 : # without restrict
42: ed91 1a00 vldr s2, [r1]
46: ee21 1a00 vmul.f32 s2, s2, s0
4a: ee31 1a20 vadd.f32 s2, s2, s1
4e: ed80 1a00 vstr s2, [r0]
52: ed91 1a01 vldr s2, [r1, #4]
56: ee21 1a00 vmul.f32 s2, s2, s0
5a: ee31 1a20 vadd.f32 s2, s2, s1
5e: ed80 1a01 vstr s2, [r0, #4]
62: ed91 1a02 vldr s2, [r1, #8]
66: ee21 1a00 vmul.f32 s2, s2, s0
6a: ee31 1a20 vadd.f32 s2, s2, s1
6e: ed80 1a02 vstr s2, [r0, #8]
72: ed91 1a03 vldr s2, [r1, #12]
76: ee21 0a00 vmul.f32 s0, s2, s0
7a: ee30 0a20 vadd.f32 s0, s0, s1
7e: ed80 0a03 vstr s0, [r0, #12]
82: 4770 bx lr
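
The source for the two listings above is not included in this post. A minimal sketch that would be consistent with the register usage (r0 as destination pointer, r1 as source pointer, s0 and s1 as scalar arguments, four elements, multiply followed by add) might look like the following. The function names, the parameter names and the element count constant are assumptions for illustration, not the original code:

```c
#include <stddef.h>

enum { N = 4 };  /* assumed: both listings handle exactly four floats */

/* Hypothetical "with restrict" variant: the compiler may assume dst and src
 * never overlap, so it is free to issue all loads before any store. */
void scale_offset4_restrict(float *restrict dst, const float *restrict src,
                            float a, float b)
{
    for (size_t i = 0; i < N; ++i)
        dst[i] = src[i] * a + b;   /* multiply, then add */
}

/* Hypothetical "without restrict" variant: a store to dst[i] could modify
 * src[i + 1], so each load has to stay after the preceding store. */
void scale_offset4(float *dst, const float *src, float a, float b)
{
    for (size_t i = 0; i < N; ++i)
        dst[i] = src[i] * a + b;
}
```

Under that assumption the difference between the listings follows from aliasing alone: in the second variant the compiler has to keep each load/multiply/add/store group in order, which leaves every vadd.f32 directly behind the vmul.f32 it depends on.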


For my real code, which does not resemble these tiny examples, I am very disappointed with both gcc and clang. Both are stupid, in common ways and in different ways.
Common: they never use VLDM
Different:
gcc:
gcc doesn't align 32-bit instructions on 32-bit boundaries, even when it's very easy to do.
gcc uses vfma, unless prevented from doing so by -std=c99 or by -ffp-contract=on

clang:
clang doesn't place an independent instruction between a vmul.f32 and the vadd.f32 that depends on it, even when it's very easy to do.