By: Michael S (already5chosen.delete@this.yahoo.com), June 5, 2020 7:55 am
Room: Moderated Discussions
Dan Fay (daniel.fay.delete@this.gmail.com) on June 5, 2020 8:26 am wrote:
> Michael S (already5chosen.delete@this.yahoo.com) on June 5, 2020 5:49 am wrote:
> > Dan Fay (daniel.fay.delete@this.gmail.com) on June 2, 2020 9:19 am wrote:
> > > So this is what the ARM compiler is doing with an M4F (specific target is STM32F412ZG):
> > >
> > > "C++" code:
> > >
> > > float fptest(float val1, float val2, float val3) {
> > > float test = val3;
> > > test *= val1 + val2;
> > > return test;
> > > }
> > >
> > >
> > > Mbed Studio "Release" setting with ARMC6:
> > >
> > > 080073e0 :
> > > 80073e0: ee30 0a20 vadd.f32 s0, s0, s1
> > > 80073e4: ee20 0a01 vmul.f32 s0, s0, s2
> > > 80073e8: 4770 bx lr
> > >
> > >
> > > Mbed Studio "Develop" setting with ARMC6:
> > >
> >
> > Is the ARMC6 compiler based on Arm Inc./Keil's own compiler, or is
> > it clang, possibly with a different run-time library?
> >
>
> I think it's clang-based.
>
>
That explains why the EULA does not contain Arm's traditional restrictions.
My copy of clang (9.0.0) produces code identical to what you posted when targeting Cortex-M7, but something different for Cortex-M4.
00000000 : # with restrict
0: ed91 1a00 vldr s2, [r1]
4: ed91 2a01 vldr s4, [r1, #4]
8: ed91 3a02 vldr s6, [r1, #8]
c: ed91 4a03 vldr s8, [r1, #12]
10: ee21 1a00 vmul.f32 s2, s2, s0
14: ee22 2a00 vmul.f32 s4, s4, s0
18: ee23 3a00 vmul.f32 s6, s6, s0
1c: ee24 0a00 vmul.f32 s0, s8, s0
20: ee31 1a20 vadd.f32 s2, s2, s1
24: ee32 2a20 vadd.f32 s4, s4, s1
28: ee33 3a20 vadd.f32 s6, s6, s1
2c: ee30 0a20 vadd.f32 s0, s0, s1
30: ed80 1a00 vstr s2, [r0]
34: ed80 2a01 vstr s4, [r0, #4]
38: ed80 3a02 vstr s6, [r0, #8]
3c: ed80 0a03 vstr s0, [r0, #12]
40: 4770 bx lr
00000042 : # without restrict
42: ed91 1a00 vldr s2, [r1]
46: ee21 1a00 vmul.f32 s2, s2, s0
4a: ee31 1a20 vadd.f32 s2, s2, s1
4e: ed80 1a00 vstr s2, [r0]
52: ed91 1a01 vldr s2, [r1, #4]
56: ee21 1a00 vmul.f32 s2, s2, s0
5a: ee31 1a20 vadd.f32 s2, s2, s1
5e: ed80 1a01 vstr s2, [r0, #4]
62: ed91 1a02 vldr s2, [r1, #8]
66: ee21 1a00 vmul.f32 s2, s2, s0
6a: ee31 1a20 vadd.f32 s2, s2, s1
6e: ed80 1a02 vstr s2, [r0, #8]
72: ed91 1a03 vldr s2, [r1, #12]
76: ee21 0a00 vmul.f32 s0, s2, s0
7a: ee30 0a20 vadd.f32 s0, s0, s1
7e: ed80 0a03 vstr s0, [r0, #12]
82: 4770 bx lr
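For readers following the two listings above, here is a minimal sketch of C source that could plausibly compile to them. The source itself is not shown in this post, so the function names, the parameter order and the fixed length of 4 are assumptions inferred from the register usage (r0 stored to, r1 loaded from, s0 the multiplier, s1 the addend).

```c
/* Hypothetical reconstruction; names and the length 4 are assumed. */

/* With restrict the compiler may assume dst and src never overlap,
   so it is free to do all four loads before the first store. */
void scale_offset_restrict(float *restrict dst, const float *restrict src,
                           float a, float b)
{
    for (int i = 0; i < 4; ++i)
        dst[i] = src[i] * a + b;
}

/* Without restrict a store to dst[i] might change src[i + 1],
   so each load has to stay after the previous store. */
void scale_offset(float *dst, const float *src, float a, float b)
{
    for (int i = 0; i < 4; ++i)
        dst[i] = src[i] * a + b;
}
```

Calling scale_offset(buf + 1, buf, 2.0f, 0.5f) on an overlapping buffer is legal and feeds each result into the next element, which is exactly the case that forces the load/store ordering in the second listing. The same call through the restrict version would be undefined behaviour.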
For my real code, which does not resemble these tiny examples, I am very disappointed with both gcc and clang. They are stupid both in common ways and in different ways.
Common: they never use VLDM
Different:
gcc:
gcc doesn't align 32-bit instructions on 32-bit boundaries, even when it's very easy to do.
gcc uses vfma unless prevented from doing so by -std=c99 or by -ffp-contract=on
clang:
clang doesn't schedule dependent vmul.f32 and vadd.f32 one instruction apart, even when it's very easy to do.
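On the vfma point in the gcc list: one visible effect of contracting a multiply and an add into a single fused instruction is that the result can differ, because the product is no longer rounded to float before the add. This may or may not be the concern here. The sketch below is only an illustration with hypothetical helper names; the fused result is modelled with double arithmetic instead of calling a library fma, which is exact for the inputs shown but not in general.

```c
/* Unfused: the product is rounded to float before the add.
   The volatile store keeps the compiler from contracting the two steps. */
float mul_then_add(float a, float b, float c)
{
    volatile float product = a * b;
    return product + c;
}

/* Model of a fused multiply-add: a float*float product is exact in
   double (48 significant bits fit in 53). The double add can still round
   in general, so treat this as an approximation of true fused behaviour. */
float fused_model(float a, float b, float c)
{
    double exact_product = (double)a * (double)b;
    return (float)(exact_product + (double)c);
}
```

With a = b = 1 + 2^-12 and c = -(1 + 2^-11), the exact product is 1 + 2^-11 + 2^-24. Rounding it to float drops the last term, so the unfused version returns 0 while the fused model returns 2^-24.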