Understanding Cortex M4F instructions timing

By: Dan Fay (daniel.fay.delete@this.gmail.com), June 2, 2020 1:08 pm
Room: Moderated Discussions
Michael S (already5chosen.delete@this.yahoo.com) on June 2, 2020 11:56 am wrote:
> I was in a hurry while writing my previous post.
> The example above is not a good one, because in the case above VMLA.F32 is a *good* choice.
>
> As I said in my original post, according to the TRM, VMLA.F32 is slower than a properly scheduled separate add+mul.
> But the example above is too short and does not provide an opportunity for proper scheduling.
>
> This example is better:
>
> void foo(float* restrict res, const float x[4], float y, float z)
> {
>     res[0] = x[0]*y + z;
>     res[1] = x[1]*y + z;
>     res[2] = x[2]*y + z;
>     res[3] = x[3]*y + z;
> }
>
> gcc on godbolt: https://godbolt.org/z/e6EHce
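For comparison with the quoted foo, here is a rough source-level sketch of the "separate mul, then add" arrangement Michael describes. The name foo_split is hypothetical (not from the thread). It reads all of x and does all four multiplies before any add or store, and it assumes res and x do not overlap, which is the promise restrict made in the original. Without restrict the compiler probably has to reload x[i] after each store to res, which would explain the load/store interleaving in the listing below. Source order is not instruction order: the compiler may still reorder these statements or contract p + z back into VMLA, so check the disassembly.

```c
/* Hypothetical helper, not from the thread: same result as foo,
 * res[i] = x[i]*y + z, assuming res and x do not overlap. */
void foo_split(float *res, const float x[4], float y, float z)
{
    /* Multiply phase: four independent products, all loads of x done
     * before any store to res. */
    float p0 = x[0] * y;
    float p1 = x[1] * y;
    float p2 = x[2] * y;
    float p3 = x[3] * y;

    /* Add phase: each add uses a product computed several statements
     * earlier, so in principle no add waits on the multiply just before it. */
    res[0] = p0 + z;
    res[1] = p1 + z;
    res[2] = p2 + z;
    res[3] = p3 + z;
}
```

With x = {1, 2, 3, 4}, y = 0.5 and z = 1, foo_split fills res with 1.5, 2, 2.5, 3, the same as foo.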

I had to take out the restrict keyword. Here's what I got for the M4F (the M7 was the same):

08010b7e :
8010b7e: ed91 1a00 vldr s2, [r1]
8010b82: eeb0 2a60 vmov.f32 s4, s1
8010b86: ee01 2a00 vmla.f32 s4, s2, s0
8010b8a: ed80 2a00 vstr s4, [r0]
8010b8e: eeb0 2a60 vmov.f32 s4, s1
8010b92: ed91 1a01 vldr s2, [r1, #4]
8010b96: ee01 2a00 vmla.f32 s4, s2, s0
8010b9a: ed80 2a01 vstr s4, [r0, #4]
8010b9e: eeb0 2a60 vmov.f32 s4, s1
8010ba2: ed91 1a02 vldr s2, [r1, #8]
8010ba6: ee01 2a00 vmla.f32 s4, s2, s0
8010baa: ed80 2a02 vstr s4, [r0, #8]
8010bae: ed91 1a03 vldr s2, [r1, #12]
8010bb2: ee41 0a00 vmla.f32 s1, s2, s0
8010bb6: edc0 0a03 vstr s1, [r0, #12]
8010bba: 4770 bx lr
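A back-of-envelope way to compare the listing with a separate VMUL + VADD version is to add up per-instruction issue cycles. The cycle values below are assumptions, roughly as I remember the Cortex-M4 TRM FPU timing table (VLDR/VSTR 2, VMOV 1, VMLA 3, VMUL 1, VADD 1); check them against the TRM. The split sequence is hypothetical, not compiler output, and the sum ignores back-to-back load/store pipelining, result-dependency stalls and the final BX LR.

```c
#include <stddef.h>

/* Assumed per-instruction issue cycles for Cortex-M4F (verify against the TRM). */
enum op { VLDR, VSTR, VMOV, VMLA, VMUL, VADD };

int op_cycles(enum op o)
{
    switch (o) {
    case VLDR: return 2;
    case VSTR: return 2;
    case VMOV: return 1;
    case VMLA: return 3;
    case VMUL: return 1;
    case VADD: return 1;
    }
    return 0;
}

/* Naive estimate: plain sum of issue cycles, no pipelining or stall model. */
int estimate_cycles(const enum op *seq, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += op_cycles(seq[i]);
    return total;
}

/* The gcc sequence from the listing above, in order (BX LR omitted). */
const enum op gcc_seq[] = {
    VLDR, VMOV, VMLA, VSTR,
    VMOV, VLDR, VMLA, VSTR,
    VMOV, VLDR, VMLA, VSTR,
    VLDR, VMLA, VSTR
};

/* Hypothetical split version: load all four, VMUL each, VADD z, store. */
const enum op split_seq[] = {
    VLDR, VLDR, VLDR, VLDR,
    VMUL, VMUL, VMUL, VMUL,
    VADD, VADD, VADD, VADD,
    VSTR, VSTR, VSTR, VSTR
};
```

Under these assumptions estimate_cycles(gcc_seq, 15) comes to 31 (8 + 3 + 12 + 8) and estimate_cycles(split_seq, 16) to 24 (8 + 4 + 4 + 8). The gap comes from VMLA costing more than VMUL + VADD and from the extra VMOVs. A real measurement (e.g. with DWT->CYCCNT) would be needed to see how much survives pipelining.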
