Understanding Cortex M4F - VLDM

By: Michael S (already5chosen.delete@this.yahoo.com), June 4, 2020 2:28 pm
Room: Moderated Discussions
Wilco (wilco.dijkstra.delete@this.ntlworld.com) on June 2, 2020 3:02 pm wrote:
> Michael S (already5chosen.delete@this.yahoo.com) on June 2, 2020 11:56 am wrote:
> > I was in a hurry while writing the previous post.
> > The example above is not a good one, because in that case VMLA.F32 is a *good* choice.
> >
> > As I said in the original post, according to the TRM, VMLA.F32 is slower than properly scheduled separate add+mul.
>
> Strictly speaking it is not slower - the latency and throughput are identical.
> However you get one extra execute slot per fma if you split into mul+add.
>
> > But the example above is too short and does not provide an opportunity for proper scheduling.
> >
> > This example is better:

> > void foo(float* restrict res, const float x[4], float y, float z)
> > {
> > res[0] = x[0]*y + z;
> > res[1] = x[1]*y + z;
> > res[2] = x[2]*y + z;
> > res[3] = x[3]*y + z;
> > }
>
> > gcc on godbolt: https://godbolt.org/z/e6EHce
>
> How about this? This should probably be the default for Cortex-M4, just like LLVM. Note the
> generated code is the same for all CPUs, even say -mcpu=cortex-a53. This is a good example why AArch64
> added 4-operand FMA - expanding mov+fma back into mul+add would be better in this case.
>
> The goal of Cortex-M4 is to beat software floating point emulation - it achieves that.
>
> Wilco

Do you have a guess as to why compilers (clang as well as gcc) don't generate VLDM.F32?
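
For anyone who wants to experiment, here is a minimal sketch built around the quoted foo(). The names foo_split and foo_selfcheck are mine, not from the thread. foo_split issues all four multiplies before the adds, which is the ordering a scheduler would want if VMLA is split into VMUL + VADD. The assembly in the comment is hand-written and hypothetical (it assumes the hard-float ABI, with res in r0, x in r1, y in s0 and z in s1); it is not output from any compiler, only an illustration of what a VLDM/VSTM-based sequence could look like.

```c
#include <assert.h>

/* The function from the quoted post, unchanged. */
void foo(float *restrict res, const float x[4], float y, float z)
{
    res[0] = x[0]*y + z;
    res[1] = x[1]*y + z;
    res[2] = x[2]*y + z;
    res[3] = x[3]*y + z;
}

/* Hypothetical variant: multiplies grouped first, adds afterwards.
 *
 * Hand-written sketch of a VLDM-based sequence (assumption, hard-float ABI):
 *     vldmia   r1, {s4-s7}      @ load x[0..3] with one instruction
 *     vmul.f32 s4, s4, s0
 *     vmul.f32 s5, s5, s0
 *     vmul.f32 s6, s6, s0
 *     vmul.f32 s7, s7, s0
 *     vadd.f32 s4, s4, s1
 *     vadd.f32 s5, s5, s1
 *     vadd.f32 s6, s6, s1
 *     vadd.f32 s7, s7, s1
 *     vstmia   r0, {s4-s7}      @ store res[0..3] with one instruction
 *     bx       lr
 */
void foo_split(float *restrict res, const float x[4], float y, float z)
{
    float p0 = x[0]*y;
    float p1 = x[1]*y;
    float p2 = x[2]*y;
    float p3 = x[3]*y;
    res[0] = p0 + z;
    res[1] = p1 + z;
    res[2] = p2 + z;
    res[3] = p3 + z;
}

/* Checks both variants on inputs whose products and sums are exact in
 * binary floating point, so fused and non-fused evaluation agree. */
void foo_selfcheck(void)
{
    const float x[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float a[4], b[4];
    foo(a, x, 0.5f, 0.25f);
    foo_split(b, x, 0.5f, 0.25f);
    for (int i = 0; i < 4; i++)
        assert(a[i] == b[i]);
}
```

Building both functions with something like -O2 -mcpu=cortex-m4 -mfpu=fpv4-sp-d16 -mfloat-abi=hard and comparing the listings should show whether the compiler uses individual VLDR instructions or a single VLDM for the four loads.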