Understanding Cortex M4F instructions timing

By: Michael S (already5chosen.delete@this.yahoo.com), June 2, 2020 1:20 pm
Room: Moderated Discussions
Dan Fay (daniel.fay.delete@this.gmail.com) on June 2, 2020 1:08 pm wrote:
> Michael S (already5chosen.delete@this.yahoo.com) on June 2, 2020 11:56 am wrote:
> > I was in a hurry while writing the previous post.
> > The example above is not a good one, because in that case VMLA.F32 is a *good* choice.
> >
> > As I said in the original post, according to the TRM, VMLA.F32 is slower than properly scheduled separate add+mul.
> > But the example above is too short and does not provide an opportunity for proper scheduling.
> >
> > This example is better:
> >
> > void foo(float* restrict res, const float x[4], float y, float z)
> > {
> >   res[0] = x[0]*y + z;
> >   res[1] = x[1]*y + z;
> >   res[2] = x[2]*y + z;
> >   res[3] = x[3]*y + z;
> > }
>
> > gcc on godbolt: https://godbolt.org/z/e6EHce
>
> I had to take out the restrict keyword. Here's what I got for the M4F (the M7 was the same):
>

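Without restrict it's not quite the same: the compiler has to assume that res may alias x, so it can't load all of x[] before the first store. BTW, why doesn't restrict work? Maybe you are compiling as C++ instead of C? In C++ the similar keyword is __restrict (a compiler extension, not standard C++). But it is better to use C.
C++ is sometimes surprising, even with supposedly simple code.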
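
If the same source has to build both ways, one possible workaround is a small macro that picks the right spelling. This is only a sketch: RESTRICT and foo_portable are made-up names for illustration, and it assumes a compiler that accepts __restrict in C++ mode (gcc, clang and MSVC do).

```c
/* RESTRICT is a hypothetical macro name, not something from the thread. */
#ifdef __cplusplus
#define RESTRICT __restrict   /* compiler extension in C++ */
#else
#define RESTRICT restrict     /* standard keyword since C99 */
#endif

/* Same body as foo() above; only the qualifier spelling is abstracted. */
void foo_portable(float* RESTRICT res, const float x[4], float y, float z)
{
    res[0] = x[0]*y + z;
    res[1] = x[1]*y + z;
    res[2] = x[2]*y + z;
    res[3] = x[3]*y + z;
}
```

As with plain restrict, the caller must make sure res and x do not overlap.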

If you can't make 'restrict' work, then the code can be rewritten manually to produce the same effect, by loading all inputs before the first store:

void foo1(float* res, const float x[4], float y, float z)
{
  float x0 = x[0];
  float x1 = x[1];
  float x2 = x[2];
  float x3 = x[3];
  res[0] = x0*y + z;
  res[1] = x1*y + z;
  res[2] = x2*y + z;
  res[3] = x3*y + z;
}
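
To see why the plain version without restrict is not quite the same, here is a small sketch that can be run on a host. foo_norestrict is a hypothetical name for foo() with the keyword removed; foo1_copy repeats foo1 from above so the snippet stands alone. With separate buffers the two agree. With overlapping buffers (res = x + 1) they differ, and that is exactly the case the compiler has to allow for when restrict is missing.

```c
/* foo() from the quoted post with restrict removed (hypothetical name). */
void foo_norestrict(float* res, const float x[4], float y, float z)
{
    res[0] = x[0]*y + z;   /* this store may change x[1] if res overlaps x */
    res[1] = x[1]*y + z;   /* so x[1] has to be loaded after the store above */
    res[2] = x[2]*y + z;
    res[3] = x[3]*y + z;
}

/* Copy of foo1: all four loads are done before the first store. */
void foo1_copy(float* res, const float x[4], float y, float z)
{
    float x0 = x[0];
    float x1 = x[1];
    float x2 = x[2];
    float x3 = x[3];
    res[0] = x0*y + z;
    res[1] = x1*y + z;
    res[2] = x2*y + z;
    res[3] = x3*y + z;
}
```

For example, starting from buf = {1, 2, 3, 4, 0} with y = 2 and z = 1, the call foo_norestrict(buf + 1, buf, y, z) leaves {1, 3, 7, 15, 31} in buf, while foo1_copy(buf + 1, buf, y, z) leaves {1, 3, 5, 7, 9}. The restrict version must not be called this way at all, since overlapping arguments make its behavior undefined.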

