Understanding Cortex M4F instructions timing

By: Dan Fay (daniel.fay.delete@this.gmail.com), June 2, 2020 8:37 am
Room: Moderated Discussions

> What I don't understand is how the FP part of the core works.
> 1. I have no mental picture of the relationship between the integer pipeline and the FP pipeline.
> 2. I don't understand why FP load instructions can't be pipelined with other FP
> load/store instructions in the same manner as their integer counterparts.
> 3. The TRM claims that computational FP instructions, like VADD.F32 or VMUL.F32,
> have a latency of 2 clocks and a throughput of 1 per clock, but my measurements
> (on STM32F303) clearly show that the throughput is half of what is claimed.
> 4. According to the TRM, multiply-accumulate instructions, both fused (VFMA.F32) and non-fused
> (VMLA.F32), are slower than a properly scheduled separate add+mul. If that's true, why do
> compilers, in particular gcc, generate them when optimizing for speed (-O2)?
> 5. If VLDM.32 is really as much faster than a sequence of VLDR.32 as the TRM
> claims, why doesn't gcc -O2 generate VLDM.32 at every opportunity?

My guess is that a separate FP ADD+MUL pair takes up more code space than a single FMA instruction. I wouldn't be surprised if, even at -O2, they decided that smaller code is worth it on a microcontroller. What happens with -O3?

I'm also curious what ARM's compiler does. I'm going to try to look at what it generates for an M4F (specifically, an STM32F412).