FP Adders are not so cheap compared to FP multipliers

By: Heikki Kultala (heikki.kultala.delete@this.gmail.com), November 9, 2022 9:07 am
Room: Moderated Discussions
Björn Ragnar Björnsson (bjorn.ragnar.delete@this.gmail.com) on November 8, 2022 8:24 pm wrote:

> Indeed they do not scale, so I would like to remind the folks in this discussion of the
> fact that AMD did something special for full width (512 bits) shuffle. Alexander Yee tested Zen4
> AVX-512 for AMD shortly before Zen4 release and came to the conclusion that Zen4 can do
> full width shuffles at 1/cycle. His guess is that Zen4 has two shuffle units, one 256-bit
> and one 512-bit, the bigger one being able to function as 2 256-bit shuffle units.
>
> Additionally, Alexander has a small "Editorial comment" where he preemptively reinforces Linus' points:
>
> "In my opinion, Intel's mistake with AVX512 is to optimize for the 100% FMA workloads (namely
> Linpack) instead of the more common mixed FADD/FMA workloads. Adders are cheap. Multipliers
> are expensive. One of each would do just fine for most workloads. Instead, Intel decided to
> add a 2nd FMA to Skylake X/SP... It is that 2nd FMA which caused most of the power/throttling
> issues that has tainted AVX512's reputation and hindered its adoption."

FP adders are not that much cheaper than FP multipliers; in some cases they can even be more expensive than standalone FP multipliers.

The cost is not in the calculation itself, but in the alignment of the operands before it and the normalization of the result after it.

FP multiplication does not need its inputs aligned, but FP addition does.
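To make the difference concrete, here is a minimal Python sketch of a toy floating-point format. Everything in it is an assumption made for illustration: the 8-bit significand width `P`, the `(significand, exponent)` tuple representation, the truncation instead of proper rounding, and the function names. It is not a model of any real FPU; it only shows which shifts are data-dependent.

```python
P = 8  # toy significand width in bits (assumption; binary32 uses 24)

def normalize(m, e):
    """Shift until 2**(P-1) <= |m| < 2**P. Truncates, no rounding.
    Returns ((m, e), number_of_single_bit_shifts)."""
    if m == 0:
        return (0, 0), 0
    sign = -1 if m < 0 else 1
    mag = abs(m)
    shifts = 0
    while mag >= (1 << P):        # result too wide: shift right
        mag >>= 1
        e += 1
        shifts += 1
    while mag < (1 << (P - 1)):   # leading zeros after cancellation: shift left
        mag <<= 1
        e -= 1
        shifts += 1
    return (sign * mag, e), shifts

def fp_mul(a, b):
    """Multiply: no input alignment. For normalized inputs the product
    always needs a right shift of P-1 or P bits, so only two cases."""
    (ma, ea), (mb, eb) = a, b
    return normalize(ma * mb, ea + eb)

def fp_add(a, b):
    """Add: the smaller-exponent operand is first shifted right by a
    data-dependent amount, and the sum may need a variable left shift."""
    (ma, ea), (mb, eb) = a, b
    if ea < eb:
        (ma, ea), (mb, eb) = (mb, eb), (ma, ea)
    align = ea - eb                              # variable alignment shift
    sign_b = -1 if mb < 0 else 1
    mb_aligned = sign_b * (abs(mb) >> align)     # truncating shift on magnitude
    result, norm = normalize(ma + mb_aligned, ea)
    return result, align, norm
```

In this sketch `fp_add((128, 0), (192, -3))` needs a 3-bit alignment shift, and `fp_add((200, 0), (-199, 0))` cancels down to 1 and needs a 7-bit left shift to renormalize. `fp_mul`, by contrast, only ever chooses between two fixed shift amounts. In hardware, the variable shifts correspond to wide shifters and leading-zero logic rather than loops.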

This cost of operand alignment and normalization is a problem especially in CPUs, where low operation latency is desired. The optimizations that reduce this latency are very expensive in area and power.

However, an FMA requires a wider adder than a standalone adder does, which also makes the normalization wider, so an FMA is still always much more expensive than an adder.
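A rough way to see the width difference: the product of two p-bit significands is 2p bits wide, so the FMA's adder has to cover at least that. The sketch below tabulates this together with two commonly cited textbook figures, which are assumptions here and not a description of any particular CPU: about p + 3 bits for a standalone adder (significand plus guard, round and sticky bits) and 3p + 2 bits for a classic single-path FMA datapath. The function name `adder_widths` is hypothetical.

```python
def adder_widths(p):
    """Approximate datapath widths in bits for a p-bit significand.
    The standalone and FMA figures are textbook assumptions, not
    measurements of a real design."""
    return {
        "product": 2 * p,            # exact: p-bit times p-bit significand
        "standalone_adder": p + 3,   # assumption: p bits + guard/round/sticky
        "fma_adder": 3 * p + 2,      # assumption: classic single-path FMA
    }

for name, p in (("binary32", 24), ("binary64", 53)):
    print(name, adder_widths(p))
```

Under these assumptions, a binary64 FMA adder comes out at 161 bits against 56 for the standalone adder, and the normalization shifter grows with it.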