Fusing into FMA

Article: AMD's Bulldozer Microarchitecture
By: Paul A. Clayton (paaronclayton.delete@this.gmail.com), September 29, 2010 7:25 am
Room: Moderated Discussions
Ian Ollmann (iano@apple.com) on 9/28/10 wrote:
---------------------------
>Paul A. Clayton (paaronclayton@gmail.com) on 9/27/10 wrote:
>---------------------------
[snip]
>>The hardware could provide intermediate rounding without
>>significantly increasing latency. (The MIPS definition of
>>MADD.{D,S,PS} uses intermediate rounding.)
>
>It could, but then you lose the biggest advantage of the FMA. The ability to defeat
>the rounding of the multiplication lets you do a lot of very powerful new things
>in algorithms, without incurring substantial changes / area cost to the rest of
>the surrounding EUs, register file, etc. The concatenation is only a minor benefit.
>The precision wins (when used skillfully) are what are great about FMA.

First, supporting intermediate rounding does not
prohibit the support of an extended precision intermediate
result. Whether the complexity is justified is another
matter.
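
As a rough illustration of the two rounding behaviours being
discussed, here is a minimal Python sketch. The function names
madd_rounded and madd_fused are made up for this example, and the
fused version is only emulated with exact rational arithmetic
(it assumes nothing about any particular hardware):

```python
from fractions import Fraction


def madd_rounded(a: float, b: float, c: float) -> float:
    """Multiply-add with intermediate rounding (two roundings)."""
    product = a * b       # product is rounded to double here
    return product + c    # sum is rounded to double again


def madd_fused(a: float, b: float, c: float) -> float:
    """Emulated fused multiply-add (one rounding).

    The product and sum are formed exactly as rationals, and only
    the final conversion back to float rounds.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Inputs chosen so that the exact product, 1 - 2**-60, is not
# representable in double precision and rounds to 1.0.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

print(madd_rounded(a, b, c))  # the low-order product bits are lost
print(madd_fused(a, b, c))    # the low-order product bits survive
```

With these inputs the intermediate-rounded form returns zero,
while the fused form keeps the small residual. That residual is
the kind of precision win Ian describes.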

Second, I suspect that the workloads which benefit
significantly from a combined multiply-and-add, even
without the extra precision, are important ones.

>I frankly don't see the difference between the fmac operation you describe and
>a separate multiplier and adder with a fast forwarding pathway in between.

Some benefits of FMADD:

  • fewer operations to decode, rename, schedule, forward
    results from, etc.

  • the addition can be folded in as another layer of
    carry-save addition rather than requiring a separate
    full carry-propagate addition

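
The carry-save point can be sketched on plain non-negative
integers. This is a simplification I am assuming for the sake of
the example: a real FMADD unit works on significands and must
also align the addend, and a real multiplier would use a tree
rather than the linear reduction below. The function names are
hypothetical.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 compressor: reduce three addends to two.

    No carry ripples across bit positions; each bit position is
    computed independently.
    """
    partial_sum = x ^ y ^ z
    carries = ((x & y) | (x & z) | (y & z)) << 1
    return partial_sum, carries


def multiply_add_carry_save(a: int, b: int, c: int) -> int:
    """Compute a*b + c for non-negative integers.

    The addend c is treated as one more row next to the partial
    products, so only one carry-propagate addition is needed.
    """
    # One shifted copy of a for each set bit of b.
    rows = [a << i for i in range(b.bit_length()) if (b >> i) & 1]
    rows.append(c)  # the addend joins the partial products

    # Repeatedly compress three rows into two.
    while len(rows) > 2:
        s, k = carry_save_add(rows[0], rows[1], rows[2])
        rows = rows[3:] + [s, k]

    # The single full carry-propagate addition.
    return sum(rows)


print(multiply_add_carry_save(13, 11, 7))
```

A separate multiply followed by a separate add would pay for a
full carry-propagate addition twice, once at the end of each
operation.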


The main issue with FMADD seems to be its use of three
source operands, which complicates renaming, scheduling,
and forwarding.

(I am a little surprised that separate FADD and FMADD
execution units do not seem to be implemented. In
my ignorance, this would not seem much more expensive than
a FADD and a FMUL pair of EUs. A wild guess would be that
2 FMADD EUs and 1 FADD EU would be a sweet spot. [Perhaps an
FADD EU could perform 3-way sums so that certain requirements
would be similar for the two types of EUs??])


Paul A. Clayton
just a technophile