RF ports

Article: AMD's Bulldozer Microarchitecture
By: David Kanter (dkanter.delete@this.realworldtech.com), September 4, 2010 1:32 pm
Room: Moderated Discussions
Heikki Kultala (hkultala@iki.NOSPAM.fi) on 9/2/10 wrote:
---------------------------
>Dan Downs (retnuh@nospam.retnuh.org) on 9/2/10 wrote:
>---------------------------
>>Is there any chance of a single thread being dispatched to both integer cores? Something
>>like Core 1 being able to spill over to Core 2. And if not in this round, would it make sense to do so in the future?
>>
>>I may have completely skimmed over it, and I'm trying to recall some of the speculation
>>of bulldozer, but I don't remember reading about this one way or another in the new info released.
>
>The whole point of Bulldozer's "cluster-based multithreading" is to let the 2 threads
>have their own L1D cache AND still have fast (low-delay) integer datapaths between the cache and the integer ALUs.

Exactly.

>1) The L1 data cache and the integer ALUs sit close to each other on the chip,
>so that the wire delays are small. The ALUs of the other integer core are much
>further away. It would require extra delays to route the data there.

Yes.

>Look at Bulldozer FPU latencies for operations that use memory. They are much longer
>than "K10" FPU latencies. The reason is that the loading of the data is done
>in the integer unit, and then the loaded data is transferred to the FPU cluster, which is slow.

The best-case latency is very similar to family 10h, since the cache latency is about the same (3 vs. 4 cycles). The major difference shows up when you get queuing of memory accesses or FP resource contention, or an L1 miss, which will cost you 20 cycles.

[snip]

>3) The RF port things.
>Multi-port RFs are big and power-hungry. Those 4-way integer cores already need quite
>many read and write ports, but RFs that can feed 8 execution units
>would be massive and power-hungry, and would have long delays.
>There are, however, some ways of splitting the RF into multiple parts (the 21264 has
>2 RFs, one in each cluster, with one cycle of extra delay when data is moved between
>clusters). This kind of solution would be the most reasonable, but it also adds
>write ports for data that comes from the other cluster, so the register files would still get more complex.

Actually, this is one area where I think AMD improved. The reality is that you do not need all those ports on the RF. Most operations are fed directly from the bypass network. Take a load-op instruction (e.g. a 64-bit integer load that feeds into an add). If you executed the load and the add separately, you would need:

1st port for the load address read
2nd port for writing the load data into the RF
3rd port to read the register input of the add
4th port to read the memory input of the add (that has been put into RF)
5th port to write the add result

With load-op, you can eliminate the 4th port, so you save a bit of resources. If your scheduling is really good, you can eliminate the 3rd port as well.
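To make the counting concrete, here is a minimal Python sketch that just tallies the five port uses listed above. It is a toy bookkeeping model, not a description of any real pipeline; the names `SEPARATE_PORT_USES` and `rf_port_uses` and the two flags are made up for illustration.

```python
# Toy tally of register-file (RF) port uses for "load, then add".
# All names here are hypothetical; this only mirrors the list in the post.

MEM_INPUT = "read memory input of the add (read back from RF)"
REG_INPUT = "read register input of the add"

SEPARATE_PORT_USES = [
    "read load address",
    "write load data into RF",
    REG_INPUT,
    MEM_INPUT,
    "write add result",
]


def rf_port_uses(fused_load_op=False, reg_operand_bypassed=False):
    """Return the RF port uses that remain under the given assumptions."""
    uses = list(SEPARATE_PORT_USES)
    if fused_load_op:
        # The loaded value feeds the add directly, so it is not read back.
        uses.remove(MEM_INPUT)
    if reg_operand_bypassed:
        # Assumes the scheduler caught the register operand on the bypass.
        uses.remove(REG_INPUT)
    return uses


if __name__ == "__main__":
    print(len(rf_port_uses()))                    # separate load and add
    print(len(rf_port_uses(fused_load_op=True)))  # load-op
    print(len(rf_port_uses(fused_load_op=True, reg_operand_bypassed=True)))
```

The three calls correspond to the three cases in the text: separate instructions, load-op, and load-op with really good scheduling.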

IIRC, the P6 derivatives never had enough ports on the register files to satisfy all the possible operations at once, and instead relied extensively on the forwarding network. The only time your operands *must* come from the RF is after a pipeline squash, context switch, etc.
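A rough way to see why forwarding cuts RF read traffic is to count how many source operands have to come from the RF in a short dependent sequence. The sketch below is a deliberately crude model with assumptions that are mine, not AMD's or Intel's: one instruction issues per cycle, in order, and a result can be caught on the bypass network for `bypass_window` cycles after its producer issues. The function name and parameters are hypothetical.

```python
# Crude model: which source operands need an RF read port?
# Assumptions (illustrative only): one instruction per cycle, in order,
# and results stay catchable on the bypass for `bypass_window` cycles.

def rf_reads_needed(instrs, bypass_window=2):
    """instrs is a list of (dest_reg, [source_regs]) tuples."""
    last_written = {}  # register -> cycle in which its producer issued
    rf_reads = 0
    for cycle, (dest, sources) in enumerate(instrs):
        for src in sources:
            produced = last_written.get(src)
            # Old or never-produced values must be read from the RF.
            if produced is None or cycle - produced > bypass_window:
                rf_reads += 1
        last_written[dest] = cycle
    return rf_reads


if __name__ == "__main__":
    chain = [
        ("r1", ["r0"]),
        ("r2", ["r1", "r0"]),
        ("r3", ["r2", "r1"]),
    ]
    print(rf_reads_needed(chain))                   # with forwarding
    print(rf_reads_needed(chain, bypass_window=0))  # no forwarding
```

In this model only the reads of `r0`, which nothing in the sequence produces, hit the RF when forwarding is available. With the window set to zero, every source operand needs a read port.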

David