Cracking is not free

By: Travis Downs, December 25, 2018 4:41 pm
Room: Moderated Discussions
Wilco on December 22, 2018 10:24 am wrote:
> > The result of the load doesn't need to participate in renaming
> > in the same way because it is not architecturally
> > visible and doesn't persist beyond the instruction. It just needs to get from the load to the single op
> > that will ever consume it, which is a different problem and likely off the critical path.
> You could special case it since no other instruction needs to read the renamed register.
> But other than that it's like any other destination that needs to be renamed.

I don't agree. In fact, it occurs to me that you don't even need the temporary register. Both halves of the operation can use the same physical register: the load writes into the register, and the operation then executes against it and updates it. There are no intermediate operations, so nothing needs to see the state after the load but before the op.
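To make the shared-register idea concrete, here is a toy Python sketch. It is only an illustration of the argument, not a description of any real core: the function name `rename_load_op`, the list-based register file `prf`, and the `free_list` are all made up for this example, and rename and execute are collapsed into one function purely for brevity.

```python
import operator

def rename_load_op(prf, reg_map, free_list, memory, dest, addr, op):
    """Toy model of `op dest, [addr]` renamed as a unit with ONE allocation."""
    p_old = reg_map[dest]      # physical reg holding the current value of dest
    p_new = free_list.pop()    # the only physical register allocated
    reg_map[dest] = p_new
    prf[p_new] = memory[addr]                 # load half: fills the register
    prf[p_new] = op(prf[p_old], prf[p_new])   # op half: consumes and overwrites it
    return p_new

# Hypothetical state: rax lives in physical register 0 and holds 10.
prf = [0] * 4
prf[0] = 10
reg_map = {"rax": 0}
free_list = [3, 2, 1]
memory = {0x100: 32}

# Something like `add rax, [0x100]`.
p = rename_load_op(prf, reg_map, free_list, memory, "rax", 0x100, operator.add)
print(p, prf[p], len(free_list))
```

The point of the sketch is only that the free list shrinks by one entry for the whole load-op pair, since nothing else ever needs to observe the loaded value on its own.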

However it is implemented, it is fairly clear to me that in practice on x86 (and, I suspect, on other archs as well, although I am less familiar with them) the lack of a second output register lets these pairs be renamed efficiently as a unit.

The alternative view is that the renamer is somehow capable of renaming two-destination operations, but only in the special case of fused load-ops, and that the designers somehow don't extend that functionality to other two-destination instructions. That seems ... unlikely.

> As is the flags register.

Yes, flags have to be renamed, but does it matter here?

In any case, one strategy to rename flags, which I believe is used on modern Intel x86, is simply to extend each physical register to hold the set of flag bits in addition to the register data. Any instruction which has a destination register (almost all of them) then uses the same physical register to hold the flags which result from the operation. The renamer tracks, in order, which flags map to which physical register (it's often more than one on x86 because of instructions which write only a subset of the flags), and adds the correct reg(s) as input for any instruction which consumes flags.

So with that method, supporting flags isn't much harder than regular renaming on the write side (since you are already generally allocating a physical register for the destination), and the flags act like another input on the read side.
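A minimal Python sketch of that flag-renaming scheme follows. Everything in it is an assumption made for illustration: the `Renamer` class, its `flag_map` table, and the counter standing in for a free list are hypothetical, and the flag sets are simplified (AF and PF are left out). It is not meant to describe how any Intel renamer is actually built.

```python
from dataclasses import dataclass, field
from itertools import count

@dataclass
class Renamer:
    """Toy renamer: each physical register also carries the flags its producer wrote."""
    reg_map: dict = field(default_factory=dict)    # architectural reg -> physical reg
    flag_map: dict = field(default_factory=dict)   # flag name -> physical reg holding it
    _next: count = field(default_factory=count)    # stand-in for a free list

    def rename(self, dest, srcs, writes_flags=(), reads_flags=()):
        # Read side: register sources, plus whichever physical regs hold the needed flags.
        phys_srcs = [self.reg_map[s] for s in srcs]
        flag_srcs = sorted({self.flag_map[f] for f in reads_flags})
        # Write side: a single allocation covers both the data and the flags.
        phys_dest = next(self._next)
        self.reg_map[dest] = phys_dest
        for f in writes_flags:
            self.flag_map[f] = phys_dest
        return phys_dest, phys_srcs, flag_srcs

r = Renamer()
for reg in ("rax", "rbx", "rcx"):
    r.rename(reg, [])  # give each architectural register an initial mapping

# add rax, rbx: writes CF, ZF, SF, OF (simplified flag set)
add = r.rename("rax", ["rax", "rbx"], writes_flags=("CF", "ZF", "SF", "OF"))
# inc rbx: writes ZF, SF, OF but leaves CF alone
inc = r.rename("rbx", ["rbx"], writes_flags=("ZF", "SF", "OF"))
# cmovbe rcx, rax: reads CF and ZF, which now live in two different physical regs
cmovbe = r.rename("rcx", ["rcx", "rax"], reads_flags=("CF", "ZF"))
print(add, inc, cmovbe)
```

The `inc` followed by `cmovbe` case is the "more than one" situation mentioned above: CF still comes from the register allocated for the `add`, while ZF comes from the one allocated for the `inc`, so the consumer picks up two flag inputs.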

> All instructions must be renamed either way. Early cracking means more micro ops to
> rename, so throughput is simply lower. Now you could widen rename, but that has a
> high cost. A cheaper approach is to add support for a 2nd destination register.

I'm not really following. I don't think the cost to rename is counted in instructions; it's largely counted in terms of operations, input registers and output registers. You can't get around renaming limits simply by fusing 10 instructions into one big operation with 10 outputs and then pretending your renamer treats that as a single instruction!

Said another way, if you can build a 6-wide single-destination renamer, I don't think that implies you can "easily" build a 6-wide double-destination renamer on the same technology with the same people. Maybe you can build a 3, 4 or 5 wide one though. How that all pans out depends on the fraction of actually fused operations: if you have an instruction set that inherently has lots of double-destination instructions (e.g., auto-increment), then maybe you favor a narrower crack-at-rename design. If the majority of instructions are single-destination (see x86), maybe you favor a wider single-destination renamer and early cracking.
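A back-of-the-envelope version of that tradeoff can be written down in a few lines of Python. This is a deliberately crude model under stated assumptions: rename width is the only bottleneck, an early-cracked two-destination instruction costs exactly two rename slots, and the widths (6 versus 3, 4 or 5) are just the hypothetical numbers from the paragraph above, not measurements of any real design. The function names are invented for the example.

```python
def early_crack_rate(width, frac_two_dest):
    """Instructions renamed per cycle when every two-destination instruction
    is cracked into two single-destination ops before rename."""
    return width / (1.0 + frac_two_dest)

def late_crack_rate(width):
    """Instructions renamed per cycle when the renamer handles
    two-destination instructions natively, one slot each."""
    return float(width)

def break_even_fraction(early_width, late_width):
    """Fraction of two-destination instructions at which both designs tie."""
    return early_width / late_width - 1.0

for late_width in (3, 4, 5):
    f = break_even_fraction(6, late_width)
    print(f"6-wide early cracking ties a {late_width}-wide "
          f"late-cracking renamer at f = {f:.2f}")
```

Under this model, a 6-wide early-cracking renamer only falls behind a 4-wide late-cracking one once more than half of all instructions have two destinations, which is one way to see why the instruction mix matters so much to the choice.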

Out of curiosity, any idea what Apple is doing in the A-series?

> > Note that I agree with you that rename bandwidth is saved
> > in the case of micro-fused/uncracked-until-after
> > rename ops like load-op on x86 or call/store on ARM. It's only for 2-output
> > ops like auto-increment this doesn't apply.
> A modern Arm core can execute 2 loads with auto-increment every cycle using just 2 rename slots.
> The other slots are still free for other instructions, so yes it saves rename bandwidth.

As above, I don't think this is how it works. Comparing a late-cracking renamer capable of handling N multi-destination instructions with an early-cracking one is not an apples-to-apples comparison. The latter is simpler, so it can be wider at the same design point. I'm not saying one is inherently better than the other, just that it's not obvious that you can cram a ton of complexity into one op and then build a renamer that supports this efficiently.