Sunny Cove wide

By: Travis Downs (travis.downs.delete@this.gmail.com), December 13, 2018 2:22 pm
Room: Moderated Discussions
anon (spam.delete.delete.delete@this.this.this.spam.com) on December 13, 2018 1:01 pm wrote:
> Travis Downs (travis.downs.delete@this.gmail.com) on December 13, 2018 11:23 am wrote:
> > Maynard Handley (name99.delete@this.name99.org) on December 13, 2018 10:25 am wrote:
> > >
> > > What is not being aggressively pushed (as far as I know) is the sort
> > > of fusion that rewrites an intermediate register, something like
> > > rA= rB op1 rC
> > > rA= rD op2 rA going to
> > > rA= rD op2 (rB op1 rC)
> > > The point of this exercise is that you only have to perform the rA allocation
> > > once, so when it happens you get more throughput through your renamer.
> >
> > Indeed, that's the type of possible fusion that I'm aware of that wouldn't be crazy to implement.
> > Other things like fusion of non-contiguous instructions raise all sorts of other problems such
> > as how interrupts see the world, how faults in the intervening instructions work, etc.
> >
> > On scalar x86, with 2-operand rather than 3-operand ops, you also have a lot of this:
> >
> >
> > mov r1, r2
> > op r1, r3
> >
> > That is, a move to a register followed immediately by an operation
> > overwriting that register. This could potentially be fused to:
> >
> >
> > op r1, r2, r3
> >
> > Since we know there is internal support for 3-operand uops (and indeed there
> > are some 3-operand scalar instructions, such as lea and andn).
> >
>
> This seems like it would be a nightmare to implement.
> The usual way would be that mov-elimination removes the mov and the uop that leaves the renamer is effectively
> "op RAT[r1], RAT[r2], RAT[r3]" anyway. I hope you understand what I mean with that notation.

Yes, but the problem is the eliminated mov still takes RAT bandwidth, which is often a critical bottleneck since almost everything else is wider. Agreed though that mov-elimination makes this type of fusion less important than it might otherwise be.


> Macro fusion usually works with flags where only the type of operations matters. Implicit operands like
> flags and with push/pop are dealt with in the frontend iirc so all the information is available.

Flags have to be renamed as well, and I think they are simply associated with the destination register of the instruction that produced them. E.g., you can think of every physical register as extended with flag values, which are populated by the same op that wrote the register.

In any case, I don't think any of that matters for macro-fusion: for existing fusion the decoders just have to match the simple pattern of a compatible ALU op followed by a compatible jump, and emit the fused op. So it's very simple pattern matching. The flags don't really come into it.
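To make the "simple pattern matching" concrete, here is a rough Python sketch of a decoder that fuses an adjacent ALU op and jump. The two compatibility sets and the `decode` function are made up for illustration; they are not the fusion rules of any real CPU, and a real decoder works on encoded bytes, not tuples.

```python
# Toy compatibility sets: assumptions for illustration only,
# not the actual fusion table of any real microarchitecture.
FUSIBLE_ALU = {"cmp", "test", "add", "sub", "and", "inc", "dec"}
FUSIBLE_JCC = {"je", "jne", "jl", "jge", "jg", "jle"}


def decode(stream):
    """Walk a list of (mnemonic, *operands) tuples and fuse adjacent pairs.

    A pair is fused only when a compatible ALU op is immediately followed
    by a compatible conditional jump. Only the mnemonics are inspected.
    """
    out = []
    i = 0
    while i < len(stream):
        cur = stream[i]
        nxt = stream[i + 1] if i + 1 < len(stream) else None
        if nxt is not None and cur[0] in FUSIBLE_ALU and nxt[0] in FUSIBLE_JCC:
            out.append(("fused", cur, nxt))  # one op leaves the decoder
            i += 2
        else:
            out.append(("single", cur))
            i += 1
    return out


adjacent = decode([("cmp", "rax", "rbx"), ("jne", "top")])
separated = decode([("cmp", "rax", "rbx"), ("mov", "rcx", "rdx"), ("jne", "top")])
```

In this sketch `adjacent` comes out as a single fused op, while `separated` stays as three ops because the intervening `mov` breaks the pattern. Note that the match looks only at the two mnemonics and never at the flag values.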

> Fusing mov and a different op would require the input and output registers of the second op to
> be compared with the output register of the mov and then if the output register matches the matching
> input register would have to be overwritten with the input register from the mov.

Yes, the comparison needs to compare the (static) registers in the instructions, but this seems not at all difficult, and then it just needs to emit the fused op. I'm not sure what you mean by "overwrite" - if the two instructions match the pattern, the fused op is just emitted whole-cloth.

I am assuming this all happens at decode. Are you thinking it happens at rename?
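As a sketch of the decode-time version I have in mind (Python, purely illustrative): the names `Insn`, `FusedOp` and `try_fuse` are hypothetical, and this models only the register-name comparison, not any real decoder. It assumes the 2-operand form `op dst, src` meaning `dst = dst op src`.

```python
from typing import NamedTuple, Optional


class Insn(NamedTuple):
    op: str    # mnemonic, e.g. "mov" or "add"
    dst: str   # architectural destination register name
    src: str   # architectural source register name


class FusedOp(NamedTuple):
    op: str
    dst: str
    src1: str
    src2: str


def try_fuse(first: Insn, second: Insn) -> Optional[FusedOp]:
    """Fuse `mov r1, r2` + `op r1, r3` into `op r1, r2, r3`, else return None.

    Only the static register names in the two instructions are compared.
    """
    if first.op != "mov" or second.op == "mov":
        return None
    if second.dst != first.dst:
        return None  # the op does not overwrite the mov's destination
    # If the op also reads the mov's destination as its source
    # (e.g. `add r1, r1`), that read sees the moved value too.
    src2 = first.src if second.src == first.dst else second.src
    return FusedOp(second.op, first.dst, first.src, src2)


fused = try_fuse(Insn("mov", "r1", "r2"), Insn("add", "r1", "r3"))
not_fused = try_fuse(Insn("mov", "r1", "r2"), Insn("add", "r4", "r3"))
```

Here `fused` is the 3-operand `add r1, r2, r3`, and `not_fused` is `None` because the second instruction writes a different register. The `src2` line is the closest thing to an "overwrite" in this model: it is just a name substitution when building the fused op.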

>
> Then there's also the problem of actually getting that code. Compilers are too smart, if
> they expect the rename width to be a problem they might try to move the mov elsewhere.

Here, I strongly disagree. First, I have never seen compilers determine that there are "rename width" problems and do something proactive to fix them. Almost everything is feed-forward (not feedback) and based on heuristics. I think compilers are working a couple of levels less smartly than that.

Second, you can't really fix "rename width" by moving code around. What matters is loops, and in loops rename bandwidth is basically a global property of the loop, not of a few instructions. The loop either hits the rename bandwidth or not, and re-arrangement can't fix it. At least it mostly works like that on deep OoO machines for loops that aren't giant (if they are giant, like larger than the OoO window, maybe this type of movement can help if you can move a large distance).
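A back-of-the-envelope version of that argument, as a Python sketch. The loop body, the width of 4, and the assumption that every listed op takes exactly one rename slot (including a mov that is later eliminated) are all illustrative, not measurements of any real core.

```python
def rename_bound(uops, rename_width):
    """Steady-state lower bound, in cycles per iteration, from the renamer alone.

    Assumes each entry in `uops` occupies exactly one rename slot and that
    the renamer can fill its slots across iteration boundaries.
    """
    return len(uops) / rename_width


# Hypothetical 9-op loop body; only the count matters for this bound.
body = ["mov", "add", "load", "mov", "sub", "store", "lea", "cmp", "jne"]

original = rename_bound(body, 4)            # 9 / 4 = 2.25
reordered = rename_bound(sorted(body), 4)   # same ops, different order: 2.25

# Fusing one mov+op pair removes one rename slot per iteration.
fused_body = ["add3", "load", "mov", "sub", "store", "lea", "cmp", "jne"]
fused = rename_bound(fused_body, 4)         # 8 / 4 = 2.0
```

The reordering here is not meant to preserve program semantics; it only shows that the bound depends on how many slots the loop needs, not on where each op sits. Removing an op, which is what fusion does, is what moves the number.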

> If
> the target architecture does not support mov-elimination then the mov would probably be
> hoisted to avoid stalling the subsequent instruction or you'd get something like
> mov r1, r2
> op r2, r3
> instead.

This is true: compilers will already prefer the above form when possible. So it would be something they would need to learn/tune for. It would be no different than existing macro-fusion: when it was introduced, compilers almost always separated the flag-producing op and the jump if they could, but now they put them adjacent as long as you tune for newer CPUs.


> The appeal of fusing flag-dependant operations is just far greater on x86 because all operations that can
> affect flags always do, so consumer and producer are virtually guaranteed to be consecutive instructions.

Yup, but it's already done! As above though, compilers often managed to separate the set and the jump - just try compiling any code with gcc without -march or -mtune set: there is almost always a load, lea or store the compiler can shove in between. They had to learn to play nice with macro-fusion.

