Branch/jump target prediction -8

By: Michael S (, August 21, 2016 12:53 am
Room: Moderated Discussions
Maynard Handley ( on August 19, 2016 1:47 pm wrote:
> Michael S ( on August 19, 2016 9:26 am wrote:
> > Maynard Handley ( on August 19, 2016 8:41 am wrote:
> > >
> > > WTF do you (Farnsworth) insist on misrepresenting what I said? What value is there in this?
> > >
> >
> > It's me, not Simon.
> >
> > > Did I talk about "extensive use of branch avoidance techniques"?
> >
> > Yes. Anything less than extensive use, wouldn't have measurable
> > effect on average density of branches in instruction stream.
> >
> > > Or did I suggest exactly what Ricardo B is talking about --- extremely short instruction sequences
> > > using csel/cmov to avoid likely unpredictable (because data-dependent) branches? Basically use csel/cmov
> > > when the point of the branch is value calculation rather than control redirection?
> > >
> >
> > Even for short unpredictable sequences branch avoidance can be sometimes counterproductive,
> > specifically, when one of the inputs is a result of long dependency chain.
> > With unpredictable branch you still have 50% chance of taking right path.
> >
> > I don't say that branch avoidance shell never be used, but that it shell be used with extreme care.
> > In specific case presented by Ricardo, I can't say for sure, but would guess
> > that it's a bad idea, because saturation tends to be predictable.
> >
> Why are you coupling together two distinct issues
> - use of csel/cmov in certain limited precise circumstances?
> - calculations at the end of a long dependency chain?

May be, because the second is an important component of "precise circumstances" of the first? And the one (but not the only) that compiler is unlikely to estimate without PGO?

> (a) I see no reason why the types of calculations I'm advocating are in some
> way particularly prone to being at the end of long dependency chains.

For algorithms like traversing binary tree or sorting by quicksort they a very likely to be on the end of load from main meamory, or at best from LLC.

> (b) Being at the end of a long dependency chain is not especially problematic if
> the result you calculate is not in turn critical. And once again I see no reason
> that the situations where I'm advocating the use of csel/cmov take that form.

That's correct. I didn't write that out in original post for sake of shortness.

As a side note, non-critical results are more common in signal/image/video processing than in "general-purpose" processing and more common in libraries than in "open" code. Both of that suggests that in ideal world they will be handled on SIMD side of CPU rather than on "genaral" side. Unfortunately, real world and ideal world are not particularly alike.

> (c) The correct solution to the generic problem of long dependency chains is
> value speculation, not replacing value calculations with control flow.

Santa, tooth fairy, value prediction... It's just me or do they have something in common?

> The point is that value prediction has the POTENTIAL to utilize a lighter weight replay mechanism
> than does control flow prediction. Control flow prediction, by definition, has to start everything
> again at the fetch point (and, unless you're using a strange ISA) doesn't have access to the
> join point at the end of a basic block, or a mechanism to know that perhaps the instructions
> after the join point don't need to be flushed. But value prediction IS amenable to lighter weight
> recovery mechanisms, if you're willing to put in the effort to do so.
> (d) Your math makes no sense. You say "well, just guess the direction
> if you don't know. you have a 50% chance of being right".
> OK, so 50% of the time the "obvious" calculation of something trivial (abs,max, maybe a test against
> saturation) costs say three instructions.

abs/max/saturation are not the cases that I was talking about, because for abs/max/saturation all the same data dependencies are present in both solutions - branchless and branchy. I was talking about cases where value of selector is independent of values from which you select your result (selectee? my Firefox claims that there is no such word :( )

> And 50% of the time it flushes the pipeline and recovery
> costs us 20 cycles times say 4 to 6 to 8 instructions each cycle (depending on the exact CPU) lost.
> Average cost is basically 40, 60, 80 instructions per max/abs/sat calculation.
> OR we could do some dicking around with csel/cmov and pay a cost of,
> depending on the details, maybe 4 to 8 instructions every time.
> The correct choice seems pretty obvious to me.
> Now maybe Ricardo's case of saturation is predictable; I have no idea, since
> I don't know what's he's doing and the data stream he is operating against.

The problem is - compiler also does not know. And still generates branless code. That's one of the points of my objection. Except that, for reasons stated above, in case of saturation it would be relatively harmless. Even if branch is predictable, branch-based alternative would be only marginally faster than branchless.

> I do know that in the use cases I cared about when I was doing this sort of thing (things
> like bit-twiddling to decode compressed data) I was dealing with maximally unpredictable data,
> and for the most part (usually, NOT always) each unpredictable branch was used to generate
> a value, not as control flow. ie exactly the situation I described in my original post.

My intuition agrees with yours - for Huffman-like decoding branchless alternative should be better. But I'd still prefer to measure the effect. Intuition sucks, even mine :-)

< Previous Post in ThreadNext Post in Thread >
TopicPosted ByDate
Branch/jump target predictionTravis2016/08/09 09:44 AM
  Early decode of unconditional jumpsPeter Cordes2016/08/09 11:35 AM
    Early decode of unconditional jumpsExophase2016/08/09 12:29 PM
  pipelines are too long, noHeikki Kultala2016/08/09 11:37 AM
    pipelines are too long, nono name2016/08/09 06:17 PM
      pipelines are too long, noWilco2016/08/10 01:43 AM
        pipelines are too long, noPaul A. Clayton2016/08/10 07:44 PM
    Converged BTB/IcachePaul A. Clayton2016/08/10 07:44 PM
  Branch/jump target predictionsylt2016/08/10 02:27 AM
    Branch/jump target predictionPeter Cordes2016/08/12 03:23 PM
      Branch/jump target predictionsylt2016/08/12 10:35 PM
  Branch/jump target predictionMr. Camel2016/08/10 09:43 AM
    Branch/jump target predictionLinus Torvalds2016/08/10 11:46 AM
      Branch/jump target predictionMegol2016/08/10 02:25 PM
        Branch/jump target predictionLinus Torvalds2016/08/10 04:14 PM
          Branch/jump target predictionDavid Kanter2016/08/11 11:09 PM
            Branch/jump target predictionLinus Torvalds2016/08/12 11:25 AM
          Branch/jump target prediction2016/08/14 04:24 AM
            Branch/jump target predictionMaynard Handley2016/08/14 06:47 AM
              Branch/jump target predictionDavid Kanter2016/08/14 07:13 AM
              Branch/jump target prediction2016/08/16 05:19 AM
            Branch/jump target predictionTim McCaffrey2016/08/14 07:12 AM
              Branch/jump target predictionDavid Kanter2016/08/14 07:18 AM
                Branch/jump target predictionGabriele Svelto2016/08/14 01:09 PM
            Just a thoughtAnon2016/08/14 09:40 AM
              Just a thought2016/08/16 05:58 AM
                Just a thoughtAnon2016/08/16 07:45 AM
                  Just a thought2016/08/16 08:36 AM
            Branch/jump target predictionLinus Torvalds2016/08/14 09:40 AM
              Branch/jump target prediction2016/08/16 05:40 AM
                Branch/jump target predictionRicardo B2016/08/16 06:39 AM
                  Branch/jump target prediction -82016/08/16 08:23 AM
                    Branch/jump target prediction -8anon2016/08/16 09:09 AM
                    Branch/jump target prediction -8Ricardo B2016/08/16 09:33 AM
                      Branch/jump target prediction -8Exophase2016/08/16 10:02 AM
                        Branch/jump target prediction -8Ricardo B2016/08/16 10:31 AM
                        SPU hbr instruction (hint for branch)vvid2016/08/16 11:31 AM
                        Branch/jump target prediction -8no name2016/08/17 07:16 AM
                    Branch/jump target prediction -8Gabriele Svelto2016/08/16 10:46 AM
                      Branch/jump target prediction -8Etienne2016/08/17 12:27 AM
                        Branch/jump target prediction -8Gabriele Svelto2016/08/17 02:52 AM
                    Branch/jump target prediction -8Maynard Handley2016/08/18 09:02 AM
                      Branch/jump target prediction -82016/08/18 05:21 PM
                        Branch/jump target prediction -8Maynard Handley2016/08/18 06:27 PM
                          Branch/jump target prediction -8Megol2016/08/19 03:29 AM
                          Part 1/N - CPU-internal JIT2016/08/19 03:44 AM
                        Atom, you're such a comedian.Jim Trent2016/08/18 09:39 PM
                          Atom, you're such a comedian.2016/08/19 02:23 AM
                      Branch/jump target prediction -8Etienne2016/08/19 12:25 AM
                        Branch/jump target prediction -8Simon Farnsworth2016/08/19 03:17 AM
                          Branch/jump target prediction -8Michael S2016/08/19 05:39 AM
                          Branch/jump target prediction -8anon2016/08/19 06:29 AM
                            Branch/jump target prediction -8Simon Farnsworth2016/08/19 07:34 AM
                              Branch/jump target prediction -8anon2016/08/19 07:48 AM
                                Branch/jump target prediction -8Exophase2016/08/19 10:03 AM
                                Branch/jump target prediction -8Maynard Handley2016/08/19 10:34 AM
                            Branch/jump target prediction -8David Kanter2016/08/19 11:23 PM
                        Branch/jump target prediction -8Ricardo B2016/08/19 06:18 AM
                          Branch/jump target prediction -8Maynard Handley2016/08/19 07:41 AM
                            Branch/jump target prediction -8Michael S2016/08/19 08:26 AM
                              Branch/jump target prediction -8Maynard Handley2016/08/19 12:47 PM
                                Branch/jump target prediction -8Michael S2016/08/21 12:53 AM
                                  Branch/jump target prediction -8Ricardo B2016/08/22 04:17 AM
                                    Branch/jump target prediction -8Michael S2016/08/22 04:58 AM
                                      Branch/jump target prediction -8Ricardo B2016/08/22 06:50 AM
                            Branch/jump target prediction -8Simon Farnsworth2016/08/19 08:28 AM
                              Branch/jump target prediction -8Simon Farnsworth2016/08/19 08:40 AM
                            Branch/jump target prediction -8David Kanter2016/08/22 11:05 PM
                              Branch/jump target prediction -8Maynard Handley2016/08/23 06:49 AM
                      Branch/jump target prediction -8anon2016/08/26 07:00 AM
                        Branch/jump target prediction -8anon2016/08/26 07:14 AM
                Branch/jump target predictionMegol2016/08/19 03:23 AM
          Branch/jump target predictionMegol2016/08/19 06:42 AM
            Branch/jump target predictionMaynard Handley2016/08/19 10:46 AM
              Branch/jump target predictionDavid Kanter2016/08/19 11:34 PM
                Branch/jump target predictionMaynard Handley2016/08/20 06:07 AM
            Branch/jump target predictionsylt2016/08/19 10:48 AM
              Branch/jump target predictionsylt2016/08/19 11:00 AM
              Branch/jump target predictionMegol2016/08/21 09:27 AM
                The (apparent) state of trace caches on modern CPUsMaynard Handley2016/08/22 02:10 PM
                  The (apparent) state of trace caches on modern CPUsExophase2016/08/22 07:55 PM
                    The (apparent) state of trace caches on modern CPUsanon2016/08/22 11:36 PM
                      The (apparent) state of trace caches on modern CPUsExophase2016/08/23 04:08 AM
                        The (apparent) state of trace caches on modern CPUsanon2016/08/23 08:51 PM
                          The (apparent) state of trace caches on modern CPUsExophase2016/08/23 10:12 PM
                          The (apparent) state of trace caches on modern CPUsMaynard Handley2016/08/24 06:38 AM
                            The (apparent) state of trace caches on modern CPUsanon2016/08/24 07:26 PM
                    The (apparent) state of trace caches on modern CPUsMaynard Handley2016/08/23 06:48 AM
                      That's not trueDavid Kanter2016/08/23 08:39 AM
                        That's not trueMaynard Handley2016/08/23 08:56 AM
                      The (apparent) state of trace caches on modern CPUsanon2016/08/23 08:54 PM
                  The (wrong) state of trace caches on modern CPUsEric Bron2016/08/25 01:38 AM
                    The (wrong) state of trace caches on modern CPUsMichael S2016/08/25 02:28 AM
                      The (wrong) state of trace caches on modern CPUsEric Bron2016/08/25 06:12 AM
                      The (wrong) state of trace caches on modern CPUsMaynard Handley2016/08/25 08:50 AM
                        The (wrong) state of trace caches on modern CPUsMichael S2016/08/25 09:36 AM
                          The (wrong) state of trace caches on modern CPUsExophase2016/08/25 10:32 AM
                        The (wrong) state of trace caches on modern CPUsEric Bron2016/08/25 10:12 AM
                          The (wrong) state of trace caches on modern CPUsMaynard Handley2016/08/25 11:01 AM
                            The (wrong) state of trace caches on modern CPUsEric Bron2016/08/25 11:20 AM
                              The (wrong) state of trace caches on modern CPUsMaynard Handley2016/08/25 12:34 PM
        Branch/jump target predictionGabriele Svelto2016/08/11 12:15 PM
  Branch/jump target predictionGabriele Svelto2016/08/20 06:21 AM
Reply to this Topic
Body: No Text
How do you spell tangerine? 🍊