ISA support for constant count loops

By: Brett (ggtgp.delete@this.yahoo.com), January 17, 2020 1:28 am
Room: Moderated Discussions
Travis (travis.downs.delete@this.gmail.com) on January 16, 2020 8:48 pm wrote:
> Linus Torvalds (torvalds.delete@this.linux-foundation.org) on January 16, 2020 7:41 pm wrote:
> > Travis Downs (travis.downs.delete@this.gmail.com) on January 16, 2020 4:21 pm wrote:
> > >
> > > It's when you try to make this counter a dynamic value, e.g., writing by
> > > an instruction which takes a GP register that things get a lot messier.
> >
> > Well, there has to be some dynamic component to it - think taking a fault or exception during
> > the loop. The initial hint might be an immediate value, but at some point you have to re-enter
> > the loop in the middle, and now the loop counter has to be restored dynamically.
> >
> > That said, if it acts more like a pure hint to the instruction
> > decoder to unroll the loop into the uop cache
> > N times and/or force-initialize a (purely front-end internal) loop counter, maybe it could be a one-time
> > thing. IOW, the semantics of it would be purely as a hint
> > to the front end - it wouldn't really change semantics
> > of the code stream that follows, and if an exception happens
> > the return point simply wouldn't have that hint
> > any more (or if the hint is wrong, it would "only" result in a forced branch mispredict).
> >
> > So you wouldn't even have an architecturally visible "count" register, you'd literally have
> > just an instruction that says "the next loop will execute exactly 20 times", and it would only
> > affect the front-end hinting, not any architecturally visible state. Purely a "don't mess with
> > the branch prediction when I can tell you ahead of time for this simple thing" thing.
> >
> > (And because it's purely for the front end, it would have
> > to be an immediate as part of the instruction itself).
> >
> > I don't think there are enough small loops with small fixed
> > loop counts for this to really make any difference, though.
> >
> > Also, judging by how bad software has been at giving hints to hardware, I'm not sure it makes
> > sense for that reason either. Intel used to have prefixes on conditional branch instructions
> > to let the software say "this is likely taken/not-taken". The key word being "used to". Software
> > was so much worse at it than the hardware branch predictors that it only caused more pain.
> >
> > Linus
>
> I should re-read Paul's post again, but to be clear I was imagining exactly the "hint to the
> FE only" scenario: you'd have to duplicate the actual loop condition with normal instructions
> as usual. So it doesn't make such loops smaller in codesize, rather they get bigger.
>
> This does have the advantage that you can use these hints on loops which are not actually
> fixed iteration but are close enough to it that a fixed prediction works.
>
> I am also somewhat doubtful of the value. Intel actually used to have a loop predictor which would
> predict such loops exactly, at least after training, but they got rid of it. I don't know if they
> got rid of it because it was pointless in the face of improving general purpose predictors (it isn't
> - current predictors don't do well on fixed loops), or there was a technical constraint, but in any
> case it indicates such loops are at least not super important per Intel's simulations.

If the loop count is in the instruction stream then the loop branch prediction should be perfect.

Otherwise, for the count to arrive before the loop branch has to be predicted, the loop count has to be greater than (load time plus pipeline length) divided by the loop cycle count. Basically, the size of your current OoOE window divided by the cycle count of the loop.

The old-school way of looking at it, two decades ago, was that you had a five-instruction loop and a pipeline 16 instructions long, so counts loaded from memory that were greater than 3 were predicted perfectly. But today the load can take 150 cycles, the OoOE window is more than that, and the processor is 4-6 wide, giving one-cycle loops, so quite large counts are needed for a perfect prediction. The irony is that most counts are small, so the predictor may deliberately pick a small number so as to fetch past the loop for speed. To a six-wide processor running 1 IPC code all that matters is prefetch; mispredicts don't cost much, as you will catch back up to the wall quickly.
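As a rough sketch of that arithmetic, here is the rule of thumb in Python. The function name is made up for illustration, the old-school case assumes one instruction per cycle and no load latency (the numbers above don't give one), and the modern case sets the pipeline length to zero as a placeholder, so its result is only a lower bound.

```python
def min_predictable_count(load_time, pipeline_length, loop_cycles):
    """Smallest loop count strictly greater than
    (load_time + pipeline_length) / loop_cycles.

    Hypothetical helper: it only encodes the rule of thumb from the post,
    not the behaviour of any real branch predictor.
    """
    if loop_cycles <= 0:
        raise ValueError("loop_cycles must be positive")
    # Integer division then +1 gives the first count strictly above the ratio.
    return (load_time + pipeline_length) // loop_cycles + 1


# Old-school case: 16-deep pipeline, 5-instruction loop.
# load_time=0 is an assumption; one instruction is treated as one cycle.
old = min_predictable_count(load_time=0, pipeline_length=16, loop_cycles=5)

# Modern case: 150-cycle load, one-cycle loop.
# pipeline_length=0 is a placeholder assumption, so this is a lower bound.
modern = min_predictable_count(load_time=150, pipeline_length=0, loop_cycles=1)

print(old, modern)
```

With these inputs the old-school threshold comes out as 4 (that is, counts greater than 3), while the modern one is already above 150 before any pipeline depth is added.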

This is a good place for value prediction: you could predict that the next loop count will be the same as the last, and the harm of a wrong prediction is low.
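A minimal sketch of that "same as last time" idea, in Python. The class, the table keyed by a loop address, and the sample trip counts are all hypothetical; this is a software toy to show the policy, not a description of how any shipping predictor stores its state.

```python
class LastValueCountPredictor:
    """Toy last-value predictor: guesses that a loop will run as many
    times as it did on its previous execution. Hypothetical model only."""

    def __init__(self):
        # Maps a loop's address (an assumed key) to its last observed count.
        self.last_count = {}

    def predict(self, loop_pc):
        # Returns None when this loop has not been seen before.
        return self.last_count.get(loop_pc)

    def update(self, loop_pc, actual_count):
        self.last_count[loop_pc] = actual_count


def count_hits(predictor, loop_pc, counts):
    """Replay a sequence of actual trip counts and count correct guesses."""
    hits = 0
    for actual in counts:
        if predictor.predict(loop_pc) == actual:
            hits += 1
        predictor.update(loop_pc, actual)
    return hits


# Made-up trip counts for one loop at a made-up address.
hits = count_hits(LastValueCountPredictor(), 0x400, [20, 20, 20, 8, 8])
print(hits)
```

In this made-up trace the predictor misses only on the first visit and on the one visit where the count changes, which is the sense in which the cost of guessing is low.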

This is also a good place for aggressive fetch ahead that is disconnected from the loop branch predictor.

Even more forward-looking, this is a good place to spawn another thread on a paired core to execute past the branch, betting that future fetches do not depend on the loop results, a bet which will win most of the time. You need an OoOE window twice as wide and smart memory conflict checks. This could double the speed of processors on some code, way better than the 5% we expect today.

> The main benefit over a loop predictor would be storing the prediction
> statically in the instruction stream, using almost no BP resources.