ISA support for constant count loops: ineffective compared to micro-threads

By: (0xe2.0x9a.0x9b.delete@this.gmail.com), January 20, 2020 8:02 am
Room: Moderated Discussions
Paul A. Clayton (paaronclayton.delete@this.gmail.com) on January 16, 2020 10:28 am wrote:
> Linus Torvalds (torvalds.delete@this.linux-foundation.org) on January 16, 2020 9:12 am wrote:
> [snip]
> > Which was definitely a huge loss. Yes, the direct copies were larger, but because it was a constant
> > size, there were no mispredict issues or "jump out-of-line" I$ behavior issues with the direct
> > copies. So rep movs lost by a mile, and it really stuck out like a sore thumb on profiles.
>
> In theory an ISA could provide a simple mechanism to load the loop count for constant iteration (inner) loops.
> A simple load constant instruction could trigger microarchitectural optimization of branch prediction (and
> other functions). For constant count loops there is no reason for branch misprediction other than difficulty
> of recognizing the count and the associated branch. Even much of the overhead of the branch instruction could
> be reduced (like some DSPs' special support for loops). For x86, this could just be idiom recognition for REP-prefixed
> instructions; loading a constant into the count register would prepare for such a loop.
>
> For small copies, individual stores (full unrolling for general loops) might still be more efficient.
> This factor then discourages hardware overhead for low overhead small count loops.

Well, but why discuss this trivial form of interaction between a loop counter and the branch predictor? The expected speedup is tiny.

Why not aim for something much bigger with the potential of improving performance by a non-trivial margin?

For example: add instructions to x86 CPUs that start 16 iterations of a particular loop in parallel within a single thread, with hardware support for resolving data dependencies across the 16 iterations so that they appear to have executed sequentially. Some instruction bits could tell the hardware which loop instructions are safe to execute concurrently without dynamic data dependency checks. I hope we can all agree here that this would improve performance by a hundred percent on loops whose iterations are so small and/or so inter-dependent that they cannot be sped up by sending them to another x86 thread. Of course, this would take multiple CPU generations and decades to fully unfold, because a reasonably optimized implementation of this idea requires many more transistors than are available today.
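To make the idea a bit more concrete, here is a rough software model in C, not a description of any real or proposed hardware. The loop body, the constants, and the names run_sequential and run_batched are made up for illustration. One statement in the body is independent across iterations (the kind an instruction bit could mark as "safe"), and one carries a dependency through acc (the kind the hardware would have to resolve in order). The sketch assumes in and out do not alias.

```c
#include <stddef.h>
#include <stdint.h>

#define BATCH 16u  /* iterations started together, as in the post */

/* Reference: the plain sequential loop. */
uint32_t run_sequential(const uint32_t *in, uint32_t *out, size_t n) {
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t t = in[i] * 2654435761u + 1u; /* independent of other iterations */
        acc = (acc ^ t) * 31u;                 /* loop-carried dependency via acc */
        out[i] = acc;
    }
    return acc;
}

/* Model of the batched scheme: up to BATCH iterations are "in flight".
 * Phase 1 stands in for the instructions marked safe: it is deliberately
 * visited in reverse order to show that ordering does not matter there.
 * Phase 2 stands in for the dependency resolution: the carried value is
 * passed from iteration to iteration in program order. */
uint32_t run_batched(const uint32_t *in, uint32_t *out, size_t n) {
    uint32_t acc = 0;
    for (size_t base = 0; base < n; base += BATCH) {
        size_t len = (n - base < BATCH) ? (n - base) : BATCH;
        uint32_t t[BATCH];

        for (size_t k = len; k-- > 0; )        /* any order is fine here */
            t[k] = in[base + k] * 2654435761u + 1u;

        for (size_t k = 0; k < len; k++) {     /* must stay in order */
            acc = (acc ^ t[k]) * 31u;
            out[base + k] = acc;
        }
    }
    return acc;
}
```

Both functions produce the same out array and return value for the same input, which is the "look like they executed sequentially" requirement. In this model, whatever speedup there is comes only from the first phase; the more of the loop body that sits in the carried chain, the less there is to gain.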

In the future, x86 CPU cores will be stacked on top of each other. The difference between single-core clock frequency and many-core clock frequency will be larger than today.

Code containing totally unpredictable branches, where the branch targets form a hard-to-crack pseudo-random sequence of numbers, is doomed to sequential execution forever, because any parallel version of the code is less efficient than the sequential version.
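A minimal C sketch of the kind of code meant here, assuming a toy generator: xorshift32 is only a stand-in and is not actually hard to crack, and the function chase and its four cases are invented for the example. The point is the shape of the loop: which case runs next depends on a state that is only known once the previous step has finished.

```c
#include <stdint.h>

/* One step of the xorshift32 generator (stand-in for a hard-to-predict source). */
static uint32_t xorshift32(uint32_t x) {
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return x;
}

/* Each step picks its branch target from the current state, and the next
 * state depends on the result of the chosen branch. Step i+1 cannot start
 * before step i has completed. */
uint32_t chase(uint32_t seed, unsigned steps) {
    uint32_t state = seed ? seed : 1u;
    uint32_t result = 0;
    for (unsigned i = 0; i < steps; i++) {
        switch (state & 3u) {                 /* data-dependent branch target */
        case 0:  result += state;             break;
        case 1:  result ^= state;             break;
        case 2:  result = result * 3u + 1u;   break;
        default: result -= state >> 7;        break;
        }
        state = xorshift32(state ^ result);   /* next target depends on this step */
    }
    return result;
}
```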

-atom