ISA support for constant count loops: ineffective compared to micro-threads

By: (0xe2.0x9a.0x9b.delete@this.gmail.com), January 22, 2020 3:28 pm
Room: Moderated Discussions
Heikki Kultala (heikki.kultala.delete@this.tuni.fi) on January 22, 2020 9:47 am wrote:
> ⚛ (0xe2.0x9a.0x9b.delete@this.gmail.com) on January 22, 2020 8:18 am wrote:
>
> > > Note that with Out Of Order processor you may already treat few iteration in parallel right now...
> >
> > Only if you can statically (at compile-time) prove that the few iterations are data independent.
>
> No.
>
> OoOE works fully dynamically. The processor fetches multiple iterations of the loop (later
> iterations of the micro-op cache) and instructions from ANY iteration can execute.
>
> There is no need to prove ANYTHING at compile time; The incoming code looks like completely serial
> code but the processor extracts huge amount of parallelism from it fully dynamically.
>
> If the processor makes some wrong guess like branch misprediction or incorrectly predicted
> memory aliasing, then it just reverts the "last known correct state".

I think we mutually agree that software can for example choose the 1st instruction (X1) from the 1st 4 iterations (I1-I4) of the loop, which can be written as tuple (I1.X1, I2.X1, I3.X1, I4.X1), and thus can expect a 4-wide CPU to begin executing those 1st 4 loop iterations in parallel. This can continue with the 2nd instruction (X2), 3rd, and so on, until a branch instruction is encountered that will cause the 4 iterations to diverge.

The tuple can only be constructed if the 4 incarnations of X1 can at compile-time be proved to be data independent (i.e: I2.X1 does not depend on the whole I1; I3.X1 does not depend on the whole I1 and the whole I2, etc).

An issue with the approach in the above paragraph is that the number of registers available to the compiler is divided by about 4.

By definition (by necessity) the last instruction in any kind of loop is a conditional jump instruction. Current OoO [x86] CPUs are incapable of concurrently executing multiple conditional jump instructions on a single core - the single-threaded nature of the instruction stream is too deeply rooted in the basic design of any OoO [x86] CPU core.

Current processors cannot fetch multiple iterations of the loop (from L1 or from µop cache) because the CPU can only see at most 4 instructions per clock cycle (5 in case of a fused instruction, 6 in case the source is µop cache), and can resolve at most 1 conditional jump instruction per cycle.

Obviously, parallel execution of 16 loop iterations in parallel would require concurrent execution of up to 16 conditional jump instructions per cycle - which sharply contradicts the fact that current x86 OoO CPUs can execute at most 1 such instruction per cycle.

The idea I proposed enables a CPU to start 16 loop iterations in parallel, with the rather obvious requirement that the CPU implementation should be able to resolve up to 16 conditional instructions in parallel, and with the rather obvious benefit that the number of registers available to the compiler isn't divided by about 16.

It is different from multi-core because there aren't 16 copies of L1I/L1D caches - the L1I/L1D caches "just" need to be multi-ported.

In practice, even with 16 iterations started, we would be lucky if the real-world speedup factor is 2 (so just a 100% speedup, not 1600%) - which is however still much larger than the speedup which can be expected from "ISA support for constant count loops" that started this discussion.

-atom
< Previous Post in ThreadNext Post in Thread >
TopicPosted ByDate
LLVM comments on mem*Maynard Handley2020/01/14 01:51 PM
  LLVM comments on mem*Anon32020/01/15 06:28 AM
  Interesting comment about rep instructions & code sizeGabriele Svelto2020/01/15 07:12 AM
    Interesting comment about rep instructions & code sizenone2020/01/15 08:59 AM
      Interesting comment about rep instructions & code sizeGabriele Svelto2020/01/16 03:56 AM
        Interesting comment about rep instructions & code sizeLinus Torvalds2020/01/16 10:12 AM
          ISA support for constant count loopsPaul A. Clayton2020/01/16 11:28 AM
            ISA support for constant count loopsGabriele Svelto2020/01/16 02:15 PM
              PowerPC "front-end registers"Paul A. Clayton2020/01/16 03:34 PM
              ISA support for constant count loopsTravis Downs2020/01/16 05:21 PM
                ISA support for constant count loopsLinus Torvalds2020/01/16 08:41 PM
                  ISA support for constant count loopsTravis2020/01/16 09:48 PM
                    ISA support for constant count loopsBrett2020/01/17 01:28 AM
              Branch to CTRMaya2020/01/18 08:15 AM
                Branch to CTRGabriele Svelto2020/01/18 01:14 PM
            ISA support for constant count loopsanon2020/01/17 08:28 AM
              ISA support for constant count loopsTravis Downs2020/01/17 08:34 AM
            ISA support for constant count loops: ineffective compared to micro-threads2020/01/20 08:02 AM
              ISA support for constant count loops: ineffective compared to micro-threadssomeone2020/01/20 12:23 PM
                ISA support for constant count loops: ineffective compared to micro-threads2020/01/22 09:23 AM
              ISA support for too slow computersEtienne2020/01/21 02:42 AM
                ISA support for constant count loops: ineffective compared to micro-threads2020/01/22 09:18 AM
                  ISA support for constant count loops: ineffective compared to micro-threads2020/01/22 10:04 AM
                  ISA support for constant count loops: ineffective compared to micro-threadsHeikki Kultala2020/01/22 10:47 AM
                    ISA support for constant count loops: ineffective compared to micro-threadsdmcq2020/01/22 01:31 PM
                    ISA support for constant count loops: ineffective compared to micro-threads2020/01/22 03:28 PM
                      ISA support for constant count loops: ineffective compared to micro-threadsEtienne2020/01/22 04:35 PM
          Interesting comment about rep instructions & code sizeGabriele Svelto2020/01/16 02:00 PM
    Interesting comment about rep instructions & code sizeTravis Downs2020/01/15 03:40 PM
      Interesting comment about rep instructions & code sizeChester2020/01/15 05:16 PM
        Interesting comment about rep instructions & code sizeTravis Downs2020/01/15 05:50 PM
          Interesting comment about rep instructions & code sizeChester2020/01/15 07:24 PM
            Interesting comment about rep instructions & code sizeTravis Downs2020/01/16 02:26 PM
              Interesting comment about rep instructions & code sizeChester2020/01/17 01:16 PM
                Interesting comment about rep instructions & code sizeTravis Downs2020/01/17 03:41 PM
        Interesting comment about rep instructions & code sizeGabriele Svelto2020/01/16 03:53 AM
Reply to this Topic
Name:
Email:
Topic:
Body: No Text
How do you spell purple?