Speculative multi-threading

By: Maynard Handley (name99.delete@this.name99.org), January 3, 2014 5:01 am
Room: Moderated Discussions
David Kanter (dkanter.delete@this.realworldtech.com) on January 3, 2014 2:12 am wrote:
> Maynard Handley (name99.delete@this.name99.org) on January 3, 2014 12:06 am wrote:
> > Maynard Handley (name99.delete@this.redheron9.com) on December 31, 2013 5:26 pm wrote:
> >
> > > Summary: I think it's true (and mostly agreed) that SW prefetching is dead. What's not widely known is
> > > the extent of interesting HW replacements, or the extent to which any of these are yet implemented.
> >
> > In the context of what I wrote earlier, there seems to be an emerging consensus around future
> > architectures that takes the form of slipping long-latency instructions (and their chains of
> > dependent instructions, possibly hundreds to thousands of instructions long) aside to get at
> > the independent instructions that can run. The details are now only in exactly how this is done
> > so as to replace the ROB with as low power, low area, and low complexity as possible.
> > Along these lines we have FlowForward, CPR (Checkpoint Processing and Recovery) and its successor/amplification
> > CFP (Continual Flow Pipelines), and now I see DOE, Disjoint Out-of-Order Execution:
> > http://j92a21b.ee.ncku.edu.tw/broad/report100/2012-12-24/Disjoint%20OOO%20Execution%20Proc%2012.pdf
> >
> > Looked at from a thousand miles up, this last one, in particular, sounds somewhat like Apple's infamous MacroScalar
> > stuff I mentioned in my last post... The details and the concentration differ, sure, but the abstract view
> > seems, in all these cases, to be to create, on the fly, long long LONG chains of instructions such that all
> > instructions in a chain are dependent, but the various chains are independent of each other (except to the extent
> > that they fork from the occasional starting point, and join again at a join point). Once you have these chains,
> > it's a somewhat orthogonal question whether you run them on the same "CPU" (CFP, FlowForward), on kinda sorta
> > but not quite the same CPU (MacroScalar), or different CPUs (kinda sorta the DOE stuff).
> >
> > Does anyone have an opinion on how real this stuff is? The CPR/CFP/DOE chain of ideas is
> > based on people at Intel, but that obviously doesn't mean Intel are ready to bet on it.
> > Part of me thinks it would take working silicon from a university (kinda like
> > SPARC/MIPS in the 80s) to really validate the idea and make it worth swapping
> > in for the tried and trusted OoO ROB engines that everyone uses today.
> > And part of me hopes that Apple, as the one player in this space that has not been burned by over-ambition
> > on the CPU front, might just be audacious enough to look at these numbers ("hmm, we can get a CPU that's about
> > 50% faster than our existing A7, in smaller area and lower power, and that will scale well to higher frequencies.
> > Hell, let's take on Intel and ARM head-on and go into the business of selling CPUs to everyone").
> > Though I'd be just as happy if nVidia or Qualcomm or AMD were desperate enough to make
> > a splash that they took these ideas and ran with them. I've been looking at a bunch of
> > ideas for how to handle memory latency, and this collection seems the most promising.
> >
> > What worries me is that the gap between our optimized OoO engines today and the retooling you'd need for
> > these alternative ideas is so large that it's not easy to get from here to there. I THINK you could do it
> > in stages by starting with a simple (hah!) OoO core like a Cortex-A9 or Cortex-A15 and initially just replacing the
> > ROB with checkpoints. With that working and giving you, say, 20%, you could then, next generation, replace
> > the instruction window/scheduler with the CFP data buffers, giving you another 20% or so, and another sellable
> > product, then finally add in the DOE weirdness to add a few more percent and speed up multi-threaded apps.
> > But even with this slicing up, each stage is a fairly ambitious engineering project...
>
> The investment in OOO is largely why I like speculative multi-threading - it can
> be used in conjunction with OOO and creates an orthogonal level of parallelism.
>
> David

SpMT is orthogonal, yes, but it seems to lead in the direction I suggest.

In the late 80s a cluster of apparently independent ideas (superscalar, out-of-order, branch prediction) came together in a single model. You could have used one of them without the others (e.g. Pentium used SS and BP but not OoO), but there was a synergistic effect, and they kinda fitted together.

DOE is basically SpMT, but done assuming a CFP core rather than an OoO core. As I suggested in my path to the future, you can slide these things in one at a time. But once you accept the basic concept (on the fly I'm going to split my instruction stream into mutually independent substreams, and run each of those as best I can), you're led down a path where you start to ask "why am I still paying all the OoO cost when what I really want is a rather different set of underlying data structures keeping this whole mess consistent?"

But maybe you are right, in the sense that the first move we will see is just to add SpMT on top of an existing OoO core to goose it by about 15% or so. That is the safest move: it builds up some of the necessary infrastructure, and you can switch it off if you can't get it working in time to ship. Not as much as you'd get from "doing things right" but a safe learning path.
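
To make the "split my instruction stream into mutually independent substreams" idea a bit more concrete, here is a toy sketch in Python. It is NOT how CFP, DOE, or any SpMT hardware actually does it; it only illustrates the abstract grouping. The instruction format (a destination register plus a list of source registers), the function name `split_into_substreams`, and the sample program are all made up for illustration, and it assumes perfect register renaming (only true read-after-write dependences count) and ignores dependences through memory.

```python
from collections import defaultdict


def split_into_substreams(instrs):
    """Group instructions into mutually independent dependence chains.

    instrs: list of (dest_reg, [src_regs]) in program order (hypothetical format).
    Returns a sorted list of chains, each a list of instruction indices.
    Assumes ideal renaming (RAW dependences only) and no memory dependences.
    """
    parent = list(range(len(instrs)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    last_writer = {}  # register name -> index of most recent producer
    for i, (dest, srcs) in enumerate(instrs):
        for reg in srcs:
            if reg in last_writer:
                # Consumer joins the chain of whoever produced its input.
                parent[find(i)] = find(last_writer[reg])
        last_writer[dest] = i

    chains = defaultdict(list)
    for i in range(len(instrs)):
        chains[find(i)].append(i)
    return sorted(chains.values())


# Toy program: r0 and r3 are live-in registers with no producer here.
program = [
    ("r1", ["r0"]),  # 0: imagine this is a load that misses to DRAM
    ("r2", ["r1"]),  # 1: depends on 0
    ("r4", ["r3"]),  # 2: independent of the miss
    ("r5", ["r4"]),  # 3: depends on 2
    ("r6", ["r2"]),  # 4: depends on 1, so also stuck behind the miss
]

print(split_into_substreams(program))  # [[0, 1, 4], [2, 3]]
```

Here instructions 0, 1, 4 form the chain stuck behind the long-latency op, while 2, 3 can run ahead. Whether the two chains then run on the same core, a kinda-sorta-same core, or different cores is the orthogonal question discussed above.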