Haskell Compilation Improvement

By: Maynard Handley (name99.delete@this.redheron9.com), January 1, 2014 2:04 am
Room: Moderated Discussions
anon (anon.delete@this.anon.com) on December 31, 2013 10:39 pm wrote:
> Maynard Handley (name99.delete@this.redheron9.com) on December 31, 2013 7:49 pm wrote:
> > TREZA (no.delete@this.ema.il) on December 31, 2013 5:44 pm wrote:
> > > Maynard Handley (name99.delete@this.redheron9.com) on December 31, 2013 5:26 pm wrote:
> > > > Meanwhile as a dark horse in the prefetching world, there is Yale Patt's favorite idea of abandoning
> > > > the attempt at more and more aggressive OoO and switching to runahead computing, as in, e.g.
> > > > http://users.ece.cmu.edu/~omutlu/pub/mutlu_ieee_micro06_submitted.pdf
> > > > I personally would not be surprised to see this idea pop up implemented in ARM. Basically
> > > > it's (IMHO) a way to do the same sort of amount of work as implementing SMT, but in a
> > > > way that actually achieves something useful for most code. (While, for ARM the CPUs are
> > > > so small that if you want more SMP, you might as well just replicate the cores.)
> > >
> > > Is it like the late Sun ROCK CPU ? Scout threads, speculative instruction retiring...
> > >
> > > It was abandoned as an overheating, slow chip.
> >
> > It does seem like the same idea. But I don't think we know
> > enough about Rock to conclude that it's a bad idea.
> > We know that Rock was frequently delayed, which may just
> > be a sign of poor management, or that it was generally
> > too ambitious. And that ambition may have failed not in the
> > runahead computing but in the HWTM, or in the attempting
> > to share resources (I & D cache, FPU) between cores, or in trying to fit too much onto the chip.
> >
> > We know that the chip was abandoned shortly after Oracle bought Sun, and we know that Ellison claimed
> > that it was abandoned because it ran hot and slow. But we don't,
> > as far as I know, have independent confirmation
> > of this claim (after all, Ellison had his own agenda and vision for Sun once it became part of Oracle)
> > or any insight into the extent to which what Ellison was seeing as slow and hot was in fact not the
> > final chip but, e.g., some sort of FPGA version being used to verify the design.
> POWER6 did it.
> A paper "Runahead execution vs. conventional data prefetching
> in the IBM POWER6 microprocessor" says in the abstract:
> "After many years of prefetching research, most commercially available systems support only two types of prefetching:
> software-directed prefetching and hardware-based prefetchers using simple sequential or stride-based prefetching
> algorithms. More sophisticated prefetching proposals, despite promises of improved performance, have not been
> adopted by industry. In this paper, we explore the efficacy of both hardware and software prefetching in the
> context of an IBM POWER6 commercial server. Using a variety of applications that have been compiled with an
> aggressively optimizing compiler to use software prefetching when appropriate, we perform the first study of
> a new runahead prefetching feature adopted by the POWER6 design, evaluating it in isolation and in conjunction
> with a conventional hardware-based sequential stream prefetcher and compiler-inserted software prefetching.
> We find that the POWER6 implementation of runahead prefetching is quite effective on many of the memory intensive
> applications studied; in isolation it improves performance as much as 36% and on average 10%. However, it outperforms
> the hardware-based stream prefetcher on only two of the benchmarks studied, and in those by a small margin.
> When used in conjunction with the conventional prefetching mechanisms, the runahead feature adds an additional
> 6% on average, and 39% in the best case (GemsFDTD)."
> Whether it is easier than implementing deep OOOE or not is questionable, but IBM also
> seems to have abandoned the approach, with POWER8 looking like a POWER7 derivative.
> POWER6 was a very fast core when released, but it must have used quite a lot of power and/or area,
> because devices were only dual core, which was disappointing for throughput oriented servers.
> POWER6 was boasted as having double the clock speed of their OOOE POWER5 core, however
> between POWER5 and POWER7, they seem to have made significant improvements in scaling
> frequency, and it's vastly ahead of POWER6 in energy efficiency of course.

This is an interesting paper. One could quibble, however, that it's not a fair comparison.
(This is a constant problem with new architectural ideas that get implemented "on the cheap". SMT on the P4 is a similar example.)
The most obvious flaw I see is that they run their runahead instructions purely out of the 64-entry I-fetch buffer. This gives them, at best, the equivalent of 16 to 32 cycles of latency hiding. Adequate for covering a miss to L2, or a fast L3, but not for main memory. Yale Patt's simulations show the value of runahead increasing with longer-latency memory, all the way out to 1000 cycles.
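
To make the 16-to-32-cycle figure concrete, here is a rough back-of-envelope sketch in Python. The 64-entry buffer size comes from the point above; the consumption rates of 2 and 4 instructions per cycle are my assumption (they are simply the rates that reproduce the 32 and 16 cycle endpoints), and the function names and the example latencies are hypothetical, chosen only for illustration, not measured POWER6 numbers.

```python
def runahead_window_cycles(buffer_entries: int, instrs_per_cycle: float) -> float:
    """Cycles of runahead available before a fixed-size buffer is drained."""
    if instrs_per_cycle <= 0:
        raise ValueError("instrs_per_cycle must be positive")
    return buffer_entries / instrs_per_cycle


def fraction_of_miss_covered(window_cycles: float, miss_latency_cycles: float) -> float:
    """Fraction of a miss latency that the runahead window can overlap (capped at 1)."""
    if miss_latency_cycles <= 0:
        raise ValueError("miss_latency_cycles must be positive")
    return min(1.0, window_cycles / miss_latency_cycles)


BUFFER_ENTRIES = 64          # from the post: 64-entry I-fetch buffer
ASSUMED_RATES = (4, 2)       # assumption: instructions consumed per cycle
ASSUMED_LATENCIES = {        # assumption: illustrative miss latencies in cycles
    "near cache miss": 25,
    "main memory": 300,
    "very long latency": 1000,
}


def summarize():
    """Return (rate, window, level, coverage) tuples for the assumed cases."""
    rows = []
    for rate in ASSUMED_RATES:
        window = runahead_window_cycles(BUFFER_ENTRIES, rate)
        for level, latency in ASSUMED_LATENCIES.items():
            rows.append((rate, window, level, fraction_of_miss_covered(window, latency)))
    return rows


if __name__ == "__main__":
    for rate, window, level, cover in summarize():
        print(f"{rate}/cycle -> {window:.0f} cycle window, {level}: {cover:.0%} covered")
```

Under these assumptions the window fully covers a short miss but overlaps only a small slice of a 300 or 1000 cycle miss, which is the regime where runahead is supposed to pay off most.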
(On a different point, I am guessing they did not implement any of Yale Patt's suggestions for how to reduce the energy cost, apart from squelching FP ops.)

It can be found at