By: David Kanter (dkanter.delete@this.realworldtech.com), January 11, 2015 12:10 am
Room: Moderated Discussions
juanrga (nospam.delete@this.juanrga.com) on January 10, 2015 4:50 pm wrote:
> Patrick Chase (patrickjchase.delete@this.gmail.com) on January 9, 2015 11:54 am wrote:
>
> > For those who don't have IEEE subs and are too lazy to search,
> > here's a direct link (found via Neal Crago's personal site):
> >
> > https://6ddba3e7c48ea938c879adbfec05061acc32bb7e.googledrive.com/host/0Bz5Zlai57wAhVUFMbjJGeG1mcnc/papers/crago_hpca13.pdf
> >
> > Figure 7 on p. 303 shows relative energy efficiency across a range of workloads. The OoO
> > flavor they evaluated does quite well, often coming second best (and typically within a
> > couple/few tens of percent) to their preferred "hybrid" approach of SMT + decoupled.
> >
> > These aren't the sort of improvements that will drive a platform shift - If anything this should
> > be interpreted as a very negative result from the standpoint of folks like juanrga, as it fails
> > to show a compelling energy-efficiency benefit from the evaluated alternatives to OoO.
>
> You must be reading a different paper, because the authors claim on page 303:
>
>
> Our hybrid technique enables a larger degree of energy efficiency. By detecting data-dependent control flow
> in the compiler, blackscholes and cutcp operate in multi-threading mode, while other benchmarks are in decoupled
> mode. Only two strands are extracted in dmm, kmeans, and mri enabling two-way multithreading. On average the
> energy-efficiency improvement of the hybrid latency tolerance technique is 28% to 89% over hardware prefetching,
> out-of-order, multithreading, hardware scout prefetching, and decoupled techniques alone.
>
> Contrary to your claims, Figure 7a shows that OoO often comes third or fourth best on each benchmark
> and is only third best on average. On MRI the hybrid technique is about 250% better than OoO.
>
> Moreover, the microbenchmarks show that even on compute-intensive tasks OoO is only the fourth-best
> technique, outperforming only runahead and plain in-order. Especially glaring is the Data-dependent
> Control Flow case, where the hybrid technique increases energy efficiency by 60% over OoO.
>
> As a consequence, the authors rejected OoO and chose instead "a hybrid technique utilizing
> multithreading and decoupled execution to maximize performance while minimizing hardware
> complexity and energy consumption across a wide variety of workloads."
>
> > Perhaps it's time to resurrect the talk of banning juanrga?
>
> And what will be the excuse? That I am citing articles with
> data and conclusions that you and others here dislike?
You realize that paper is only looking at algorithms that are explicitly parallel, or nearly so?
This simply won't work for most workloads (e.g., web browsers, Word, compiling, etc.).
David