By: Mark Roulo (nothanks.delete@this.xxx.com), January 9, 2015 12:59 pm
Room: Moderated Discussions
Patrick Chase (patrickjchase.delete@this.gmail.com) on January 9, 2015 11:54 am wrote:
> coppice (coppice.delete@this.dis.org) on January 8, 2015 10:18 pm wrote:
> > juanrga (nospam.delete@this.juanrga.com) on January 7, 2015 6:03 am wrote:
> > > Recently Intel's Crago and coworkers have published a set of latency techniques "to
> > > reach the energy efficiency goals of future 1000-core data-parallel processors"
> > > [1]. They evaluated and rejected a number of existent latency techniques, including
> > > Linus beloved OoO.
> > >
> > > [1] http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6522327&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D6522327
> > >
> > Interesting paper. Perhaps you should read it some time. On the other hand, if you have
> > read it you should be ashamed of so blatantly misrepresenting its conclusions about OoO.
>
> For those who don't have IEEE subs and are too lazy to search,
> here's a direct link (found via Neal Crago's personal site):
>
> https://6ddba3e7c48ea938c879adbfec05061acc32bb7e.googledrive.com/host/0Bz5Zlai57wAhVUFMbjJGeG1mcnc/papers/crago_hpca13.pdf
>
> Figure 7 on p. 303 shows relative energy efficiency across a range of workloads. The OoO
> flavor they evaluated does quite well, often coming second best (and typically within a
> couple/few tens of percent) to their preferred "hybrid" approach of SMT + decoupled.
In order:
(1) Thank you for posting this link! I looked for non-IEEE access and failed. Now I can read the paper!
(2) Fuck! I've read the paper. Note that OoO does fine, but the Rigel processor that they are "using" is only targeted at acceleration/throughput loads. In other words, OoO does well even for loads outside of its primary value area.
This paper, "Hybrid Latency Tolerance for Robust Energy-Efficiency on 1000-Core Data Parallel Processors" is about UIUC's Rigel processor. One can read about this research project here: http://www.webmail.gpucomputing.net/sites/default/files/papers/340/isca09-kelm.pdf.
Several key aspects to this project:
(A) It is over (as near as I can tell). Sanjay Patel of UIUC, for example, has moved on to other (commercial) things.
(B) I don't think they ever actually fabbed a physical chip.
(C) They were explicitly targeting "acceleration"-type workloads ... like those addressed today by GPGPU and Xeon Phi. They were *NOT* trying to build something that would do a good job running MS-Word. (A rough sketch of the distinction is at the bottom of this post.)
(D) Because of (C), this paper doesn't even try to reduce latency ... it discusses techniques for tolerating latency on loads that are not latency-sensitive to begin with.
(E) And they don't address loads that have a mix of throughput-friendly and throughput-unfriendly bits.
Sigh.
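To make (C) and (D) concrete, here is a rough sketch of the two kinds of loops being argued about. This is mine, not the paper's, and the names are just illustrative. The first loop is the "acceleration" style workload that Rigel, GPGPU, and Xeon Phi target: every iteration is independent, so memory latency can be hidden simply by keeping lots of work in flight. The second is the latency-bound style that an OoO desktop core earns its keep on: each load depends on the previous one, so more threads or more cores do nothing for a single traversal.

#include <stddef.h>

/* Throughput-friendly: every iteration is independent, so a wide,
   in-order, heavily multithreaded machine can keep hundreds of
   these iterations in flight and hide memory latency. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Latency-sensitive: a linked-list walk. The address of the next
   load isn't known until the current load completes, so the work
   forms one serial dependence chain and throughput tricks don't
   help a single traversal. */
struct node { struct node *next; int payload; };

int chase(const struct node *p)
{
    int sum = 0;
    while (p != NULL) {
        sum += p->payload;   /* serial chain through 'next' */
        p = p->next;
    }
    return sum;
}

Point (E) is that real programs are usually a mix of these two shapes, and the paper doesn't evaluate that mix.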