By: Paul A. Clayton (paaronclayton.delete@this.gmail.com), May 20, 2021 6:36 am
Room: Moderated Discussions
Romain Dolbeau (romain.delete@this.dolbeau.org) on May 19, 2021 4:05 am wrote:
> Little Horn (sink.delete@this.example.net) on May 17, 2021 5:03 pm wrote:
> > Thoughts?
>
> Long before I reached the end of the paper (my bad, I know), the Cray MTA (formerly
> Tera) architecture came back to my mind... Massive multithreading didn't work
> then, didn't immediately see a reason why it would work today...
Cray MTA avoided caches, which in turn assumes word-granular memory interfaces; that implies greater command overhead and narrow memory channels (to support dense high-bandwidth DRAM using long bursts). I have not read the paper attentively, but it does assume caches and seems to assume a thread switch latency of up to tens of cycles (in contrast with the MTA, where any thread was immediately executable).
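To make the switch-latency point concrete, here is a rough back-of-the-envelope sketch of a simple analytic model of multithreaded latency hiding. It is not taken from the paper or from any MTA documentation: the function name, the parameters, and all of the numbers are my own illustrative assumptions, and it ignores caches, bandwidth limits, and variable latencies.

```python
def utilization(n_threads, run_length, latency, switch_cost):
    """Estimate pipeline utilization under a simple multithreading model.

    All parameters are hypothetical, in cycles:
      run_length  - useful cycles a thread executes before a long-latency event
      latency     - cycles that event takes to complete
      switch_cost - cycles lost each time the hardware switches threads
    """
    # Number of threads at which the latency is fully covered.
    saturation_threads = 1 + latency / (run_length + switch_cost)
    if n_threads >= saturation_threads:
        # Saturated: latency is hidden, only the switch cost is lost.
        return run_length / (run_length + switch_cost)
    # Below saturation: utilization grows linearly with thread count.
    return n_threads * run_length / (run_length + switch_cost + latency)


if __name__ == "__main__":
    # Zero-cost switching every cycle (MTA-like assumption), 100-cycle latency.
    print(utilization(128, 1, 100, 0))
    # A 20-cycle switch cost caps utilization no matter how many threads exist.
    print(utilization(128, 20, 100, 20))
```

Under these assumptions a nonzero switch cost puts a ceiling of run_length / (run_length + switch_cost) on utilization, so a design with tens of cycles of switch latency needs correspondingly long run lengths (which is where caches come in) to be worthwhile.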
(There was also no mention of the MIPS MT Application-Specific Extension, which did slightly distinguish between a thread context and a virtual processing element.)
I had composed a partial response to the original post, but now I think I will try to actually read the paper (and the responses here) and compose a more considered response. From what I have read, the authors seem to lack familiarity with hardware designs (particularly in the GPU description and the absence of any mention of 3D register files). I like the general idea of hardware having a larger role in thread scheduling, but it seemed (from a cursory reading) that the specific proposal was not well thought out.