By: dmcq (dmcq.delete@this.fano.co.uk), May 20, 2021 10:09 am
Room: Moderated Discussions
Paul A. Clayton (paaronclayton.delete@this.gmail.com) on May 20, 2021 6:36 am wrote:
> Romain Dolbeau (romain.delete@this.dolbeau.org) on May 19, 2021 4:05 am wrote:
> > Little Horn (sink.delete@this.example.net) on May 17, 2021 5:03 pm wrote:
> > > Thoughts?
> >
> > Long before I reached the end of the paper (my bad, I know), the Cray MTA (formerly
> > Tera) architecture came back to my mind... Massive multithreading didn't work
> > then, didn't immediately see a reason why it would work today...
>
> Cray MTA avoided caches (which also assumes word-granular memory interfaces, implying a greater command
> overhead and narrow memory channels (to support dense high-bandwidth DRAM using long bursts)). I
> have not attentively read the paper, but it does assume caches and seems to assume a thread switch
> latency of up to tens of cycles (compared to MTA's any thread immediately executable).
>
> (There was also no mention of MIPS MT Application Specific Extension, which did slightly
> distinguish between a thread context and a virtual processing element.)
>
> I had composed a partial response to the original post, but now I think I will try to actually read
> the paper (and the responses here) and compose a more considered response. From what I have read, the
> authors seem to lack familiarity with hardware designs (particularly the GPU description and no mention
> of 3D register files). I like the general idea of hardware having a larger role in thread scheduling,
> but it seemed (from cursory reading) that the specific proposal was not well-thought-out.
I don't know about the Cray design, but from what's described it seems to have been based on the same principle as a GPU: slower but much wider, allowing lots of data to get to where it is needed. That makes it a good choice for large computational problems.
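The throughput-over-latency trade-off and the thread-switch-latency point quoted above can be illustrated with a toy model. This is a sketch under stated assumptions, not the MTA's or the paper's actual mechanism: a single-issue core, a fixed memory latency, round-robin thread selection, and no caches. The function name `utilization` and all parameter values are hypothetical. With zero-cost switching (roughly the "any thread immediately executable" case), enough threads hide the latency completely. A switch penalty of tens of cycles only pays off if each thread runs a long stretch between stalls.

```python
def utilization(threads, run_length, mem_latency, switch_cost, cycles=100_000):
    """Toy single-issue multithreaded core (hypothetical model).

    Each thread issues `run_length` instructions, the last of which is a memory
    access that stalls the thread for `mem_latency` cycles. The core issues at
    most one instruction per cycle and spends `switch_cost` idle cycles whenever
    it changes to a different thread. Returns the fraction of cycles that issued.
    """
    ready_at = [0] * threads          # cycle at which each thread can issue again
    left = [run_length] * threads     # instructions left before the next memory stall
    current = 0
    busy = 0
    t = 0
    while t < cycles:
        # Round-robin search for a ready thread, starting with the current one
        # so that a still-ready thread keeps running without paying a switch.
        pick = None
        for i in range(threads):
            cand = (current + i) % threads
            if ready_at[cand] <= t:
                pick = cand
                break
        if pick is None:
            t += 1                    # every thread is waiting on memory: idle cycle
            continue
        if pick != current:
            t += switch_cost          # switch penalty (0 models MTA-style issue)
            current = pick
        busy += 1                     # issue one instruction
        t += 1
        left[current] -= 1
        if left[current] == 0:        # end of run: memory access stalls this thread
            left[current] = run_length
            ready_at[current] = t + mem_latency
    return busy / t


# Fine-grained, zero-cost switching: 101 threads fully hide a 100-cycle latency.
print(utilization(threads=101, run_length=1, mem_latency=100, switch_cost=0))   # 1.0
# Same workload with a 20-cycle switch penalty: roughly 1/21, about 0.048.
print(utilization(threads=101, run_length=1, mem_latency=100, switch_cost=20))
# Long runs between stalls amortize the penalty: roughly 100/120, about 0.83.
print(utilization(threads=3, run_length=100, mem_latency=200, switch_cost=20))
```

In this model, a core that has to pay for every switch needs long runs between stalls (for example, hits in a cache) to stay busy. With free switching, it simply needs enough threads in flight to cover the memory latency.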