By: Paul A. Clayton (paaronclayton.delete@this.gmail.com), May 12, 2013 4:06 pm
Room: Moderated Discussions
Linus Torvalds (torvalds.delete@this.linux-foundation.org) on May 12, 2013 2:22 pm wrote:
> mpx (mpx.delete@this.nomail.pl) on May 12, 2013 12:04 pm wrote:
>>
>> It's about supporting cores with different number of threads changing dynamically
>> during runtime.
>
> Bullshit.
>
> That's just stupid. Any reasonable SMT will just re-purpose most of the resources
> for the single thread if there aren't enough threads to actually do SMT.
How expensive is it in Intel's SMT implementations to go from multithreaded mode to single-threaded mode? I seem to recall that at least some of Intel's implementations statically partition some resources (e.g., simply replicating the return address predictor entries might be a better tradeoff than dynamically managing allocation of entries at finer granularity). So even if an inactive context had all its renameable registers zeroed (and a special zero register was supported), a halted thread might still modestly reduce the performance of a single thread.
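To make the static-partitioning concern concrete, here is a toy model of a return address stack, not a description of any real Intel design. The 16-entry total, the two-context split, the circular overwrite behavior, and the function name `ras_mispredicts` are all assumptions chosen for illustration. It only shows that a thread confined to half of the entries can mispredict returns that the full structure would have predicted correctly.

```python
# Toy model: a circular return-address stack (RAS) with a fixed number of slots.
# All sizes and behaviors here are illustrative assumptions, not real hardware.

TOTAL_ENTRIES = 16   # hypothetical total RAS entries in the core
CONTEXTS = 2         # hypothetical number of SMT contexts


def ras_mispredicts(call_depth, entries):
    """Count mispredicted returns for one descent to call_depth and back."""
    stack = [None] * entries
    top = 0
    # Each nested call pushes its return address; on overflow the oldest
    # entries are overwritten because the stack wraps around.
    for addr in range(call_depth):
        stack[top % entries] = addr
        top += 1
    mispredicts = 0
    # Each return pops a prediction and compares it with the real address.
    for addr in reversed(range(call_depth)):
        top -= 1
        if stack[top % entries] != addr:
            mispredicts += 1
    return mispredicts


# Static partitioning: the running thread keeps only its share,
# even though the other context is halted.
static_share = TOTAL_ENTRIES // CONTEXTS
# Full repurposing: the running thread gets every entry.
full_share = TOTAL_ENTRIES

depth = 12
print("static partition:", ras_mispredicts(depth, static_share))
print("fully repurposed:", ras_mispredicts(depth, full_share))
```

In this model a call chain deeper than the thread's share loses its oldest return addresses, so the statically partitioned case pays a few mispredictions that the repurposed case avoids. Whether that cost matters in practice depends on the workload's call depth, which is the "modest" part of the claim above.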
> If Sparc T4 doesn't do that, and instead has a static partitioning based on some system-visible mode, then
> that is not at all an indication that anybody sane should care. It is an indication of one thing, and one
> thing only: sparc hardware designers aren't very good, and can't afford the effort to do a better job.
It is not clear to me whether the single-thread mode in SPARC T4 is a true single-threaded mode (which, as noted, even the Intel implementations of SMT would benefit somewhat from explicitly "activating") or some limited form of thread priority in which all the threads remain resident but one thread is given unlimited priority. The latter (presumably like POWER5's priority mechanism) would also require software support to manage modes of operation.
In POWER7, switching from SMT4 to SMT2 or single-threaded operation would also require some adjustment time, since SMT4 exploits cluster replication of the register files to support a doubling of the number of contexts. Until the other cluster's register file has been repopulated with live values, execution might be limited to one cluster. This does not make POWER7's implementation of SMT incompetent!