Simultaneous multithreading technology appears to be a natural complement to the modern out-of-order superscalar RISC processor. The difficult task of tracking results for instructions from separate threads that issue and execute simultaneously fits well with the register renaming schemes already used to work around false register-based data dependencies between instructions and to support recovery from speculative execution. The problem of selecting instructions from a group of active hardware threads for SMT issue and execution has a relatively simple heuristic solution that provides robust performance over a wide range of workloads with varying degrees of ILP and TLP.
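The selection heuristic is not restated here, so the following is only a minimal sketch under an assumption: it uses a fewest-in-flight policy in the spirit of the ICOUNT scheme from the Tullsen et al. paper listed in the references, where threads with the fewest instructions waiting ahead of execution get fetch priority. The function name, the `max_threads` parameter, and the sample counts are hypothetical and are not taken from any EV8 documentation.

```python
def select_fetch_threads(in_flight_counts, max_threads=2):
    """Pick the threads to fetch from in the current cycle.

    in_flight_counts maps a hardware thread id to the number of its
    instructions still waiting in the front end (decode, rename, and
    issue queues). Both the mapping and max_threads are assumptions
    made for this illustration.
    """
    # Rank thread ids by in-flight count; ties go to the lower thread id.
    ranked = sorted(in_flight_counts,
                    key=lambda tid: (in_flight_counts[tid], tid))
    return ranked[:max_threads]


# Hypothetical snapshot of four hardware threads.
counts = {0: 12, 1: 3, 2: 7, 3: 3}
print(select_fetch_threads(counts))  # [1, 3]
```

A policy of this kind favors threads that are moving quickly through the machine and starves none permanently, since a thread's count falls as its queued instructions drain.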
Research to date suggests SMT can approximately double the throughput of an 8-issue-wide processor like EV8, at a cost in added complexity equivalent to an increase of less than 10% in the die area of the processor core. Software can access the multithreading capabilities of an SMT processor through a virtual CMP model that uses abstracted TPUs in place of multiple physical CPUs. Existing thread synchronization mechanisms can be retained with little impact on SMT processor performance, provided appropriate measures are taken to ensure that threads waiting on a semaphore do not consume a share of execution resources.
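As a rough illustration of what those two figures imply together, take throughput and core area relative to a single-threaded baseline. The symbols below are introduced only for this sketch, and the result applies to the processor core alone, not the whole die:

```latex
\frac{T_{\mathrm{SMT}} / A_{\mathrm{SMT}}}{T_{\mathrm{base}} / A_{\mathrm{base}}}
  \approx \frac{2}{1.10} \approx 1.8
```

In other words, if throughput doubles while core area grows by 10%, throughput per unit of core area improves by about 1.8 times, and by slightly more if the area increase is below 10%.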
In the third and final part of this article, I will examine how the performance characteristics of SMT may affect EV8’s competitive position relative to alternative design approaches such as EPIC and CMP, and what this implies for the future of MPU design.
 Tullsen, D. et al, ‘Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor’, Proceedings of the 23rd Annual International Symposium on Computer Architecture, May 1996.