This is the final installment of a three-part article about the futuristic Alpha EV8, the most powerful and ambitious microprocessor yet proposed. It examines the potential impact of thread level parallelism (TLP) exploitation on the impending competition between superscalar and EPIC processors.
What Can SMT Do for the EV8?
The Compaq Alpha 21464, or EV8, is a superscalar RISC processor currently under development that can nominally issue and execute up to eight instructions per clock cycle. It will likely be able to overlap the out-of-order execution of well over a hundred (possibly two hundred) instructions at any given instant. Despite the incredibly complex technology behind this feat, the EV8 will likely average only about 2 to 3 instructions per clock cycle on most programs, or about one third of its peak potential. This IPC (instructions per clock) shortfall isn’t unique to wide-issue, high-end RISC processors either. A three-issue-wide CISC processor like the Intel Pentium Pro (whose direct architectural descendants include the Pentium II and III) sustains only about 30% of its maximum native instruction execution rate.
The shortfall in sustained (or average) IPC from the maximum design value occurs for several reasons. IPC can be lost due to the lack of ILP (instruction level parallelism) in the program instruction stream or the inability to exploit the ILP that is present due to limitations in the processor. For example, a sequence of dependent load operations (pointer chasing) cannot be executed in parallel because of the address/data dependency between successive loads. Another important factor in IPC shortfall is imperfect branch prediction. Not only do mispredicted branches introduce multi-cycle bubbles (periods of idleness) into the pipeline, but also the results of instructions speculatively issued and executed following a mispredicted branch must be discarded. This is why a distinction between issue IPC and commit IPC is sometimes drawn.
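The effect of a dependency chain on IPC can be illustrated with a toy model. The sketch below is not a description of the EV8's scheduler; it assumes every instruction takes exactly one cycle, branch prediction is perfect, and the only limits are data dependencies and issue width. The function name `ideal_ipc` and the dependency-list encoding are hypothetical, chosen only for this illustration.

```python
def ideal_ipc(deps, issue_width):
    """Best-case IPC for a straight-line instruction sequence.

    deps[i] lists the indices of earlier instructions whose results
    instruction i needs. Assumption: one-cycle latency for everything.
    """
    exec_cycle = []   # cycle in which each instruction executes
    slots_used = {}   # cycle -> issue slots already taken in that cycle
    for producers in deps:
        # Earliest cycle in which all producers have finished.
        cycle = max((exec_cycle[p] + 1 for p in producers), default=0)
        # Slide later if that cycle's issue slots are already full.
        while slots_used.get(cycle, 0) >= issue_width:
            cycle += 1
        slots_used[cycle] = slots_used.get(cycle, 0) + 1
        exec_cycle.append(cycle)
    total_cycles = max(exec_cycle) + 1
    return len(deps) / total_cycles


# Eight loads where each address comes from the previous load (pointer chasing).
chain = [[]] + [[i - 1] for i in range(1, 8)]
# Eight loads with no dependencies between them.
independent = [[] for _ in range(8)]

print(ideal_ipc(chain, issue_width=8))        # 1.0
print(ideal_ipc(independent, issue_width=8))  # 8.0
```

Even with eight issue slots available, the pointer-chasing sequence cannot exceed one instruction per cycle in this model, while the independent loads fill the machine.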
Issue IPC includes instructions that are issued, partially executed, and then squashed due to branch misprediction or other reasons, as well as those that execute to completion. Commit IPC includes only instructions that execute to completion and update the processor state (and thereby advance program execution). In general, IPC refers to commit IPC because that is the figure that reflects performance.
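The distinction between issue IPC and commit IPC is simple to express in terms of event counts. The following is a minimal sketch; the counter values are made up for illustration and are not measurements of any real processor, and `ipc_breakdown` is a hypothetical helper name.

```python
def ipc_breakdown(cycles, issued, committed):
    """Return (issue IPC, commit IPC, fraction of issued work squashed)."""
    issue_ipc = issued / cycles
    commit_ipc = committed / cycles
    squashed_fraction = (issued - committed) / issued
    return issue_ipc, commit_ipc, squashed_fraction


# Hypothetical counts: 3200 instructions issued over 1000 cycles,
# of which 2500 eventually committed.
issue_ipc, commit_ipc, squashed = ipc_breakdown(
    cycles=1000, issued=3200, committed=2500
)
print(f"issue IPC  = {issue_ipc:.2f}")      # issue IPC  = 3.20
print(f"commit IPC = {commit_ipc:.2f}")     # commit IPC = 2.50
print(f"squashed   = {squashed:.1%}")       # squashed   = 21.9%
```

In this invented example the processor looks busier than it really is: over a fifth of the issued instructions did no useful work.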
If the IPC of our hypothetical EV8 on a specific program is 2.5, then on average 5.5 of the 8 opportunities to execute an Alpha instruction are lost every clock cycle. That is equivalent to at least 10 billion potential instructions lost every second. Simultaneous multithreading (SMT) provides a relatively straightforward and inexpensive means to recover a portion of these lost instructions and help close the huge gap between potential and actual instruction throughput. SMT achieves this by giving the EV8 the ability to fetch instructions from one, two or three extra threads (points of execution within a program) and issue them in instruction slots that single-thread execution would otherwise leave empty. Research suggests that four-way SMT can increase the IPC of an eight-issue-wide processor like the EV8 from about 2.5 to the range of 4 to 5.5, or roughly double the single-threaded throughput.
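The arithmetic behind these figures can be worked through in a few lines. This is only a back-of-the-envelope sketch: the 2 GHz clock rate is an assumption made for illustration (the text above does not state a clock frequency), and the function names are hypothetical.

```python
ISSUE_WIDTH = 8            # peak instructions per clock, from the text
ASSUMED_CLOCK_HZ = 2.0e9   # ASSUMPTION: illustrative clock rate, not a quoted spec


def lost_slots_per_cycle(ipc, width=ISSUE_WIDTH):
    """Issue slots left unused each cycle at a given sustained IPC."""
    return width - ipc


def lost_per_second(ipc, clock_hz, width=ISSUE_WIDTH):
    """Unused issue slots per second at a given clock rate."""
    return lost_slots_per_cycle(ipc, width) * clock_hz


def smt_gain(single_ipc, smt_ipc, width=ISSUE_WIDTH):
    """Return (throughput ratio over single thread, fraction of peak used)."""
    return smt_ipc / single_ipc, smt_ipc / width


single_ipc = 2.5
print(lost_slots_per_cycle(single_ipc))   # 5.5
print(f"{lost_per_second(single_ipc, ASSUMED_CLOCK_HZ):.2e} lost slots per second")

for smt_ipc in (4.0, 5.5):
    ratio, utilization = smt_gain(single_ipc, smt_ipc)
    print(f"SMT IPC {smt_ipc}: {ratio:.1f}x throughput, {utilization:.0%} of peak")
```

Under the assumed clock rate, the single-threaded case wastes about 11 billion issue slots per second, and the quoted SMT range of 4 to 5.5 IPC corresponds to roughly 1.6 to 2.2 times the single-threaded throughput, still well short of the eight-wide peak.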