Diminishing Returns: Brainiacs Need Not Apply
At the highest level of abstraction, the basic design choice for a microprocessor architect is the balance between maximizing clock rate and maximizing the average number of instructions executed per clock cycle (IPC). The target design space can be visualized as a spectrum. At one end is the ‘speedracer’ approach, which pursues a very high clock rate even at the expense of IPC. At the other end is the ‘brainiac’ approach, which tries to maximize IPC even at the expense of clock frequency.
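The trade-off can be made concrete with the usual first-order relation that instruction throughput is clock rate multiplied by IPC. The sketch below is illustrative only: the `Design` class and the two sets of numbers are hypothetical and are not taken from any real processor or from the figures in this article.

```python
from dataclasses import dataclass


@dataclass
class Design:
    """A hypothetical CPU design point (all values are made up for illustration)."""
    name: str
    clock_ghz: float  # assumed clock frequency in GHz (cycles per nanosecond)
    ipc: float        # assumed average instructions executed per clock cycle

    def instructions_per_ns(self) -> float:
        # cycles/ns * instructions/cycle = instructions/ns
        return self.clock_ghz * self.ipc


# Two invented design points at opposite ends of the spectrum.
speedracer = Design("speedracer", clock_ghz=2.0, ipc=1.0)
brainiac = Design("brainiac", clock_ghz=1.0, ipc=2.0)

for d in (speedracer, brainiac):
    print(f"{d.name}: {d.instructions_per_ns():.1f} instructions/ns")
```

With these invented numbers both designs deliver the same throughput, which is the point of the spectrum: the same performance target can in principle be approached from either end.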
In the past, neither approach had a clear-cut advantage, but brainiac processor designs have tended to fall short of their performance targets more often than speedracers. This is partly because the immense combinational logic complexity involved in processing multiple instructions in parallel often reduces maximum clock frequency more than the designers expect, and partly because compilers are unable to extract enough usable instruction level parallelism (ILP) from programs to keep all the functional units active. In addition, throwing extra logic gates at parallel instruction execution yields strongly diminishing returns. This effect is shown in Figure 1.
Figure 1 Frequency, IPC, and Complexity Trade-offs
Clock frequency is highest at the extreme speedracer end of the spectrum and decreases, gently at first and then much faster, towards the extreme brainiac end. The rapidly accelerating frequency fall-off in extreme brainiacs has two causes: the geometric increase in the complexity of the control and data routing circuitry needed to support ever wider instruction issue, and the greater distances that signals must travel across a die swollen with an increasing number of functional units. The complexity effect is also seen in the rapid rise in transistor count at the brainiac extreme. Transistor count also rises for extreme speedracers, but not quite as dramatically. In that case the rise is due to the increased number of pipeline stages, more complex bypassing, and increased use of signal repeaters on long wires. IPC is highest at the brainiac end of the spectrum, but the rate of increase is gentle at best and quickly saturates.
The idealized effect of CPU design style on power consumption of deep sub-micron CMOS process implementations is shown in Figure 2.
Figure 2 Power Trade-offs
Leakage power is almost a direct function of transistor count and is usually a minor component of overall power. Switching power is a function of clock frequency and the number of transistors being switched each cycle. Total power is the sum of switching and leakage power, and it grows when moving from a balanced CPU design towards either an extreme speedracer or an extreme brainiac approach. Performance and computational energy efficiency (performance divided by power) as a function of design approach are shown in Figure 3.
Figure 3 Performance and Energy Efficiency Trade-offs
The performance curve is generally flat over most of the spectrum and only falls off significantly at the extreme brainiac end. When you look at computational energy efficiency, the penalty for moving away from a balanced design is even more pronounced.
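The relationships described above (switching power depending on clock frequency and transistors switched per cycle, leakage power tracking transistor count, and efficiency as performance divided by power) can be written down as a minimal first-order sketch. The linear form of each term, the function names, and the per-switch energy and per-transistor leakage constants are all assumptions made for illustration; they are not values or formulas given in this article.

```python
def switching_power_w(clock_hz, switched_per_cycle, joules_per_switch):
    # Assumed linear model: switches per second times energy per switch.
    return clock_hz * switched_per_cycle * joules_per_switch


def leakage_power_w(transistor_count, watts_per_transistor):
    # Assumed linear model: leakage is proportional to transistor count.
    return transistor_count * watts_per_transistor


def total_power_w(clock_hz, switched_per_cycle, transistor_count,
                  joules_per_switch, watts_per_transistor):
    # Total power is the sum of switching and leakage power.
    return (switching_power_w(clock_hz, switched_per_cycle, joules_per_switch)
            + leakage_power_w(transistor_count, watts_per_transistor))


def energy_efficiency(clock_hz, ipc, power_w):
    # Performance (instructions per second) divided by power (watts),
    # which is equivalent to instructions per joule.
    return (clock_hz * ipc) / power_w


# Hypothetical design point; every number here is invented.
power = total_power_w(clock_hz=1e9, switched_per_cycle=1e6,
                      transistor_count=1e7,
                      joules_per_switch=1e-15, watts_per_transistor=1e-8)
print(f"total power: {power:.2f} W")
print(f"efficiency: {energy_efficiency(1e9, 1.0, power):.3e} instructions/J")
```

In a model of this shape, a design that buys a small gain in IPC or clock rate with a large increase in transistor count or switching activity sees its efficiency drop even while raw performance stays roughly flat, which matches the qualitative behavior described for Figure 3.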