Conclusion and Summary
The quarter-century-old practice of designing bigger, better, and wider CPUs to attain MPU performance beyond process-driven clock rate increases is about to be radically curtailed by the twin barriers of power budget limitations and the scaling mismatch between interconnect and transistors. The latter effect is so severe that beyond 90nm the physical size of individual CPUs will start falling dramatically, even compared to traditional die sizes for high-volume desktop PC oriented MPUs. The die area freed by smaller CPUs will be used to integrate more functions and capabilities and an increasing number of CPU cores per device. The trend will be especially noticeable in server chips but will also spread to MPUs for desktop computers and eventually even to mobile and embedded control applications. Research into server chips with a high degree of CMP has in the past been driven largely by a desire to trade off single thread performance per CPU for higher total device-level throughput. In the future such an approach may be driven by necessity, from the increasing cost of cross-die communication, rather than by architectural choice. The good news is that the reduction in single thread performance compared to what could otherwise be accomplished will disappear over time.
The need to shrink overall CPU size faster than natural transistor density growth alone allows means that the complexity of individual CPU cores will need to fall. In the absence of significant advances in implementation efficiency, this will accelerate the downward pressure on architectural (“IPC”) performance that already exists from the growing mismatch between CPU clock rate growth and main memory access time. Nevertheless, the single thread performance of each CPU will continue to rise, driven by clock rate scaling with reduced feature size at rates not that dissimilar from historical levels. The trend towards reduced CPU complexity will also help temper the growth of leakage current power consumption. The need to reduce CPU complexity over time, plus the ability to integrate an increasing number of cores on a device to exploit TLP, probably means that usage of CPU multi-threading techniques like SMT will diminish over time rather than grow. Although multi-threading techniques tend to contribute only slightly to CPU physical complexity, they tend to consume a disproportionate amount of processor design and verification time. With rapidly shrinking CPU size it will become far easier to exploit TLP through an increasing degree of CMP. The long-term trend towards CPU simplification may eventually bring renewed interest in non-x86 ISAs that can provide high levels of ILP exploitation with relatively low logic transistor count, for computing markets now dominated by x86 processors.