Smaller, Simpler, Faster
Having formulated a simple model for the optimal size and complexity of a CPU that maximizes single thread performance, it is interesting to ask what happens when such a design is shrunk. As previously mentioned, optimal performance is achieved when CPU size and complexity are such that the global wire delay portion of the critical path equals the sum of the latch overhead and transistor delay portions of the critical path:
K1 · Comp = T_CLK + T_TRAN
Consider the effect of a process shrink by a linear factor of S (S = 0.7 for a typical single generation process shrink) while CPU complexity is kept constant. The latch overhead and transistor portions of critical path delay will both fall by approximately a factor of S, while the global wire delay constant of proportionality (K1) stays more or less the same. This implies that the complexity of the CPU must also be reduced by a factor of S to maintain the equality shown above that defines the optimal performance processor. The implications of this effect are shown in Table 1.
| | Arbitrary | Typical Shrink |
|---|---|---|
| Feature Size Shrink Factor | S | 0.7 |
| Performance (IPC × Clock Frequency) | S^-1/2 | 1.20 |
| Dynamic Power (voltage scaled by S) | S^2 | 0.49 |
| Dyn. Power Density (voltage scaled by S) | S^-1 | 1.43 |
Table 1 – Effect of Process Shrink on Optimal Performance CPU Design
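The entries in Table 1 follow directly from the scaling relations in the text. A minimal sketch, assuming clock frequency rises as 1/S, that complexity falls by S at the optimum, and a Pollack's-rule-like IPC ∝ sqrt(complexity) (an assumption consistent with the S^-1/2 performance entry, not stated explicitly here):

```python
# Recompute the "Typical Shrink" column of Table 1 for S = 0.7.
S = 0.7  # linear shrink factor for one process generation

performance = S ** -0.5   # IPC (~S^1/2) times clock frequency (~S^-1)
dynamic_power = S ** 2    # Comp * f * V^2 with V scaled by S
power_density = S ** -1   # dynamic power divided by die area

print(f"Performance:   {performance:.2f}x")    # ~1.20
print(f"Dynamic power: {dynamic_power:.2f}x")  # ~0.49
print(f"Power density: {power_density:.2f}x")  # ~1.43
```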
The optimal performance CPU must shed about 30% of its transistor count for a typical 30% linear process shrink, reducing its size enough to make up for the higher RC delay constant of the interconnect. But because transistor density doubles as a result of the process shrink, the overall effect translates into an optimal performance CPU design occupying barely more than 1/3 of the die area it had in the previous generation process. Hence was derived the title of this article!
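The "1/3 of the die area" figure can be checked with a line of arithmetic: complexity falls by S while area per transistor falls by S^2, so die area scales as S^3.

```python
# Die area of the optimal performance CPU after one shrink, relative to
# the previous generation. Complexity scales by S; density doubles
# because area per transistor scales by S^2.
S = 0.7
complexity_factor = S            # ~30% fewer transistors
area_per_transistor = S ** 2     # linear shrink in both dimensions

die_area = complexity_factor * area_per_transistor  # S^3
print(f"Relative die area: {die_area:.3f}")  # ~0.343, barely more than 1/3
```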
Notice that as a result of the shrink, the optimal performance CPU's clock frequency rises faster than its IPC falls. Thus there is still a net single thread performance benefit from process shrinks, even in a heavily interconnect dominated MPU design regime. Dynamic power scales approximately as the product of CPU complexity, clock frequency, and the square of supply voltage. In the optimal performance design the complexity reduction cancels out the clock frequency increase, so power goes as the square of the supply voltage. Supply voltage is ideally scaled proportionally with process linear dimension (i.e. constant field scaling), so dynamic power will fall by about half for a typical process shrink. The power density of the optimal performance CPU design increases as the inverse of the process scaling factor, or by about 1.43x for a typical process shrink.
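The cancellation argument above can be made explicit. A sketch of the dynamic power product Comp × f × V², with each factor scaled as the text describes:

```python
# Dynamic power ~ Comp * f * V^2 after one optimal-performance shrink.
S = 0.7
comp = S        # complexity reduced by S to stay at the optimum
freq = 1 / S    # clock frequency rises as critical path delays fall by S
volt = S        # constant field scaling: supply voltage proportional to S

power = comp * freq * volt ** 2  # comp and freq cancel, leaving V^2 = S^2
print(f"Relative dynamic power: {power:.2f}")  # ~0.49, about half
```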
Unfortunately, the change in the leakage component of CPU power consumption due to a process shrink involves too many complex and technology-specific factors for general analysis. One should keep in mind that leakage is not a single effect but rather the sum of more than a half dozen different leakage mechanisms, even in a basic bulk CMOS process. The various components respond differently to changes in supply voltage and to physical scaling of transistors and their associated source and drain active areas, and they are very sensitive to changes in materials and manufacturing processes. Leakage is also strongly dependent on circuit implementation details, which tend to change over time. About the most that can be said is that reductions in CPU complexity, die area, and supply voltage tend to reduce leakage current and power by varying degrees, while the shrinking of device and interconnect critical dimensions tends to increase leakage strongly.