Interconnect Problems – Think Globally
While transistor switching performance has continued to improve by roughly a third each fabrication process generation, the wires that connect them throughout the chip, the metal interconnects have comparatively deteriorated in performance . Interconnect can also draw up to a third of the power utilization of a modern microprocessor. Indeed, the energy required to drive an operand across a chip’s wires can dwarf the energy needed to operate on it by the computation logic . There have been isolated one-time improvements to interconnects, such as slightly better insulating materials between interconnect layers  to reduce parasitic capacitance. The switch from aluminum to copper interconnects reduced resistance, which similarly increases the performance of interconnects. However, the future of wire performance is clear and getting worse with each process generation.
Latency of on-chip wires is generally a product of their resistance and capacitance, the RC delay, which is near a factor of the speed of light. A wire’s RC propagation delay is quadratic in proportion to its length (i.e. a wire that is twice is long might have an RC delay 4x larger, or more). As process feature sizes shrink, the capacitance of shrunken wires decreases marginally. However, the cross-section of the wire is cut in half, which doubles the resistance, effectively doubling the propagation delay. Functional unit blocks will also shrink, which reduces the length of the local intra-block interconnect (i.e. wires between different stages of a multiplier). This tends to mitigate latency increases, at least for the local wires. Yet constant wire latency in the presence of enhanced transistor switching speeds effectively increases interconnect latency, in relative terms. This comparative increase is tolerable for intra-block wires as they are very short, contributing little latency to the overall cycle time of the chip. It is the inter-block and especially upper level global on-chip interconnect where the majority of the increasing delay is encountered. Assuming a constant microprocessor die size between process generations, the latency of the global wires that have to travel the length of the chip could double with each shrink. This would triple the relative difference between global wire and transistor performance every process generation, reducing the chip area a global signal can travel per narrowing clock cycle, as illustrated in Figure 1.
Figure 1 – Range of a wire in a single clock cycle
A time honored solution to alleviate this problem is inserting buffers and flip-flops to partition a long wire into segments, boosting the signal. Since wire delay is a quadratic function of the length of a wire, segmenting a wire into two equal sub-segments halves the total wire latency, although the buffer itself introduces a small delay. However, buffers and flip-flops are not free, as they consume additional power. The number of buffers needed to ameliorate the interconnect-transistor disparity would grow exponentially over different process generations, making it unsuitable as a long term solution.
Figure 2 – 65nm generation diagonal interconnect test chip (Source: Applied Materials)
A proposed improvement  from geometry for intermediate and global wires is to run wires across the length of the chip diagonally (Figure 2) to directly connect logic blocks. Traditional designs use Manhattan routing, with alternating horizontal and vertical wires for each layer of metal in the chip. Diagonal routing shortens such wires up to thirty percent and will have the effect of cutting the interconnect latency along these wires in half. However volume manufacturing and yield concerns have limited the appeal of this one time design improvement. The Electronic Design Automation (EDA) tools needed to support it have not matured to a level where the major semiconductor companies are comfortable utilizing it.
Discuss (7 comments)