Built For Speed
Deep pipelining adds considerable complexity to a processor in the form of extra register stages, a more extensive and heavily loaded clock distribution network, longer absolute instruction execution latency, and higher power consumption. In addition, much effort must be expended in refining every element of the microarchitecture to minimize the IPC loss caused by a larger branch misprediction penalty and by inter-instruction dependencies.
The payoff for all this effort is a higher maximum clock rate in any given technology. According to IBM architect Charles Moore, the POWER4 uses a maximum of 10 to 12 levels of logic per pipe stage and achieves a global clock distribution uncertainty of 22 ps. This compares to 16 levels in the Alpha EV4x, 14 levels in the Alpha EV5x, and 12 in the Alpha EV6x. Considering how much more deeply pipelined the POWER4 is, this seems to imply one of three things: that the PowerPC ISA is significantly more complicated, less streamlined, and less amenable to logical implementation than the Alpha ISA; that there is a complexity penalty to the instruction group tracking technique; or that the EV6 figure is a stark testament to the talents of the EV6 design team. In all likelihood, the similar number of logic levels in both processors is due to a combination of all three factors.
According to IBM, the POWER4 core can run in excess of 1 GHz, a rather low bar to set for such an aggressive design built in their ultra-high-performance 8S2SOI semiconductor process. So how fast could the POWER4 conceivably be clocked? That is a very complicated question because many different parts of a design can limit the clock rate. But a general ballpark figure might be estimated from what is known for a similarly aggressive superscalar RISC MPU. Apparently the Alpha EV68 core will run at a minimum of 1.25 GHz in a 0.18 µm bulk CMOS process with copper interconnect, although this figure is likely conservative. Moving a design from a bulk CMOS to an SOI CMOS process can increase the clock rate by 20 to 25%, because reduced junction capacitance shortens logic delays. Applying that gain to the EV68's 1.25 GHz gives roughly 1.5 to 1.56 GHz, which implies the POWER4 could achieve about 1.5 GHz.
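The arithmetic behind that ballpark can be sketched in a few lines of Python. This is only an illustration of the estimate in the text: the function name and constants are hypothetical, and it assumes the 20 to 25% SOI gain applies uniformly to the EV68's 1.25 GHz figure, which a real design would not guarantee.

```python
# Rough sketch of the bulk-to-SOI clock estimate described above.
# Assumption: the SOI speedup scales the whole clock rate uniformly.

def soi_scaled_clock_ghz(bulk_clock_ghz, soi_gain):
    """Scale a bulk CMOS clock rate by a fractional SOI gain (0.20 = 20%)."""
    return bulk_clock_ghz * (1.0 + soi_gain)

EV68_BULK_GHZ = 1.25   # EV68 clock rate in 0.18 um bulk CMOS, from the text
SOI_GAIN_LOW = 0.20    # lower end of the quoted SOI improvement
SOI_GAIN_HIGH = 0.25   # upper end of the quoted SOI improvement

low_estimate = soi_scaled_clock_ghz(EV68_BULK_GHZ, SOI_GAIN_LOW)
high_estimate = soi_scaled_clock_ghz(EV68_BULK_GHZ, SOI_GAIN_HIGH)

print(f"Estimated range: {low_estimate:.2f} to {high_estimate:.2f} GHz")
```

The low end of the range lands at the 1.5 GHz figure quoted above; the high end is a little under 1.6 GHz.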
Other factors suggest it might even be faster. For example, when setting their timing budget, CPU designers tend to set aside a similar fraction of the target clock period for clock distribution skew. If the skew target is set too high, it artificially limits the clock rate and makes it harder to avoid race conditions, while if it is set too low, the chip may experience extensive delays in final physical design and miss the scheduled tape-out date (Merced?). The EV6 targeted a clock rate of 600 MHz and its designers limited clock skew to 72 ps, or about 4.3% of the 1667 ps clock period. That skew is more than three times larger than the 22 ps achieved in the POWER4. If 22 ps represents the same fraction of the clock period, the period would be about 509 ps, or just under 2 GHz. Thus the quality of clock distribution in the POWER4 seems consistent with a clock rate target in excess of 1.8 GHz.
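The skew-based estimate can be worked through the same way. The sketch below assumes, as the paragraph above does, that the POWER4 designers budgeted the same fraction of the clock period for skew as the EV6 team did; the function names are hypothetical and the result is a plausibility check, not a prediction.

```python
# Sketch of the skew-budget argument: hold skew / period constant and
# solve for the clock rate implied by a smaller skew figure.

def skew_fraction(skew_ps, clock_mhz):
    """Fraction of the clock period consumed by clock skew."""
    period_ps = 1e6 / clock_mhz   # 1 MHz corresponds to a 1,000,000 ps period
    return skew_ps / period_ps

def implied_clock_mhz(skew_ps, fraction):
    """Clock rate at which skew_ps equals the given fraction of the period."""
    period_ps = skew_ps / fraction
    return 1e6 / period_ps

EV6_TARGET_MHZ = 600.0   # EV6 clock target, from the text
EV6_SKEW_PS = 72.0       # EV6 clock skew limit, from the text
POWER4_SKEW_PS = 22.0    # POWER4 clock distribution uncertainty, from the text

ev6_fraction = skew_fraction(EV6_SKEW_PS, EV6_TARGET_MHZ)
power4_mhz = implied_clock_mhz(POWER4_SKEW_PS, ev6_fraction)

print(f"EV6 skew budget: {ev6_fraction:.1%} of the clock period")
print(f"Implied POWER4 clock target: {power4_mhz:.0f} MHz")
```

Under that assumption the implied target comes out a little under 2 GHz, which is in line with the "in excess of 1.8 GHz" conclusion.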