POWER5 – Major Improvements From Top to Bottom
Prior to its release, the POWER5 feature IBM promoted most heavily was the addition of simultaneous multithreading (SMT) to a POWER4-style microarchitecture. SMT exploits the idle issue slots and stalls that naturally occur when a program executes on a fast, wide processor, letting the CPU opportunistically interleave instructions from two threads. Although this does not speed up an individual program, it allows a single CPU to complete more total work from multiple threads in the same amount of time. In addition to SMT, the POWER5 greatly improved system-level functional partitioning compared to the POWER4, as shown below in Figure 1.
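To make the interleaving idea concrete, here is a minimal sketch built on a deliberately simplified model, not IBM's design: a hypothetical single-issue core where each instruction may stall its own thread for a fixed number of cycles, and the core picks the next ready thread in round-robin order every cycle. The function names, stall values, and round-robin policy are illustrative assumptions; the real POWER5 core is a wide superscalar design with its own thread-priority and resource-balancing logic.

```python
from typing import List


def run_single(stalls: List[int]) -> int:
    """Cycles for one thread running alone on a toy single-issue core.

    stalls[i] is how many extra cycles instruction i blocks its own thread
    (e.g. waiting on a cache miss) after it issues.
    """
    return sum(1 + s for s in stalls)


def run_smt(threads: List[List[int]]) -> List[int]:
    """Per-thread finish cycle when threads share the toy core.

    Each cycle the core issues at most one instruction, taken from the first
    ready thread in round-robin order, so one thread's stall cycles can be
    filled with another thread's work.
    """
    n = len(threads)
    pcs = [0] * n        # index of each thread's next instruction
    ready_at = [0] * n   # earliest cycle each thread may issue again
    start = 0            # round-robin starting point
    cycle = 0
    while any(pcs[i] < len(threads[i]) for i in range(n)):
        for k in range(n):
            i = (start + k) % n
            if pcs[i] < len(threads[i]) and ready_at[i] <= cycle:
                ready_at[i] = cycle + 1 + threads[i][pcs[i]]
                pcs[i] += 1
                start = (i + 1) % n  # give the next thread first pick
                break
        cycle += 1
    return ready_at


thread = [0, 3, 0, 3]  # hypothetical mix of short ops and 3-cycle stalls
alone = run_single(thread)
shared = run_smt([thread, thread])
print("one thread alone:", alone)
print("two threads back to back:", 2 * alone)
print("two threads interleaved (per-thread finish):", shared)
```

In this toy run, two identical threads finish in 13 cycles when interleaved versus 20 when run back to back, yet thread 0 finishes at cycle 12 instead of the 10 it needs alone: total throughput rises while the individual program gets no faster.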
Figure 1 – System Partitioning: POWER4 and POWER5
To support SMT, the POWER5’s architects increased the number of general-purpose register (GPR) physical rename registers from 80 to 120 and the number of floating-point register (FPR) rename registers from 72 to 120. The additional FP rename capacity also improved the POWER5’s single-thread performance on some HPC workloads compared to the original POWER4 microarchitecture, because it let the processor execute more of the critical code sections out of order.
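One way to see why SMT needed more physical registers is to subtract the architected state each thread must keep mapped. The sketch below assumes a merged register file model, in which the physical registers hold both the 32 architected GPRs (or FPRs) per thread and all in-flight renames; that model, the helper name, and the single-thread POWER5 case (which assumes the idle thread's architected set is freed) are illustrative assumptions, and the sketch ignores the other register classes the cores also rename.

```python
# Assumption: the POWER ISA exposes 32 GPRs and 32 FPRs per thread, and the
# physical register file holds both architected values and in-flight renames.
ARCH_REGS_PER_THREAD = 32


def rename_headroom(physical: int, threads: int) -> int:
    """Physical registers left for renaming after each active thread's
    architected registers are mapped (toy merged-register-file model)."""
    return physical - ARCH_REGS_PER_THREAD * threads


configs = [
    ("POWER4 GPR, 1 thread", 80, 1),
    ("POWER4 FPR, 1 thread", 72, 1),
    ("POWER5 GPR, 2 threads", 120, 2),
    ("POWER5 FPR, 2 threads", 120, 2),
    # Assumes single-thread mode frees the idle thread's architected set.
    ("POWER5 FPR, 1 thread", 120, 1),
]
for name, physical, threads in configs:
    print(f"{name}: {rename_headroom(physical, threads)} rename registers")
```

Under these assumptions, 120 registers leave 56 rename registers per file even when shared by two threads, more than the POWER4 had for a single thread (48 GPR, 40 FPR), which is consistent with the extra FP capacity also helping single-thread HPC code.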
Despite the significant internal and external architectural changes that separate the POWER4 and POWER5, their die floorplans are remarkably similar. The original 180nm POWER4 die measured 412 mm². It was later shrunk to 130nm and became known as the POWER4+. Because the POWER4+’s L2 capacity barely increased (to roughly 1.5 MB, from 1.44 MB), its die size fell to 267 mm². The POWER5 is manufactured in the same 130nm SOI CMOS process as the POWER4+, but CPU enhancements (including SMT), a modestly larger L2, and the integration of more system-level functions, including the memory controller, swell the POWER5’s die size to 389 mm². Next year IBM will likely release the POWER5+, a shrink of the existing POWER5 to the 90nm process currently used to manufacture the PowerPC 970FX. Figure 2 shows the relative die sizes of the POWER4+ and POWER5, along with an estimate for the POWER5+ that assumes no major change in L2 capacity.
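A first-order way to reason about these die sizes is ideal area scaling, where area shrinks with the square of the feature-size ratio. This is a back-of-the-envelope sketch, not how IBM or the Figure 2 estimate was derived; the helper name is hypothetical. Real shrinks fall short of ideal because I/O pads, analog circuitry, and wiring do not scale perfectly.

```python
def ideal_shrink(area_mm2: float, old_nm: float, new_nm: float) -> float:
    """Die area if every feature scaled linearly with the process node."""
    return area_mm2 * (new_nm / old_nm) ** 2


# POWER4 (180nm, 412 mm^2) -> POWER4+ (130nm, 267 mm^2), figures from the text.
power4p_ideal = ideal_shrink(412, 180, 130)
# How far the real shrink fell short of ideal (this ratio also absorbs the
# POWER4+'s slightly larger L2).
shrink_penalty = 267 / power4p_ideal

# POWER5 (130nm, 389 mm^2) -> hypothetical 90nm POWER5+.
power5p_ideal = ideal_shrink(389, 130, 90)
# Crude heuristic: assume the 90nm shrink falls short by the same ratio.
power5p_crude = power5p_ideal * shrink_penalty

print(f"POWER4+ ideal: {power4p_ideal:.1f} mm^2 (actual 267 mm^2)")
print(f"POWER5+ ideal: {power5p_ideal:.1f} mm^2")
print(f"POWER5+ with POWER4+-like shortfall: {power5p_crude:.1f} mm^2")
```

The POWER4 to POWER4+ step shows the gap: ideal scaling predicts about 215 mm², but the shipped part was 267 mm². Ideal scaling of the POWER5 gives about 186 mm²; carrying over the POWER4+ shortfall is only a rough heuristic.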
Figure 2 – Relative size of POWER4+, POWER5, and estimated POWER5+
Although eight-processor, four-chip MCMs similar to those used for the POWER4 and POWER4+ are planned for the POWER5, current systems use a dual-chip module (DCM) that pairs a single POWER5 die with its associated 36 MB L3 eDRAM ASIC. This packaging is quite reminiscent of the Pentium Pro, but the POWER5 DCM lacks a conventional system bus; instead it has separate wide interfaces for memory, other DCMs, and I/O devices. Despite a 46% increase in die area and a rise in complexity from 184 million to 276 million transistors, IBM engineers were able to raise the POWER5’s clock frequency above that of the POWER4+. The POWER5 was introduced in 1.65 GHz and 1.9 GHz speed grades, whereas until quite recently the POWER4+ topped out at 1.7 GHz. IBM may be able to coax slightly higher clock frequencies from the 130nm POWER5 before the emphasis shifts to 90nm.
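The growth figures in this paragraph can be checked directly from the numbers in the text. This short sketch (helper and variable names are hypothetical) compares the top shipping POWER4+ and POWER5 speed grades and also computes transistor density, which comes out roughly the same for both chips, as one would expect from parts built in the same 130nm process.

```python
def pct_increase(old: float, new: float) -> float:
    """Percentage growth from old to new."""
    return (new / old - 1.0) * 100.0


# Figures from the text; clock is the top speed grade of each chip.
power4p = {"die_mm2": 267, "transistors_m": 184, "max_ghz": 1.7}
power5 = {"die_mm2": 389, "transistors_m": 276, "max_ghz": 1.9}

for key in power4p:
    print(f"{key}: +{pct_increase(power4p[key], power5[key]):.1f}%")

for name, chip in (("POWER4+", power4p), ("POWER5", power5)):
    density = chip["transistors_m"] / chip["die_mm2"]
    print(f"{name}: {density:.2f}M transistors/mm^2")
```

The die grows about 46% and the transistor count 50%, while the top clock grade rises about 12% (1.9 GHz versus 1.7 GHz); density stays near 0.7 million transistors per mm² for both chips.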