Reliability, Availability, Serviceability
Another major point in Dr. McCredie’s presentation was reliability, availability and serviceability (RAS). For IBM, this is an essential component. IBM currently ships the POWER5 in some of its storage products, and the POWER6 is expected to be used similarly*. Of course, the POWER6 will also be the CPU for IBM’s midrange System i, née AS/400.
The POWER6 uses a checkpoint system to preserve correctness and to correct or tolerate failures gracefully. This entails a recovery unit, an error logger and a restart mechanism. The processor state is stored in the recovery unit and protected with ECC. Any action that can change that state, such as a register or cache write, is checked for parity and/or ECC failures. If there are no errors, or only correctable ones, the changes are committed into the processor state normally. A non-correctable error, such as an array parity or control failure, causes the logger to record the type of error, after which execution restarts from the known good state. At this point, a transient error should not recur and execution should proceed correctly. An error that does recur is escalated, and the known good state is transferred to another CPU, which resumes execution. This transparently handles any hard errors that are isolated to a single processor, but further problems will likely require software intervention. On top of these MPU-oriented improvements, there are also the previously mentioned system changes that will improve RAS.
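The recovery flow described above (commit on no error or a correctable error, log and retry from the checkpoint on an uncorrectable error, migrate to another core if the error recurs) can be modeled in a few lines of software. This is only an analogy for the control flow, not IBM’s hardware design: the names `Core`, `run_update` and the `check` callback, and the policy of exactly one retry before escalation, are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class ErrorKind(Enum):
    NONE = auto()
    CORRECTABLE = auto()    # fixed in place by ECC
    UNCORRECTABLE = auto()  # e.g. an array parity or control failure


@dataclass
class Core:
    core_id: int
    checkpoint: dict = field(default_factory=dict)  # ECC-protected known good state
    log: list = field(default_factory=list)         # error logger entries


def run_update(core, update, check, spares, max_retries=1):
    """Try to commit `update` into `core`'s checkpointed state.

    `check(core_id, attempt)` is a hypothetical stand-in for the parity/ECC
    checkers. Returns the core that finally committed the update.
    """
    attempt = 0
    while True:
        err = check(core.core_id, attempt)
        if err in (ErrorKind.NONE, ErrorKind.CORRECTABLE):
            core.checkpoint.update(update)  # commit into processor state
            return core
        core.log.append((core.core_id, attempt, err))  # record the error type
        if attempt < max_retries:
            # The update was never committed, so the checkpoint is still the
            # known good state; simply re-execute from it.
            attempt += 1
            continue
        # The error recurred: escalate by moving the known good state to a spare.
        if not spares:
            raise RuntimeError("no spare core; software intervention required")
        spare = spares.pop(0)
        spare.checkpoint = dict(core.checkpoint)
        return run_update(spare, update, check, spares, max_retries)


def transient_fault(core_id, attempt):
    # Fails on the first attempt only, like a soft error.
    return ErrorKind.UNCORRECTABLE if attempt == 0 else ErrorKind.NONE


def hard_fault_on_core0(core_id, attempt):
    # Core 0 always fails, like a permanent defect isolated to one processor.
    return ErrorKind.UNCORRECTABLE if core_id == 0 else ErrorKind.NONE
```

With `transient_fault`, the original core commits the update after one logged retry; with `hard_fault_on_core0`, the state moves to the spare core, mirroring the escalation path in the text. An uncorrectable error with no spare left raises an exception, standing in for the point where software must intervene.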
On the management side, IBM is also improving the virtualization capabilities of the POWER6. In some products, a single processor may be able to host 200 to 300 virtual instances, although up to 1024 VMs are theoretically possible. Memory partitioning and migration have also been added, which reduce system downtime for repairs. IBM will also enable its Power Executive management tools on new POWER6-based systems. Power Executive, which was recently demonstrated at Intel’s Developer Forum, measures power consumption and system health and uses this information to make policy decisions, such as shutting down unneeded fans or capping power consumption.
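The measure-then-decide loop attributed to Power Executive can be illustrated with a small policy function. Power Executive’s real interfaces and thresholds are not described in the presentation, so everything below (`Reading`, `Policy`, `decide`, the action strings and the minimum-fan rule) is a hypothetical sketch of the general idea, not the product’s behavior.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Reading:
    watts: float         # measured system power draw
    inlet_temp_c: float  # inlet temperature, used here as a simple health metric
    fans_on: int         # number of fans currently running


@dataclass
class Policy:
    power_cap_watts: float  # hypothetical administrator-set power ceiling
    cool_temp_c: float      # below this temperature, a fan may be switched off
    min_fans: int = 2       # never run fewer fans than this


def decide(reading: Reading, policy: Policy) -> list[str]:
    """Return the actions to take for one measurement interval."""
    actions = []
    if reading.watts > policy.power_cap_watts:
        # Cap power: throttle by the amount the system is over budget.
        excess = reading.watts - policy.power_cap_watts
        actions.append(f"throttle:{excess:.0f}W")
    if reading.inlet_temp_c < policy.cool_temp_c and reading.fans_on > policy.min_fans:
        # The system is running cool, so one fan is not needed.
        actions.append("fan_off")
    return actions
```

For example, `decide(Reading(520, 22, 4), Policy(500, 25))` asks for 20 W of throttling and one fan to be switched off. Keeping the policy a pure function of the latest reading makes it easy to test; a real tool would presumably add hysteresis so that fans do not toggle on every interval.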
Conclusion
IBM’s POWER6-based systems are aiming for general availability in mid-2007, somewhat later than previous roadmaps and public statements indicated. This would initially put the POWER6 up against Intel’s Montvale and the joint Sun and Fujitsu APL, presumably based on the SPARC64-VI. On the x86 side, the contemporaries would be Intel’s Tigerton and AMD’s Rev. H, both quad-core designs.
IBM is claiming a twofold performance increase, which would be consistent with the vastly higher clock speeds and the increase in raw system bandwidth. Assuming proper execution on IBM’s part, the company will have a very strong competitive position next year. The only real challenges to IBM’s current performance leadership would come in 2008, when Sun’s massively multithreaded Rock and Intel’s long-awaited Tukwila both arrive. IBM’s roadmaps currently include the POWER6+, presumably a 45nm derivative. Judging by past practice, the POWER6+ will debut in the second half of 2008, probably just in time to dash rivals’ hopes.
One thing is clear, though: IBM has found a lever it can use to hold back the inexorable growth of x86 MPUs, namely exotic packaging and system bandwidth. Commodity devices clearly cannot afford such extravagances, and cannot match the impressive system architecture on which IBM’s strategy hinges.
* A correction was made to this article: POWER5 processors are not used in any mainframes, but they are used in storage products.