DD1, DD2, PPC970FX, and Apple’s Role?
Figure 3 – DD1 PPE compared to DD2 PPE and PPC970 core
As stated previously, the PPE in the CELL processor has been re-engineered and increased substantially in size. Figure 3 shows an equivalently scaled comparison of the PPE in DD1 and DD2 against the PPC970FX core (the L2 cache and bus interface of the PPC970FX have been removed in Figure 3 to enable a fair comparison with the DD1 and DD2 PPEs). Figure 3 shows that although the DD2 PPE grew significantly in die size, it is still smaller than the PPC970FX core on the same 90nm process. The complete re-engineering of the PPE for the CELL processor is interesting and surprising because, in the programming model presented by Toshiba and Sony, the PPE is relegated to a secondary role of task scheduling while the SPEs perform the actual computational work. Given this arrangement, it is difficult to comprehend the necessity of the larger and more robust PPE in the DD2 CELL processor.
One possible theory regarding the enhancement of the PPE in the DD2 CELL processor is that it may have been a last-ditch effort by IBM to convince Apple to migrate to the CELL processor for future generations of Macintosh computers. The reasoning for Apple’s possible influence on the development of the DD2 PPE is as follows:
The PPC970 heavily leveraged the POWER4 design database to minimize overhead costs. However, the volume of the PPC970xx line of processors is currently limited to approximately 2 million units per year, and the prospect of significantly increasing that volume appears to be poor. In the future, IBM can continue to minimize the overhead of this line of processors by creating products such as the dual-core PPC970MP, but no new cores would be forthcoming unless the volume or the pricing for this line of processors can be increased enough to justify the development costs. Given the recriminations that have surfaced since Apple announced its intention to shift to Intel-based processors, hindsight suggests that IBM may have been trying to reduce its development overhead by eliminating the PPC970xx line of processors and moving Apple to either the CELL processor or the Xenon processor.
Assuming that Apple was given the choice to migrate to either the CELL processor or its cousin, the Xenon processor, Apple would assuredly have objected to a next-generation processor with significantly lower single-threaded scalar performance than the previous-generation processor. Although the emulation required for the m68K-to-PPC and PPC-to-x86 transitions would introduce similar issues, once applications have been recompiled for the new ISA, the new machines would have far better performance or power usage. However, it is vastly more difficult to improve performance for most software when moving from a heavyweight single-threaded processor to a light(er)-weight multithreaded programming platform. The simple fact is that despite advances in software technology, single-threaded performance is still vital for personal desktop and notebook computers, and it is uncertain whether the highly threaded (and possibly asymmetric) software tool chain can achieve high performance for general classes of applications. In this scenario, it is likely that the DD2 PPE was enhanced to make the CELL processor more palatable to Apple, and that this enhancement accounts for the difference in die area.
Finally, despite the supposition that the DD2 PPE is a design change driven by Apple, the possibility remains that the DD2 revision was also needed to augment the PPE’s ability to perform the real-time scheduling required by the envisioned programming model. However, it is difficult to believe that STI could have blundered so badly in the microarchitectural definition phase and design of the DD1 PPE as to render it unable to meet the real-time scheduling requirements of the CELL processor. Moreover, a closer inspection of the DD1 and DD2 PPE photomicrographs reveals that the DD2 PPE contains more SIMD/vector execution resources than the DD1 PPE. Presumably, these additional vector resources are not required for real-time scheduling, but would be required to sustain peak throughput for carefully tuned AltiVec code. The presence of the additional vector resources in the DD2 PPE thus bolsters the argument that the larger and more robust DD2 PPE was driven by considerations other than the real-time scheduling requirements. Certainly the answer may be a combination of both, but the scheduling of the DD2 tapeout and the subsequent timing of Apple’s announcement of the transition to Intel processors suggest that a desire on IBM’s part to get Apple to adopt the CELL processor may have had a role in the enhancement of the PPE in the DD2 CELL processor.