At the 2005 Electronic Entertainment Expo (E3) in May, Sony released the preliminary technical specifications for its next-generation game console, the PS3. Sony revealed that the CELL processor will run at 3.2 GHz in the production version of the PS3, and that it will be coupled to 256 MB of XDR memory operating at 3.2 Gbps per differential data pin pair. Moreover, the PS3 will use the CELL processor in the same 8-SPE configuration announced at ISSCC 2005, but only 7 of the 8 SPEs will be functional. As a result, even nominally defective devices can be shipped, with the one defective SPE disabled. Sony's new performance claim is that the CELL processor in this configuration can achieve a rating of 218 GFlops at 3.2 GHz. However, after considerable effort investigating this rating, this author believes that it is in fact erroneous. Based on available information, the current version of the CELL processor, with only 1 PPE and 7 active SPEs, is simply not capable of reaching 218 GFlops at 3.2 GHz.
The CELL processor has been touted as capable of producing a very large number of floating point operations per second. At ISSCC 2005, IBM claimed that the CELL processor can achieve a peak throughput of 256 billion single precision (SP) floating point operations per second. At HPCA 2005, IBM further revealed that the CELL processor can achieve a peak throughput of more than 26 billion double precision (DP) floating point operations per second. A quick look at the CELL microarchitecture shows that each SPE can perform 4 (non-IEEE 754 compliant) SP floating point multiply-add (FMADD) operations per cycle, or 2 (IEEE 754 compliant) DP FMADD operations every 7 cycles. Since each FMADD counts as two floating point operations, the 8 SPEs alone produce 8 × 4 × 2 = 64 SP flops per cycle and thus reach the 256 SP GFlops rating at 4 GHz without the aid of the PPE. Presumably, the (DD2) PPE can also produce 4 SP FMADDs per cycle, in which case the (DD2) CELL processor should instead be rated at 288 GFlops at 4 GHz once the compute power of the PPE is taken into account. Similarly, 2 DP FMADD operations every 7 cycles yields 18.3 DP GFlops for the 8 SPEs at 4 GHz, and the PPE can sustain a peak throughput of 1 DP FMADD operation per cycle, producing 8 DP GFlops at 4 GHz. The total of 26.3 DP GFlops matches IBM's claim of more than 26 DP GFlops nicely.
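The peak-throughput arithmetic above can be reproduced with a short calculation. This is a minimal sketch: the helper `peak_gflops` and the constant names are hypothetical, and the DD2 PPE rate of 4 SP FMADDs per cycle is the same assumption ("presumably") made in the text. Exact fractions are used so the 2-every-7-cycles DP rate is not rounded early.

```python
from fractions import Fraction

FLOPS_PER_FMADD = 2  # one multiply plus one add
CLOCK_GHZ = 4        # clock used for IBM's ISSCC/HPCA 2005 ratings


def peak_gflops(units, fmadd_per_cycle, clock_ghz):
    """Hypothetical helper: peak GFlops for `units` identical cores,
    each sustaining `fmadd_per_cycle` FMADD operations per cycle."""
    return units * Fraction(fmadd_per_cycle) * FLOPS_PER_FMADD * Fraction(clock_ghz)


# Single precision: 4 SP FMADDs per cycle per SPE, and (assumed) per DD2 PPE.
sp_spe = peak_gflops(8, 4, CLOCK_GHZ)
sp_ppe = peak_gflops(1, 4, CLOCK_GHZ)

# Double precision: 2 DP FMADDs every 7 cycles per SPE, 1 per cycle on the PPE.
dp_spe = peak_gflops(8, Fraction(2, 7), CLOCK_GHZ)
dp_ppe = peak_gflops(1, 1, CLOCK_GHZ)

print(f"SP: SPEs {float(sp_spe):.1f} + PPE {float(sp_ppe):.1f} = {float(sp_spe + sp_ppe):.1f} GFlops")
print(f"DP: SPEs {float(dp_spe):.1f} + PPE {float(dp_ppe):.1f} = {float(dp_spe + dp_ppe):.1f} GFlops")
```

Running it should print 256.0 + 32.0 = 288.0 SP GFlops and 18.3 + 8.0 = 26.3 DP GFlops, the figures quoted above.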
The floating point throughput ratings claimed by IBM are readily verifiable, as the computations above show. The same cannot be said of the 218 GFlops rating claimed by Sony in its press release. Specifically, to reach a rating of 217.6 GFlops at 3.2 GHz, the CELL processor must execute 68 floating point operations per cycle (recall that a single FMADD operation counts as two floating point operations, a multiply and an add). Assuming that the single PPE and each SPE can produce 4 SP FMADD operations per cycle, it is easy to see how 64 of the 68 floating point operations per cycle can be obtained: 8 from the PPE and 56 from the 7 SPEs. Unfortunately, the remaining 4 flops per cycle cannot be accounted for. One possible explanation for these missing flops is that Sony counts FP loads and stores as floating point operations. However, this theory is difficult to support, since counting load and store instructions as floating point operations is atypical. In any case, the CELL processor would likely be rated much higher than 218 GFlops at 3.2 GHz if loads and stores were counted. A second theory is that the 218 GFlops rating is based on the (DD1) CELL processor with 8 active SPEs. In that case, the (DD1) PPE can sustain a throughput of 2 SP FMADD operations per cycle, equivalent to 4 flops, and each of the 8 SPEs can sustain 4 SP FMADD operations (8 flops) per cycle. Collectively, the DD1 CELL processor with 8 SPEs readily produces 68 floating point operations per cycle, which matches the claim of 218 GFlops at 3.2 GHz nicely. At present, Sony has declined to elaborate on the technical details of the CELL processor or to explain how it attains 218 GFlops at 3.2 GHz with only 7 active SPEs.
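The per-cycle bookkeeping behind the two configurations can be checked the same way. This is a sketch under the text's assumptions: the helper `flops_per_cycle` is hypothetical, the PS3 case assumes a PPE issuing 4 SP FMADDs per cycle with 7 SPEs, and the DD1 case assumes 2 SP FMADDs per cycle on the PPE with all 8 SPEs active. The claimed figure is taken as 217.6 GFlops, the unrounded value of "218 GFlops".

```python
from fractions import Fraction

FLOPS_PER_FMADD = 2                  # one multiply plus one add
CLOCK_GHZ = Fraction("3.2")          # PS3 production clock
CLAIMED_GFLOPS = Fraction("217.6")   # Sony's "218 GFlops", unrounded

# Flops that must retire every cycle to meet the claim.
required = CLAIMED_GFLOPS / CLOCK_GHZ


def flops_per_cycle(ppe_fmadd, spe_count, spe_fmadd=4):
    """Hypothetical helper: SP flops per cycle for one PPE issuing
    `ppe_fmadd` FMADDs/cycle plus `spe_count` SPEs at `spe_fmadd` each."""
    return FLOPS_PER_FMADD * (ppe_fmadd + spe_count * spe_fmadd)


# Assumed PS3 configuration: PPE at 4 SP FMADDs/cycle, 7 active SPEs.
ps3 = flops_per_cycle(ppe_fmadd=4, spe_count=7)
# Second theory: DD1 PPE at 2 SP FMADDs/cycle, all 8 SPEs active.
dd1 = flops_per_cycle(ppe_fmadd=2, spe_count=8)

print(f"required: {required} flops/cycle")
print(f"1 PPE + 7 SPEs: {ps3} flops/cycle -> {float(ps3 * CLOCK_GHZ):.1f} GFlops "
      f"(short by {required - ps3} flops/cycle)")
print(f"DD1, 1 PPE + 8 SPEs: {dd1} flops/cycle -> {float(dd1 * CLOCK_GHZ):.1f} GFlops")
```

Under these assumptions the 7-SPE configuration tops out at 64 flops per cycle (204.8 GFlops), 4 flops per cycle short of the claim, while the DD1 configuration with 8 SPEs lands exactly on 68 flops per cycle (217.6 GFlops).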