Floating Point Re-visited
The issue of double precision floating point (DP FP) capability was briefly re-visited at HPCA. IBM revealed that the SPE can issue a DP FP instruction only once every 6 or 7 cycles, and IBM's HPCA estimate for the peak DP FP throughput of the CELL processor, including the PPE, is 26 GFlops. Although 26 GFlops falls well within the 25~30 GFlops range estimated in the previous article, the DP FP capability of the SPE is slightly lower than previously believed, and the overall DP FP throughput sits toward the lower end of that estimate.
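A quick sanity check of how an issue interval of 6 or 7 cycles maps onto the 25~30 GFlops range is sketched below. This is not IBM's published derivation. It assumes a 4 GHz clock (the frequency quoted for the SPE later in this article), that each SPE DP instruction is a 2-wide SIMD fused multiply-add (4 flops), and that the PPE retires one DP fused multiply-add (2 flops) per cycle. The names `peak_dp_gflops` and the constants are hypothetical.

```python
# Rough peak DP FP throughput for CELL: 8 SPEs plus the PPE.
# ASSUMPTIONS (not stated by IBM in the text): 4 GHz clock, each SPE DP
# instruction is a 2-wide SIMD fused multiply-add (4 flops), and the PPE
# retires one DP fused multiply-add (2 flops) every cycle.

CLOCK_GHZ = 4.0
NUM_SPES = 8
SPE_FLOPS_PER_DP_INSTR = 2 * 2   # 2 DP lanes x (multiply + add)
PPE_DP_FLOPS_PER_CYCLE = 2       # one FMA per cycle


def peak_dp_gflops(issue_interval_cycles: int, clock_ghz: float = CLOCK_GHZ) -> float:
    """Peak DP GFlops when each SPE issues one DP instruction per interval."""
    spe_total = NUM_SPES * SPE_FLOPS_PER_DP_INSTR * clock_ghz / issue_interval_cycles
    ppe_total = PPE_DP_FLOPS_PER_CYCLE * clock_ghz
    return spe_total + ppe_total


if __name__ == "__main__":
    for interval in (6, 7):
        print(f"DP issue every {interval} cycles: {peak_dp_gflops(interval):.1f} GFlops")
```

Under these assumptions a 7-cycle interval gives roughly 26.3 GFlops, close to IBM's figure, while a 6-cycle interval gives about 29.3 GFlops. The agreement is suggestive only; it does not confirm how IBM arrived at its number.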
Power Consumption Re-visited
The previous article estimated that the CELL processor consumes 50~80W at 1.1V. This range was obtained in an informal setting and cannot be attributed to any specific source. Moreover, a note of caution is in order: the power consumption of the PPE cannot be derived simply by subtracting 8 SPEs at 4W each from the 50~80W figure. Specifically, there are several caveats.
- The CELL processor consists of the PPE, the L2 cache, 8 SPEs, the FlexIO interface, the XDR interface, the Memory Interface Controller (MIC), the BEI, and the EIB.
- The power consumption of the L2 cache, the MIC, the EIB and the BEI is non-zero.
- The FlexIO power consumption has been reported to be 1.03W per byte lane. With 7 transmit (Tx) byte lanes and 5 receive (Rx) byte lanes, the FlexIO interface alone consumes approximately 12 × 1.03W ≈ 12.4W.
- The data bus of the XDR interface uses the same circuit as the FlexIO interface, but runs at half the data rate. With the 16-byte-wide data bus running at 3.2 Gbps and the 12-bit address/command bus running at 800 Mbps, the XDR interface should consume approximately 9W.
- Finally, the shmoo plot given in the SPE presentation does not provide exact figures. That is, the power consumption of the SPE is not exactly 4.0W at 1.1V and 4 GHz. During the Q&A session for the SPE paper presentation (paper 7.4), it was revealed that the shmoo plot for the SPE was constructed from lab notes and presented to ISSCC attendees as the approximate operating range of the SPE, not as a specific characterisation of an engineering sample for the purpose of creating a datasheet. Astute readers will further note that the SPE power figures are given to only one significant figure, so 4W may in fact be as high as 4.49W, assuming conventional rounding.
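The caveats above can be made concrete with a back-of-the-envelope budget. The sketch below is illustrative only: it assumes the quoted figures simply add, and it uses the 50~80W range, 8 SPEs at either 4.0W or 4.49W, 12 FlexIO byte lanes at 1.03W each, and the approximate 9W for XDR. The function names are hypothetical.

```python
# Back-of-the-envelope CELL power budget using figures quoted in the text.
# ASSUMPTION: the individual figures simply add; real measurements were not
# necessarily taken under identical conditions.

CHIP_RANGE_W = (50.0, 80.0)        # informal chip estimate at 1.1 V
NUM_SPES = 8
FLEXIO_W_PER_BYTE_LANE = 1.03
FLEXIO_TX_LANES, FLEXIO_RX_LANES = 7, 5
XDR_W = 9.0                        # approximate figure given in the text


def flexio_power() -> float:
    """FlexIO power: 12 byte lanes x 1.03 W, roughly 12.4 W."""
    return (FLEXIO_TX_LANES + FLEXIO_RX_LANES) * FLEXIO_W_PER_BYTE_LANE


def residual_budget(chip_w: float, spe_w: float) -> float:
    """Power left after the SPEs and both I/O interfaces.

    This residual must still cover the PPE *and* the L2, MIC, EIB and BEI,
    so it is an upper bound on PPE power, not an estimate of it.
    """
    return chip_w - NUM_SPES * spe_w - flexio_power() - XDR_W


if __name__ == "__main__":
    for spe_w in (4.0, 4.49):      # "4W" could be anything up to 4.49 W
        naive = [c - NUM_SPES * spe_w for c in CHIP_RANGE_W]
        left = [residual_budget(c, spe_w) for c in CHIP_RANGE_W]
        print(f"SPE {spe_w:.2f} W: naive subtraction {naive[0]:.1f}~{naive[1]:.1f} W, "
              f"after I/O {left[0]:.1f}~{left[1]:.1f} W")
```

With 4.0W SPEs, naive subtraction suggests 18~48W for the PPE, but once FlexIO and XDR are removed the residual runs from about -3.4W to 26.6W. That residual must still be shared among the PPE, L2, MIC, EIB and BEI. A negative value at the low end is one more reason the quoted figures cannot simply be subtracted to isolate the PPE.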
Life Imitating Art Imitating Life
In writing the first CELL article, the graphics from the various presentations on the CELL processor given at ISSCC 2005 were evaluated for their technical content and for how effectively they conveyed fundamental concepts. Interesting diagrams and photographs were added to the article by extracting the images from the Acrobat files included on the ISSCC 2005 CD. However, some of the extracted diagrams were not very clear, and those were re-drawn by yours truly. One such diagram is the diagram of the SPE’s internal organization.
Figure 5 – Figure 7.4.1 from IBM’s paper on the SPE
The SPE’s block diagram was included as Figure 7.4.1 in paper 7.4, “A Streaming Processing Unit for a CELL Processor”.
Figure 6 – Figure 4 from the first CELL article
Because the small font in Figure 7.4.1 was difficult to read, the diagram was re-drawn and published as Figure 4 in the article “ISSCC 2005: The CELL Microprocessor”.
At HPCA 11, Dr. Peter Hofstee again presented details about the CELL processor, and a very familiar diagram was shown to illustrate the internal organization of the SPE.
Figure 7 – Slide 23 in IBM’s CELL presentation at HPCA
As seen above, the slide shown by Dr. Peter Hofstee to illustrate the internal structure of the SPE bears a remarkable resemblance to Figure 4 from “ISSCC 2005: The CELL Microprocessor”, rather than to Figure 7.4.1 of IBM’s own paper, “A Streaming Processing Unit for a CELL Processor”.
What is remarkable is that the article “ISSCC 2005: The CELL Microprocessor” was first made available on Thursday, February 10, 2005, while Dr. Hofstee gave the slide presentation at HPCA on Tuesday, February 15. Somehow, Figure 4 and slide 23 converged in the days between February 10 and February 15.