CELL Microprocessor Revisited


Details on the CELL processor, designed by the collective efforts of Sony, Toshiba and IBM (STI), were previously disclosed at ISSCC 2005. The previous article provided coverage of the hardware details of the CELL processor, based on the information made available at ISSCC 2005. The purpose of this article is to act as a supplement to the prior article, provide clarification on points of confusion in the first article, and summarize the information on the CELL processor that was made available subsequent to ISSCC 2005. In particular, further details on the CELL processor were made available by Dr. Peter Hofstee from IBM at the Eleventh International Symposium on High Performance Computer Architecture (HPCA).

Pro or Con?

The purpose of these articles on the CELL processor is to provide information to interested parties with the necessary knowledge base to independently comprehend and evaluate the subtleties of CELL’s design philosophy, not to promote or denigrate the CELL processor. This writer’s interest in the CELL processor is purely academic. That is, the CELL processor is interesting because it is different, not because the architecture is particularly new or novel. The architectural concepts contained in the CELL processor are hardly new, nor is the basic idea of shifting the burden of computing from hardware to software a newly discovered concept. As a result, the previous article attempted to balance the descriptions of the hardware capabilities of the CELL processor with sentiments that pointed out the non-trivial issue of CELL’s unconventional programming model. The success or failure of the CELL processor to make inroads into desktop or high performance processing tasks rests not on the CELL processor’s capability to deliver 0.25 TFlops, 25 TFlops or 25 BIPs, but on the ability of the software stack to extract that performance with nominal effort by the programmers. Since no details of the programming model have been published by STI, it is currently not possible to assess the relative ease or difficulty in extracting the promised performance from the CELL processor. Hopefully, STI will soon release some or all of the details for CELL’s programming model, so that the performance of the CELL processor can be placed in proper context.

Rambus XDR Memory System: Limited to 4 Devices, or Not?

In the question and answer session of Dr. Hofstee’s presentation of the CELL processor at HPCA, the point was raised that some articles have claimed that the CELL processor is limited to 64 or 128 MB of memory. While such a limitation may not be critical for a processor destined for a game machine, it would indeed pose a severe constraint on those interested in using the CELL processor in applications outside of the game console. In that sense, the constraint seems irrational, since the CELL processor fully supports the prerequisites for expanding into markets outside of the gaming console: ECC memory and the possibility of a large cache coherent system. Dr. Hofstee dismissed the questions regarding the memory capacity constraint, and insisted that IBM will release more details on the memory system in the future.

In the previous article, the memory capacity of the XDR memory system was described as follows:

In the XDR memory system, each channel can support a maximum of thirty-six devices connected to the same command and address bus. The data bus of each device connects to the memory controller through a set of bi-directional point-to-point connections. In the XDR memory system, address and command are sent on the address and command bus at a rate of 800 Mbits per second (Mbps), and the point-to-point interface operates at a data rate of 3.2 Gbps. Using DRAM devices with 16 bit wide data busses, each channel of XDR memory can sustain a maximum bandwidth of 102.4 Gbps (2 x 16 x 3.2), or 12.8 GB/s. The CELL processor can thus achieve a maximum bandwidth of 25.6 GB/s with a 2 channel, 4 device configuration.

The obvious advantage of the XDR memory system is the bandwidth that it provides to the CELL processor. However, in the configuration illustrated in figure 9, the maximum of 4 DRAM devices means that the CELL processor is limited to 256 MB of memory, given that the highest capacity XDR DRAM device is currently 512 Mbits. Fortunately, XDR DRAM devices could in theory be reconfigured so that up to 36 XDR devices are connected to the same 36 bit wide channel, each providing a 1 bit wide data bus to the 36 bit wide point-to-point interconnect. In such a configuration, a two channel XDR memory system can support up to 2 GB of ECC protected memory with 256 Mbit DRAM devices or 4 GB of ECC protected memory with 512 Mbit DRAM devices.
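
The bandwidth figures in the quoted passage can be checked with a few lines of arithmetic. The following is only a sketch of that calculation: the function and constant names are made up for illustration, and the even split of the 4 devices across the 2 channels is an assumption taken from the "2 x 16 x 3.2" expression rather than from any published controller detail.

```python
# Sanity check of the XDR bandwidth arithmetic in the quoted passage.
# Integer Mbit/s units are used to avoid floating-point rounding.
# All names are illustrative; none come from a Rambus or IBM interface.

PIN_RATE_MBPS = 3200       # 3.2 Gbps per data pin (from the text)
DEVICE_WIDTH_BITS = 16     # 16 bit wide XDR DRAM devices (from the text)
DEVICES_PER_CHANNEL = 2    # assumption: 4 devices split evenly over 2 channels
CHANNELS = 2               # 2 channel configuration (from the text)


def channel_bandwidth_mbps(devices: int, width_bits: int, pin_rate_mbps: int) -> int:
    """Peak data bandwidth of one channel in Mbit/s."""
    return devices * width_bits * pin_rate_mbps


per_channel_mbps = channel_bandwidth_mbps(
    DEVICES_PER_CHANNEL, DEVICE_WIDTH_BITS, PIN_RATE_MBPS
)
per_channel_mbytes = per_channel_mbps // 8        # 8 bits per byte
total_mbytes = per_channel_mbytes * CHANNELS      # both channels together

print(f"per channel:  {per_channel_mbps / 1000} Gbps = {per_channel_mbytes / 1000} GB/s")
print(f"two channels: {total_mbytes / 1000} GB/s")
```

With these parameters the calculation gives 102.4 Gbps per channel, which is 12.8 GB/s, and 25.6 GB/s across both channels.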

Unfortunately, rather than providing insights into the issue of the CELL processor’s memory capacity, the statements above may have in fact contributed to the impression that the XDR memory system is currently constrained in some way. The truth of the matter is that the XDR DRAM devices themselves are capable of supporting the 72 device configuration that would allow each CELL processor to directly address 4 GB of ECC protected memory (given 512 Mbit XDR devices). However, in order to support the XDR DRAM devices in such a configuration, specific support must be built into the XDR DRAM controller interface. To date, IBM has not released details on the memory controller interface indicating whether the current incarnation of the CELL processor can support a 72 DRAM device configuration in the XDR memory system, or only a smaller number, e.g. 36 DRAM devices. Fortunately, regardless of whether the current CELL processor can explicitly support the 72 DRAM device configuration, the ability to address 72 XDR DRAM devices would require at most a relatively minor design change for the CELL processor. As a result, even if a DRAM device-count limitation exists for the current incarnation of the CELL processor, future CELL processors can easily be designed to remove that limitation.
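
The capacity figures follow from the device count and the device density. The sketch below works through the 4 device and 72 device cases with 512 Mbit parts. It assumes, purely for illustration, that in the 36 bit wide ECC configuration 32 devices per channel hold data and 4 hold check bits; that split is not stated in the source, and the names used are hypothetical.

```python
# Capacity arithmetic for the XDR configurations discussed above.
# Assumption (not from the source): in the 36 bit wide ECC configuration,
# 32 of the 36 one-bit-wide devices on each channel hold data and the
# remaining 4 hold ECC check bits.

CHANNELS = 2
DEVICES_PER_CHANNEL = 36
DATA_DEVICES_PER_CHANNEL = 32   # assumed 32 data + 4 ECC split
DEVICE_MBITS = 512              # highest density XDR device cited in the text


def capacity_mbytes(device_count: int, device_mbits: int) -> int:
    """Total capacity in MB of device_count devices of device_mbits each."""
    return device_count * device_mbits // 8


# 2 channel, 4 device configuration from the quoted passage.
four_device_mb = capacity_mbytes(4, DEVICE_MBITS)

# 72 device configuration: raw capacity and capacity left after ECC.
total_devices = CHANNELS * DEVICES_PER_CHANNEL
raw_mb = capacity_mbytes(total_devices, DEVICE_MBITS)
usable_mb = capacity_mbytes(CHANNELS * DATA_DEVICES_PER_CHANNEL, DEVICE_MBITS)

print(f"4 devices:  {four_device_mb} MB")
print(f"{total_devices} devices: {raw_mb} MB raw, {usable_mb} MB ECC protected")
```

Under that assumption the 4 device case comes to 256 MB and the 72 device case to 4096 MB (4 GB) of ECC protected memory. Calling `capacity_mbytes` with a different device density shows how the totals scale for other parts.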
