Rambus XDR Memory System
Figure 9 – The two channel XDR Memory System
To provide machine balance and support the peak rating of more than 256 SP GFlops (or 25~30 DP GFlops), the CELL processor requires an enormously capable memory system. For that purpose, two channels of Rambus XDR memory are used to obtain 25.6 GB/s of memory bandwidth.
In the XDR memory system, each channel can support a maximum of thirty-six devices connected to the same command and address bus. The data bus of each device connects to the memory controller through a set of bi-directional point-to-point connections. Address and command are sent on the address and command bus at a rate of 800 Mbits per second (Mbps), and the point-to-point interface operates at a data rate of 3.2 Gbps per pin. Using two DRAM devices with 16 bit wide data busses, each channel of XDR memory can sustain a maximum bandwidth of 102.4 Gbps (2 x 16 x 3.2), or 12.8 GB/s. The CELL processor can thus achieve a maximum bandwidth of 25.6 GB/s with a 2 channel, 4 device configuration.
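The bandwidth arithmetic above can be checked with a short sketch. The constants are the figures quoted in the text; the function and variable names are hypothetical and chosen only for illustration.

```python
# Back-of-the-envelope XDR bandwidth, using the figures quoted in the text.
# All names here are illustrative; nothing below is a Rambus or IBM API.

DATA_RATE_GBPS = 3.2      # per-pin data rate of the point-to-point interface
DEVICE_WIDTH_BITS = 16    # data bus width of each XDR DRAM device
DEVICES_PER_CHANNEL = 2   # 4 devices spread across 2 channels
CHANNELS = 2


def channel_bandwidth_gbps(devices, width_bits, rate_gbps):
    """Peak channel bandwidth in gigabits per second."""
    return devices * width_bits * rate_gbps


def gbps_to_gbytes(gbps):
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbps / 8


per_channel_gbps = channel_bandwidth_gbps(
    DEVICES_PER_CHANNEL, DEVICE_WIDTH_BITS, DATA_RATE_GBPS
)
per_channel_gbytes = gbps_to_gbytes(per_channel_gbps)
total_gbytes = CHANNELS * per_channel_gbytes

if __name__ == "__main__":
    print(f"per channel: {per_channel_gbps:.1f} Gbps = {per_channel_gbytes:.1f} GB/s")
    print(f"two channels: {total_gbytes:.1f} GB/s")
```

Note that these are peak figures: the sketch ignores command overhead, refresh, and bus turnaround, so sustained bandwidth in practice would be lower.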
The obvious advantage of the XDR memory system is the bandwidth that it provides to the CELL processor. However, in the configuration illustrated in Figure 9, the maximum of 4 DRAM devices means that the CELL processor is limited to 256 MB of memory, given that the highest capacity XDR DRAM device is currently 512 Mbits. Fortunately, XDR DRAM devices could in theory be reconfigured so that up to 36 XDR devices are connected to the same 36 bit wide channel, each providing a 1 bit wide data bus to the 36 bit wide point-to-point interconnect. In such a configuration, a two channel XDR memory system can support up to 2 GB of ECC protected memory with 256 Mbit DRAM devices, or 4 GB of ECC protected memory with 512 Mbit DRAM devices. As a result, the CELL processor could in theory address a large amount of memory if the price premium of XDR DRAM devices can be minimized.
One intriguing note reported by Dave Bursky of Electronic Design Magazine is that the XDR memory system makes use of 72 pairs of differential signals for the data bus. The figure seventy-two, which matches the common arrangement of 64 data bits plus 8 check bits, implies that the CELL processor does indeed support ECC. Since ECC support is clearly not a requirement of a processor to be used in a game machine, the presence of ECC support, if confirmed, would clearly indicate IBM’s ambition to promote the use of CELL processors for serious computational applications outside of the application domain of the Sony Playstation.
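The capacity figures quoted above can be reproduced with a small sketch. It assumes that, in the 36 bit wide configuration, 32 of the 36 devices on each channel hold data and the remaining 4 hold ECC check bits; that split is an assumption made here to match the 2 GB and 4 GB figures, not a detail confirmed by the source. The function names are hypothetical.

```python
# Rough XDR capacity arithmetic for the configurations discussed in the text.
# ASSUMPTION: with ECC, 32 of every 36 one-bit-wide devices store data and
# 4 store check bits. This split is illustrative, not confirmed by the source.


def capacity_mbytes(devices, device_mbits):
    """Total capacity in megabytes for a number of identical DRAM devices."""
    return devices * device_mbits // 8


def ecc_data_capacity_mbytes(device_mbits, channels=2, data_devices_per_channel=32):
    """Usable (data only) capacity in megabytes, excluding ECC devices."""
    return capacity_mbytes(channels * data_devices_per_channel, device_mbits)


# Figure 9 configuration: 4 devices of 512 Mbit each.
figure9_mbytes = capacity_mbytes(4, 512)

# Hypothetical 2 channel x 36 device configurations.
ecc_256_mbytes = ecc_data_capacity_mbytes(256)
ecc_512_mbytes = ecc_data_capacity_mbytes(512)

if __name__ == "__main__":
    print(f"Figure 9 configuration: {figure9_mbytes} MB")
    print(f"72 x 256 Mbit devices: {ecc_256_mbytes // 1024} GB of data")
    print(f"72 x 512 Mbit devices: {ecc_512_mbytes // 1024} GB of data")
```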
Incidentally, Toshiba is a manufacturer of XDR DRAM devices. Presumably it brought the XDR memory controller and memory system design expertise to the table, and could ramp up production of XDR DRAM devices as needed.
FlexIO System Interface
At ISSCC 2005, Rambus presented a paper on the FlexIO interface used on the CELL processor. However, the presentation was limited to describing the physical layer interconnect. Specifically, the difficulties of implementing the Redwood Rambus ASIC Cell on IBM’s 90nm SOI process were examined in some detail. While circuit level issues regarding the challenges of designing high speed I/O interfaces on an SOI based process are in their own right extremely intriguing topics, the focus of this article is geared toward the architectural implications of the high bandwidth interface. As a result, the circuit level details will not be covered here. Interested readers are encouraged to seek out details on Rambus’s Redwood technology separately.
What is known about the system interface of the CELL processor is that the FlexIO consists of 12 byte lanes. Each byte lane is a set of 8 bit wide, source synchronous, unidirectional, point-to-point interconnects. The FlexIO makes use of 96 differential signaling pairs, each running at a data rate of 6.4 Gbps, which in turn translates to 6.4 GB/s per byte lane. The 12 byte lanes are asymmetric in configuration. That is, 7 byte lanes are outbound from the CELL processor, while 5 byte lanes are inbound to the CELL processor. The 12 byte lanes thus provide 44.8 GB/s of raw outbound bandwidth and 32 GB/s of raw inbound bandwidth, for a total I/O bandwidth of 76.8 GB/s. Furthermore, the byte lanes are arranged into two groups of ports: one group is dedicated to non-coherent off-chip traffic, while the other group is usable for coherent off-chip traffic. It seems clear that Sony itself is unlikely to make use of a coherent, multiple CELL processor configuration for Playstation 3. However, the fact that the PPE and the SPEs can snoop traffic transported through the EIB, and that coherency traffic can be sent to other CELL processors via a coherent interface, means that the CELL processor can indeed be an interesting building block for multiprocessor systems. If nothing else, the CELL processor should enable startups that propose to build FlexIO based coherency switches to garner immediate interest from venture capitalists.
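The FlexIO figures follow from simple arithmetic, sketched below using only the numbers quoted in the text. The names are hypothetical, and the results are raw signaling rates that do not account for protocol overhead.

```python
# Raw FlexIO bandwidth arithmetic from the figures quoted in the text.
# Names are illustrative only; protocol overhead is not modeled.

LANE_WIDTH_BITS = 8     # each byte lane is 8 bits wide
PAIR_RATE_GBPS = 6.4    # data rate of each differential signal pair
OUTBOUND_LANES = 7      # byte lanes leaving the CELL processor
INBOUND_LANES = 5       # byte lanes entering the CELL processor


def lane_bandwidth_gbytes(width_bits, rate_gbps):
    """Raw bandwidth of one byte lane in gigabytes per second."""
    return width_bits * rate_gbps / 8


lane_gbytes = lane_bandwidth_gbytes(LANE_WIDTH_BITS, PAIR_RATE_GBPS)
outbound_gbytes = OUTBOUND_LANES * lane_gbytes
inbound_gbytes = INBOUND_LANES * lane_gbytes
total_gbytes = outbound_gbytes + inbound_gbytes

# One differential pair per bit, across all twelve lanes.
signal_pairs = (OUTBOUND_LANES + INBOUND_LANES) * LANE_WIDTH_BITS

if __name__ == "__main__":
    print(f"per lane: {lane_gbytes:.1f} GB/s")
    print(f"outbound: {outbound_gbytes:.1f} GB/s, inbound: {inbound_gbytes:.1f} GB/s")
    print(f"total: {total_gbytes:.1f} GB/s over {signal_pairs} signal pairs")
```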