The most obviously improved part of the X3 is the memory hierarchy. The prior generation EXA2 supported both Itanium and Xeon processors. Due to physical differences in the front-side buses, this required a two-chip architecture, as depicted in Figure 3: one chip handling the front-side bus (FSB), the other containing the memory and I/O controllers. Unfortunately, this introduced an extra chip-to-chip interconnect, and the associated latency between the MPUs and the I/O and memory reduced performance. To mitigate this penalty, an L4 cache was added, connected to the FSB controller.
In a move mirroring the revisions to the pSeries system architecture, the X3 eliminates the extraneous chip interconnect; the chipset functionality has mostly been consolidated onto a single chip. Consequently, memory accesses require one less interconnect hop, reducing latency substantially. This also allowed IBM to drop the L4 cache from the design altogether, further decreasing cost and design complexity. As a result, local memory latency for the X3 is a blistering 108ns, compared to 265ns for the prior generation. These changes can be seen below in Figure 4.
Figure 4 – EXA2 vs. X3 Memory Subsystem Comparison, based on an IBM presentation
The previously mentioned chip consolidation also eliminated a key bottleneck: the connection between the memory controller and the scalability controller. In the EXA2, this interconnect could only deliver 3.2GB/s of bandwidth, which undercut the eight-channel DDR memory architecture's theoretical 12.8GB/s. In comparison, the X3 is able to reap the full benefits of four DDR2-400 controllers, which deliver 21.3GB/s of bandwidth across 8 channels. Note that this is considerably more than the front-side buses can deliver, so roughly half of the memory bandwidth is available exclusively for the I/O devices and remote quads.
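As a rough sanity check on these figures, peak DDR bandwidth is just transfer rate times bus width times channel count. The sketch below is illustrative arithmetic, not IBM's specification: the function name is invented, and the DDR-200 assumption (64-bit, 8-byte channels) is chosen because it reproduces the article's 12.8GB/s figure for eight channels. The `min()` at the end captures the EXA2 bottleneck: remote agents see only what the 3.2GB/s scalability link can carry, not what the DRAM can supply.

```python
def peak_bandwidth_gbs(mega_transfers_per_sec, bus_width_bytes, channels):
    """Peak theoretical bandwidth in GB/s (1 GB = 10^9 bytes)."""
    return mega_transfers_per_sec * 1e6 * bus_width_bytes * channels / 1e9

# Assumed EXA2 configuration: eight 64-bit DDR-200 channels -> 12.8 GB/s peak.
exa2_memory = peak_bandwidth_gbs(200, 8, 8)
print(exa2_memory)  # 12.8

# But the 3.2 GB/s memory-to-scalability-controller link caps what remote
# quads and I/O can actually draw from that memory.
exa2_link = 3.2
print(min(exa2_memory, exa2_link))  # 3.2
```

The same formula shows why consolidating the controllers onto one chip matters: once the narrow intermediate link is gone, the full channel bandwidth is exposed to the rest of the system.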
IBM also provided a full set of RAS features for the X3 memory subsystem. The memory controller supports hot-swapping, scrubbing, bit steering and mirroring. The first three features have been rather elegantly explained by David Wang in his article Error Correcting Memory – Part I, which I highly recommend. Memory mirroring is straightforward: it duplicates the contents of memory, an analog to RAID-1 for hard drives. This has the unfortunate side-effect of halving usable memory capacity and increasing costs.
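The RAID-1 analogy can be made concrete with a minimal sketch. This is not IBM's implementation, just an assumed model with invented names (`MirroredMemory`, the `faulty` set standing in for uncorrectable-error detection): every write lands in two copies, and a read that hits an uncorrectable error in the primary is served transparently from the mirror.

```python
class MirroredMemory:
    """Toy model of memory mirroring (RAID-1 analog), for illustration only."""

    def __init__(self, size):
        self.primary = bytearray(size)
        self.mirror = bytearray(size)
        self.faulty = set()  # addresses with an uncorrectable error in primary

    def write(self, addr, value):
        # Duplicate write: this is why mirroring halves usable capacity.
        self.primary[addr] = value
        self.mirror[addr] = value

    def read(self, addr):
        if addr in self.faulty:       # uncorrectable error in the primary copy
            return self.mirror[addr]  # transparent fallback to the mirror
        return self.primary[addr]

mem = MirroredMemory(1024)
mem.write(42, 0xAB)
mem.faulty.add(42)        # simulate an uncorrectable error at address 42
print(hex(mem.read(42)))  # -> 0xab, served from the mirror
```

The fallback in `read` is the whole value proposition: an uncorrectable error that scrubbing and bit steering cannot fix becomes invisible to software, at the price of buying twice the DRAM.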