Memory Subsystem and FB-DIMMs
Another significant improvement in the Bensley platform is the Fully Buffered DIMM (FB-DIMM) memory controller. FB-DIMMs are a new memory standard, specifically targeted at servers and workstations, and have been extensively described elsewhere. To summarize, each FB-DIMM uses standard DRAMs, but places a buffer on each DIMM; a 3.2GHz point-to-point, uni-directional serial interconnect runs between the memory controller and the buffers.
Figure 2 – Sustained bandwidth for FB-DIMMs (from an Intel presentation)
A single FB-DIMM channel requires 69 pins (a little less than a third of DDR2's 240) while providing the same bandwidth (4.264GB/s). Since FB-DIMMs offer far higher bandwidth per pin than DDR2 and tolerate uneven trace lengths, routing is much easier and more channels fit on standard 4 or 6 layer motherboards. Blackford uses 4 channels, supplying 17.1GB/s of memory bandwidth, just enough to saturate the two front side buses. As Figure 2 indicates, FB-DIMMs can read and write simultaneously, but only to different DIMMs, so there is a very real bandwidth advantage to having multiple DIMMs on each channel. Each channel can work with up to 244 devices, four times as many as DDR2. Blackford supports DRAM densities from 256Mbit to 2Gbit, so using 2Gbit DRAMs, a system can hold 64GB of memory.
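The bandwidth and capacity figures above are easy to sanity-check. A rough sketch, assuming eight FB-DIMMs per channel and eight 2Gbit x8 data devices per DIMM (ECC devices excluded); the remaining numbers come straight from the text:

```python
# Back-of-the-envelope check of the Blackford memory figures.

MT_S = 533              # DDR2-533: millions of transfers per second
BYTES_PER_TRANSFER = 8  # 64-bit data path
channel_bw = MT_S * BYTES_PER_TRANSFER / 1000   # GB/s per channel

channels = 4
total_bw = channels * channel_bw                # GB/s across the chipset

# Pin efficiency: same channel bandwidth over far fewer pins.
fbd_pins, ddr2_pins = 69, 240
bw_per_pin_ratio = (channel_bw / fbd_pins) / (channel_bw / ddr2_pins)

# Capacity, assuming 8 DIMMs/channel and 8 x8 2Gbit data devices per DIMM.
dimms_per_channel = 8
dram_gbit = 2
drams_per_dimm = 8
capacity_gb = channels * dimms_per_channel * drams_per_dimm * dram_gbit / 8

print(f"{channel_bw:.3f} GB/s/channel, {total_bw:.1f} GB/s total")
print(f"{bw_per_pin_ratio:.1f}x bandwidth per pin, {capacity_gb:.0f} GB max")
```

This reproduces the article's 4.264GB/s per channel, roughly 17.1GB/s in aggregate, about 3.5x the bandwidth per pin of DDR2, and a 64GB ceiling with 2Gbit DRAMs.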
Blackford also improves upon the RAS capabilities of the prior generation chipset. CRC detects errors in addresses or commands from the memory controller, while data is protected with SECDED ECC and memory mirroring. Mirroring is new to the DP market and, perhaps more importantly, with Blackford it carries only a small performance impact; previous implementations often sacrificed significant bandwidth and latency for the increased availability. The memory controller can both detect and seamlessly correct failures of x4 and x8 DRAMs, whereas the prior generation Lindenhurst chipset only supported x4 correction. Memory scrubbing and logging are also implemented, but that is hardly news; Lindenhurst also supported scrubbing to detect single bit errors before they become uncorrectable double bit errors. The Blackford controller also allows a DIMM to be designated as a hot spare, which serves as a failover mechanism.
While FB-DIMMs are a vast improvement over DDR2, they also represent a trade-off. The unloaded latency for FB-DIMMs is worse than for DDR2; Figure 2 shows a disadvantage of around 10-20ns, depending on the configuration. The latency to access any DIMM in a channel is determined by the latency of the last DIMM, so high capacity configurations will have slightly worse latency than those with only 1 or 2 DIMMs. However, this is nothing new: for all servers, more RAM means slower access, and many current DP systems offer up to 16GB of memory, but only at the slowest speed grades. Moreover, unloaded latency is not that important; after all, people buy servers to run at 30-80% capacity, not unloaded. The real question is what loaded latency looks like. The chart in Figure 2 is not quite gospel, but it does show that for applications which need more than 4GB/s of bandwidth, FB-DIMMs tend to have better latency.
The other drawback of FB-DIMMs is that they add to the overall power consumption and thermal dissipation of the system. The buffer chips themselves are reported to dissipate around 3-7W, and with 4-16 DIMMs in a system, that could mean an extra 12-112W. However, that figure is for the first generation of buffers, which are mostly produced on older processes. As the buffers move to newer processes and the manufacturers gain experience, power consumption and heat dissipation will decrease. This is a rather typical first generation teething problem and not a serious issue.
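The 12-112W range quoted above is simply the product of the per-buffer figures and the DIMM counts; a quick sketch:

```python
# Bounds on added buffer power, from the per-buffer (AMB) figures quoted
# in the text (3-7W each) and typical system DIMM counts (4-16).
buffer_w = (3, 7)       # reported per-buffer dissipation, watts
dimm_count = (4, 16)    # lightly to fully populated system

low = dimm_count[0] * buffer_w[0]
high = dimm_count[1] * buffer_w[1]
print(f"extra buffer power: {low}-{high} W")  # prints "extra buffer power: 12-112 W"
```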
The Blackford chipset is the first to implement FB-DIMMs. The architects deliberately traded unloaded latency, power consumption and heat dissipation for better loaded latency, bandwidth and capacity. This is a very appealing trade-off for servers, where the main focus is high throughput at reasonable latencies. The additional bandwidth and capacity are essential for the next generation of servers to scale up, since they use multi-core MPUs. Ultimately, the choice of FB-DIMMs for the Blackford chipset will prove to be a wise one.