Dual Core Servers, Done Right
Intel’s shared bus architecture has several advantages: it is simple, cheap, and easy to implement, and it is quite effective for two-processor systems. However, with four CPUs on a single shared bus, there is not a lot of bandwidth to go around, and cache coherency traffic increases to boot. Altogether, this makes Intel’s shared bus a suboptimal solution for four or more processors. Fortunately, Intel has gone to great lengths to address these issues in Bensley. Figure 1 below compares the Lindenhurst platform to the Bensley platform. The initial focus will be on the buses; the memory subsystem will be discussed later.
Figure 1 – Lindenhurst and Bensley platforms
Note that Lindenhurst has an 800MT/s bus, which provides 6.4GB/s of bandwidth. That is certainly not bad for two processors; however, with dual core Paxville DP, it works out to only 1.6GB/s of bandwidth for each of the four cores! On top of that, there is also the broadcast cache coherency traffic, which in general is proportional to the square of the number of processors in a system. Altogether, Lindenhurst is a fine chipset for single core MPUs, but definitely not great for dual core designs.
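The shared bus arithmetic above can be sketched in a few lines of Python. The 64-bit (8-byte) bus width and the four-core count (two dual core Paxville DP sockets) are assumptions inferred from the figures quoted, not values stated by Intel here.

```python
# Peak bandwidth of a shared front-side bus, and what each core sees
# when all cores contend for it. Assumes a 64-bit (8-byte) wide bus.

def fsb_bandwidth_gbs(transfers_mts, bus_width_bytes=8):
    """Peak bus bandwidth in GB/s for a given transfer rate in MT/s."""
    return transfers_mts * bus_width_bytes / 1000.0

# Lindenhurst: one 800MT/s bus shared by two dual core Paxville DP sockets.
total = fsb_bandwidth_gbs(800)   # 6.4 GB/s aggregate
per_core = total / 4             # four cores contend for the one bus
print(f"total: {total:.1f} GB/s, per core: {per_core:.1f} GB/s")
```

Running this reproduces the 6.4GB/s aggregate and 1.6GB/s per-core figures in the text.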
The Blackford chipset is far more reminiscent of IBM’s X3 than of Lindenhurst. Blackford uses two independent 1066MT/s buses (which will scale to 1333MT/s with the introduction of Woodcrest), providing an aggregate of 17.1GB/s of bandwidth, or 21.3GB/s with the upgraded bus. That works out to 4.3GB/s for each CPU, rising to 5.3GB/s once Woodcrest is introduced. In terms of processor bandwidth, Bensley is a huge step forward; not only that, Bensley also packs an innovative way to deal with cache coherency, which is described in the next section.
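The dual independent bus figures can be checked the same way. This is a sketch under the same assumptions as before (8-byte bus width, four cores total); 1066 and 1333 MT/s are taken as the conventional roundings of 1066.67 and 1333.33 MT/s.

```python
# Aggregate bandwidth of Blackford's two independent front-side buses.
# Assumes a 64-bit (8-byte) bus width on each bus.

def dual_bus_bandwidth_gbs(transfers_mts, buses=2, bus_width_bytes=8):
    """Combined peak bandwidth in GB/s of `buses` independent FSBs."""
    return buses * transfers_mts * bus_width_bytes / 1000.0

bensley = dual_bus_bandwidth_gbs(1066.67)    # ~17.1 GB/s aggregate
woodcrest = dual_bus_bandwidth_gbs(1333.33)  # ~21.3 GB/s aggregate
print(f"Bensley:   {bensley:.1f} GB/s ({bensley / 4:.1f} GB/s per core)")
print(f"Woodcrest: {woodcrest:.1f} GB/s ({woodcrest / 4:.1f} GB/s per core)")
```

This reproduces the 17.1/21.3GB/s aggregate and 4.3/5.3GB/s per-core numbers, and makes the key design point visible: each socket gets its own bus, so per-core bandwidth does not halve again as it would on a single shared bus.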