Speculations on CSI Servers
In the server world, CSI will be introduced in tandem with an on-die memory controller. The impact of these two modifications will be quite substantial, as they address the few remaining shortcomings in Intel’s overall server architecture and substantially increase performance. This performance improvement comes from two places: the integrated memory controller will lower memory latency, while the improved interconnects for 2-4 socket servers will increase bandwidth and decrease latency.
To Intel, the launch of a broad line of CSI-based systems will represent one of the best opportunities to retake server market share from AMD. New systems will use the forthcoming Nehalem microarchitecture, a substantially enhanced derivative of the Core microarchitecture that features simultaneous multithreading and several other enhancements. Historically speaking, new microarchitectures tend to win the performance crown and presage market share shifts. This happened with the Athlon, the Pentium 4, the Athlon64/Opteron and the Core 2, and it seems likely this trend will continue with Nehalem. The system level performance benefits from CSI and integrated memory controllers will also eliminate Intel’s two remaining glass jaws: the older front side bus architecture and higher memory latency.
The single-processor server market is likely where CSI will have the least impact. For these entry level servers, the shared front side bus is not a substantial problem, since there is little inter-processor communication compared to larger systems. Hence, the technical innovations in CSI will matter relatively little in this market. AMD also has a much smaller presence here, because its advantages (which are similar to the advantages of CSI) are less pronounced. Clearly, AMD will try to make inroads into this market; if the market responds positively to AMD’s solution, that may hint at future reactions to CSI.
Currently in the two socket (DP) server market, Intel enjoys a substantial performance lead for commercial workloads, such as web serving or transaction processing. Unfortunately, Intel’s systems are somewhat handicapped because they require FB-DIMMs, which use an extra 5-6 watts per DIMM and cost somewhat more than registered DDR2. This disadvantage has certainly hindered Intel in the last year, especially with customers who require lots of memory or extremely low power systems. While Intel did regain some server market share, AMD’s Opteron is still the clear choice for almost all high performance computing, where the superior system architecture provides more memory and processor communication bandwidth. This advantage has been a boon for AMD, as the HPC market is the fastest growing segment within the overall server market.
Gainestown, the first CSI-based Xeon DP, will arrive in the second half of 2008, likely before any of the desktop or mobile parts. In the dual socket market, CSI will certainly be welcome and improve Intel’s line up, featuring 2x or more the bandwidth of the previous generation, but the impact will not be as pronounced as for MP systems. Intel’s dual socket platforms are actually quite competitive because the product cycles are shorter, meaning more frequent upgrades and higher bandwidth. Intel’s current Blackford and Seaburg chipsets, with dual front side buses and snoop filters, offer reasonable bandwidth, although at the cost of slightly elevated power and thermal requirements. This too shall pass; it appears that dual socket systems will shift back to DDR3, eliminating the extra ~5W penalty for each FB-DIMM. This will improve Intel’s product portfolio and put additional pressure on AMD, which is still benefiting from the FB-DIMM thermal issues. The DP server market is currently fairly close to ‘equilibrium’; AMD and Intel have split the market approximately along the traditional 80/20 lines. Consequently, the introduction of CSI systems will enhance Intel’s position, but will not spark massive shifts in market share.
The first Xeon MP to use CSI will debut in the second half of 2009, lagging behind its smaller system counterparts by an entire year. Out of all the x86 product families using CSI, Beckton will have the biggest impact, substantially improving Intel’s position in the four socket server market. Beckton will offer roughly 8-10x the bandwidth of its predecessor, dramatically improving performance. The changes in system architecture will also dramatically reduce latency, which is a key element of performance for most of the target workloads, such as transaction processing, virtualization and other mission critical applications. Since the CSI links are point-to-point, they eliminate one chip and one interconnect crossing, which will cut the latency between processors in half, or better. The integrated memory controller in Beckton will similarly reduce latency, since it also removes an extra chip and interconnect crossing.
Intel’s platform shortcomings created a weakness that AMD exploited to gain significant market share. It is estimated that Intel currently holds as little as 50% of the market for MP servers, compared to roughly 75-80% of the overall server market. When CSI-based MP platforms arrive in 2009, Intel will certainly try to bring its MP market share back in line with the overall market. However, Beckton will be competing against AMD’s Sandtiger, a 45nm server product with 8-16 cores also slated for 2009. Given that little is known about the latter, it is difficult to predict the competitive landscape.
Itanium and CSI
CSI will also be used for Tukwila, a quad-core Itanium processor due in 2008. Creating a common infrastructure for Itanium and Xeon based systems has been a goal for Intel since 2003. Because the economic and technical considerations for these two products are different, they will not be fully compatible. However, the vast majority of the two interconnects will be common between the product lines.
One goal of a common platform for Itanium and Xeon is to share (and therefore better amortize) research, development, design and validation costs, by re-using components across Intel’s entire product portfolio. Xeon and Xeon MP products ship in the tens of millions each year, compared to perhaps a million for Itanium. If the same components can be used across all product lines, the non-recurring engineering costs for Itanium will be substantially reduced. Additionally, the inventory and supply chain management for both Intel and its partners will be simplified, since some chipset components will be interchangeable.
Just as importantly, CSI and an integrated memory controller will substantially boost the performance of the Itanium family. Montvale, which will be released at the end of 2007, uses a 667MHz bus that is 128 bits wide – a total of 10.6GB/s of bandwidth. This pales in comparison to the 300GB/s that a single POWER6 processor can tap into. While bandwidth is only one factor that determines performance, a difference of nearly 30x is substantial by any measure. When Tukwila debuts in 2008, it will go a long way towards leveling the playing field. Tukwila will offer 120-160GB/s between MPUs (5 CSI links at 4.8-6.4GT/s), and multiple integrated FB-DIMM controllers. The combination of doubling the core count, massively increasing bandwidth and reducing latency should prove compelling for Itanium customers and will likely cause a wave of upgrades and migrations similar to the one triggered by the release of Montecito in 2006.
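The bandwidth figures above can be checked with some quick arithmetic. The sketch below is only illustrative: the function names are made up for this example, and the CSI calculation assumes that the Tukwila figure refers to five full-width links of 20 lanes each, counting raw bits in both directions. That assumption is not stated in the text, but it does reproduce the 120-160GB/s range.

```python
def bus_bandwidth_gbs(mega_transfers_per_sec, width_bits):
    """Peak bandwidth of a parallel bus in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = width_bits / 8
    return mega_transfers_per_sec * 1e6 * bytes_per_transfer / 1e9


def csi_aggregate_gbs(links, gt_per_sec, lanes_per_link=20, directions=2):
    """Raw aggregate CSI bandwidth in GB/s.

    Assumption (not from the article): each link is 20 lanes wide, every
    lane carries one bit per transfer, and both directions are counted.
    """
    gbits_per_direction = lanes_per_link * gt_per_sec
    return links * gbits_per_direction / 8 * directions


# Montvale front side bus: 667 MT/s on a 128-bit bus.
montvale = bus_bandwidth_gbs(667, 128)

# Ratio against the 300GB/s quoted for POWER6.
power6_ratio = 300.0 / montvale

# Tukwila: five links at the low and high end of the quoted signaling rates.
tukwila_low = csi_aggregate_gbs(5, 4.8)
tukwila_high = csi_aggregate_gbs(5, 6.4)

print(f"Montvale bus:   {montvale:.1f} GB/s")
print(f"POWER6 ratio:   {power6_ratio:.1f}x")
print(f"Tukwila (raw):  {tukwila_low:.0f}-{tukwila_high:.0f} GB/s")
```

Under these assumptions the Montvale bus works out to roughly 10.7GB/s, the POWER6 gap to about 28x, and the Tukwila range to 120-160GB/s. Usable payload bandwidth would be lower than the raw figure, since some of the lanes and bits carry protocol overhead rather than data.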