When the P6 front-side bus was first released, it caused a substantial shift in the computer industry by supporting up to four processors without any chipset modifications. As a result, Intel-based systems running Linux or Windows penetrated and came to dominate the workstation and entry-level server markets, largely because the incumbent architectures were priced far higher.
However, Intel hesitated to go beyond that point. This hesitancy was partly due to economic incentives to maintain the same infrastructure, but it also reflected the preferences of key OEMs such as IBM, HP and others, which add value by building larger multiprocessor systems. Balancing all the competing priorities inside Intel while pleasing its partners is nearly impossible, and this has handicapped Intel for the past several years. Nevertheless, it is quite clear that any reservations at Intel disappeared around 2002-2003, when CSI development started.
Intel’s patents clearly anticipate two- and four-processor systems, as shown in Figure 6. In a dual-socket system, each processor will require a single coherent full-width CSI link to the other processor, plus one or two half-width links to the I/O bridges, making the system fully symmetric (half-width links are shown as dotted lines). In four-socket systems the processors will be fully connected, and each processor could also connect directly to the I/O bridge. More likely, each processor, or pair of processors, will connect to its own I/O bridge to provide higher I/O bandwidth.
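To make the link budget of a fully connected system concrete, the short Python sketch below counts processor-to-processor links for two and four sockets. It is a simple illustration under the assumption that "fully connected" means every processor has a direct coherent link to every other processor; the function names are hypothetical, and I/O bridge links are deliberately left out of the count.

```python
from itertools import combinations


def full_mesh_links(sockets: int) -> list[tuple[int, int]]:
    """Return every processor-to-processor link in a fully connected system."""
    # Each unordered pair of processors gets exactly one direct link.
    return list(combinations(range(sockets), 2))


def links_per_socket(sockets: int) -> int:
    # In a full mesh, each processor links directly to every other processor.
    return sockets - 1


for n in (2, 4):
    links = full_mesh_links(n)
    print(f"{n}P: {len(links)} coherent link(s), {links_per_socket(n)} per processor")
```

Running it prints one link (one per processor) for 2P and six links (three per processor) for 4P. Because the total grows as n(n-1)/2, full connection is practical only for small socket counts, which is consistent with the two- and four-socket configurations in Figure 6.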
Figure 6 – 2 and 4P CSI System Diagrams  
Fully interconnected systems, such as those shown in Figure 6, enjoy several advantages over partially connected solutions. First, a transaction completes only as fast as its slowest participant. Hence, a system where every caching agent (including the I/O bridge) is only one hop away ensures lower transaction latency. Second, lower transaction latency reduces the number of transactions in flight, since the average transaction lifetime is shorter. This means that the buffers for each caching agent can be smaller, faster and more power efficient. Lastly, operating systems and applications struggle to exploit NUMA optimizations, so more symmetric systems are ideal from a software perspective.
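The first two advantages can be illustrated with a toy model. The Python sketch below finds the farthest agent in a four-processor full mesh and in a four-processor ring (a hypothetical partially connected alternative), then applies Little's law (average occupancy = arrival rate × average latency) to estimate how many transactions are in flight. The per-hop latency and request rate are made-up placeholders rather than CSI figures, the model ignores the I/O bridge and any local processing time, and it assumes latency is set by the farthest agent, in line with the point that a transaction is only as fast as its slowest participant.

```python
from collections import deque


def hop_counts(adjacency: dict[int, list[int]], src: int) -> dict[int, int]:
    """Breadth-first search: minimum number of link traversals from src to each node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nbr in adjacency[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def worst_case_hops(adjacency: dict[int, list[int]]) -> int:
    # A snoop must reach every caching agent, so the farthest agent bounds latency.
    return max(max(hop_counts(adjacency, s).values()) for s in adjacency)


def in_flight(requests_per_ns: float, latency_ns: float) -> float:
    # Little's law: average occupancy = arrival rate x average time in system.
    return requests_per_ns * latency_ns


# Four processors, fully connected vs. a ring (each socket links to two neighbours).
full_mesh = {i: [j for j in range(4) if j != i] for i in range(4)}
ring = {i: [(i - 1) % 4, (i + 1) % 4] for i in range(4)}

HOP_NS = 50.0  # hypothetical per-hop latency in nanoseconds
RATE = 0.1     # hypothetical request rate: one new transaction every 10 ns

for name, topo in (("full mesh", full_mesh), ("ring", ring)):
    hops = worst_case_hops(topo)
    latency = hops * HOP_NS
    print(f"{name}: {hops} hop(s), ~{latency:.0f} ns, ~{in_flight(RATE, latency):.0f} in flight")
```

With these placeholder numbers the full mesh needs one hop and holds about 5 outstanding transactions, while the ring needs two hops and holds about 10. Doubling the latency doubles the buffering each caching agent must provide, which is the buffer-size argument above in its simplest form.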
Interacting with I/O
Of course, ensuring optimal communication between multiple processors is just one part of system design. The I/O architecture of Intel’s platform is also important, and CSI brings several significant changes in that area as well.
As Figure 6 indicates, some CSI-based systems contain multiple I/O hubs, which need to communicate with each other. Since the I/O hubs are not directly connected to one another, Intel’s engineers devised an efficient method to forward I/O transactions (typically PCI-Express) through CSI. Because CSI was optimized for coherent traffic, it lacks many of the features that PCI-Express relies upon, such as I/O-specific packet attributes. To solve this problem, PCI-E packets are tunneled through CSI, leaving much or all of the PCI-E header information intact.
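The idea of tunneling can be sketched as encapsulation: the sending I/O hub wraps the PCI-E packet in a CSI packet that adds only routing information, and the receiving hub unwraps it with the original header untouched. The Python below is a minimal illustration under that assumption. The actual CSI packet format is not described here, so `TunnelPacket`, its node-ID fields and the helper functions are all hypothetical, and the 12-byte header is dummy data sized like a 3-DW PCI-E header.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TunnelPacket:
    """Hypothetical CSI packet carrying a PCI-E packet as an opaque payload."""
    dest_node: int  # hypothetical CSI node ID of the target I/O hub
    src_node: int   # hypothetical CSI node ID of the sending I/O hub
    tlp: bytes      # PCI-E header (and any data) carried unmodified


def encapsulate(tlp: bytes, src_node: int, dest_node: int) -> TunnelPacket:
    # Only routing information is added; the PCI-E header is left intact,
    # so I/O-specific attributes survive the trip across the coherent fabric.
    return TunnelPacket(dest_node=dest_node, src_node=src_node, tlp=bytes(tlp))


def decapsulate(pkt: TunnelPacket, my_node: int) -> bytes:
    # The receiving hub checks that the packet is addressed to it, then
    # hands the untouched PCI-E packet to its own PCI-E logic.
    if pkt.dest_node != my_node:
        raise ValueError(f"packet for node {pkt.dest_node} arrived at node {my_node}")
    return pkt.tlp


# A dummy 12-byte (3-DW) PCI-E header forwarded from I/O hub 4 to I/O hub 5.
header = bytes(range(12))
pkt = encapsulate(header, src_node=4, dest_node=5)
assert decapsulate(pkt, my_node=5) == header
```

The design point the sketch captures is that the CSI layer never parses the payload: any processor relaying the packet between hubs needs only the routing fields, while PCI-E semantics are preserved end to end.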