Shared Package CMP (SP-CMP)
The shared package design is shown in Figure 4, with the inter-core communication highlighted in red. This arrangement is sometimes called a Dual Chip Module (DCM), or a Multi-Chip Module (MCM) when more than two chips are involved. While the network for the CPUs is not shown, it must be traversed for the cores to share data; this network could be a fully routed and switched network, or simply a shared bus with an external chipset for arbitration.
Figure 4 – SP-CMP Architecture
Shared Package Advantages
There are three advantages here, and it is easy to overlook their importance. The first and most important is that putting two dice together in a package is less complex, and may not require any modifications to the CPU logic. Consequently, this approach takes less design time (relative to the other two) and can be implemented much later in the project life-cycle. The second advantage is that since each die is tested and verified before it is packaged, the yield on shared packages should be close to that of a single CPU. Lastly, shared package MPUs can be designed to have lower TDPs than the equivalent shared cache or shared interface MPU. This is done by pairing one low power/leakage chip with one high power/leakage chip in the package. The TDP for such a shared package would be twice the average TDP of a single chip, whereas the other two architectures would have TDPs that are twice the highest TDP of a single chip. Similarly, the power requirements would be lower for a shared package. These last two advantages help to counter the increasing power and temperature variation that occurs in advanced processes.
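The TDP arithmetic above can be made concrete with a small sketch. The per-die wattages below are invented for illustration only, not real product bins:

```python
# Hypothetical per-die TDP bins (watts) -- illustrative numbers only.
low_power_die = 60.0   # die from the low power/leakage bin
high_power_die = 90.0  # die from the high power/leakage bin

# Shared package: one low-leakage die can be paired with one high-leakage
# die, so the package TDP is twice the *average* single-die TDP.
shared_package_tdp = 2 * (low_power_die + high_power_die) / 2

# Shared cache / shared interface: both cores sit on one die, so the whole
# part must be rated at the worst-case bin -> twice the *highest* TDP.
monolithic_tdp = 2 * high_power_die

print(shared_package_tdp)  # 150.0
print(monolithic_tdp)      # 180.0
```

With these assumed bins, the mix-and-match option rates 30 W lower than either single-die dual-core architecture, which is the effect the paragraph describes.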
Shared Package Disadvantages
The biggest disadvantage of the shared package architecture is that communication between the two CPUs is just as slow as communication between two sockets in a Symmetric Multi-Processor (SMP) system. The lack of integration means more data must pass across the external CPU interface than in the other two approaches. Another disadvantage is that some implementations place two electrical loads on the bus or interconnect. In general, this limits the frequency at which the interface may be clocked (in the case of a multidrop bus), or requires extra hops and routing (in the case of point-to-point connections). However, some designs use a buffer that coalesces requests from both CPUs into a single location, so that each package appears as a single electrical load.
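The coalescing-buffer idea can be sketched abstractly: both cores funnel requests into one queue, and only that queue talks to the bus, so the bus sees a single agent per package. This is a toy model, not a description of any real implementation; the class and method names are invented:

```python
from collections import deque

class CoalescingBuffer:
    """Toy model: merges requests from two cores into one bus agent."""

    def __init__(self):
        self.queue = deque()

    def enqueue(self, core_id, request):
        # Both cores write into the same buffer, so the bus never sees
        # the cores directly -- only one electrical load per package.
        self.queue.append((core_id, request))

    def drain(self):
        # Present buffered requests to the shared bus in arrival order.
        while self.queue:
            yield self.queue.popleft()

buf = CoalescingBuffer()
buf.enqueue(0, "read 0x1000")
buf.enqueue(1, "read 0x2000")
for core, req in buf.drain():
    print(f"bus sees one agent: core {core} issued {req}")
```

The design trade-off the text alludes to is visible here: the buffer adds a queuing stage (latency) in exchange for halving the electrical loading, which in turn permits a faster front side bus.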
Shared Package MPUs
Currently shipping MPUs that use a shared package include Intel’s Pentium D, code named Smithfield; the next generation Pentium D, code named Presler; and the upcoming Xeon, code named Dempsey. Smithfield is unusual in that it uses the shared package approach but loses the advantage of better yields, since it is implemented as a single die. Such a design was likely a one-off occurrence, as it negates the biggest advantage of a shared package design.
Given Intel’s recent public roadmap disclosures, it does not appear that Intel’s P4P micro-architecture will implement anything other than a shared package approach. The main reason for the shared package was probably schedule constraints: it enables a competitive, time-to-market product while allowing manufacturing to achieve the best possible yields and speed or power bins. The more server oriented designs (Paxville MP and likely the upcoming Tulsa) will use request buffers so that each package appears as a single electrical load, allowing for faster front side buses. Otherwise, the cores would likely be bandwidth starved.