Chip Multi-Processing: A Method to the Madness

Shared Interface CMP (SI-CMP)

The SI-CMP architecture is shown below, with the red path indicating the route for communication between the two cores.


Figure 3 – SI-CMP Architecture

Shared Interface Advantages

The shared interface architecture is simpler to design than a shared cache, but gives up some flexibility and performance. As with the shared cache, communication between the two CPUs does not leave the chip; it is routed through the logic that handles the bus or network traffic. A robustly designed bus or fabric controller should be able to handle off-chip and intra-chip traffic simultaneously. As with the shared cache approach, external bus or network bandwidth is not consumed by local data or cache coherency traffic. Since the caches are separate, the cache controllers are identical to those found on single core MPUs, so relatively few modifications are needed to produce a shared interface chip. The interface does require some improvements, such as the ability to talk to both caches. However, these modifications are largely orthogonal to the CPU core and can be implemented later in the project life cycle, perhaps a year to a year and a half prior to tape out.
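The routing behavior described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the class name `SharedInterface`, its counters, and the return labels are hypothetical, the caches have unlimited capacity, and no real coherency protocol states are modeled. It shows only that a miss is checked against the other core's cache inside the interface before any off-chip transaction is issued.

```python
class SharedInterface:
    """Toy model of a shared bus/network interface serving private caches.

    Hypothetical illustration: unlimited cache capacity, no coherency states.
    """

    def __init__(self, num_cores=2):
        # One private cache per core, modeled as the set of lines it holds.
        self.caches = [set() for _ in range(num_cores)]
        self.on_chip_transfers = 0   # misses served by the other core's cache
        self.off_chip_transfers = 0  # misses that used the external bus

    def read(self, core, line):
        """Return where the line was found: 'hit', 'peer', or 'memory'."""
        if line in self.caches[core]:
            return "hit"
        # Miss: the interface checks the other caches before going off-chip.
        for peer, cache in enumerate(self.caches):
            if peer != core and line in cache:
                self.on_chip_transfers += 1
                self.caches[core].add(line)
                return "peer"
        # No on-chip copy exists, so external bandwidth is consumed.
        self.off_chip_transfers += 1
        self.caches[core].add(line)
        return "memory"


si = SharedInterface()
first = si.read(0, 0x40)   # cold miss, fetched from memory
second = si.read(1, 0x40)  # served from core 0's cache, stays on chip
third = si.read(1, 0x40)   # now a local hit
```

In this model only the first read touches the external bus; the second is an intra-chip transfer handled entirely by the interface logic.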

Shared Interface Disadvantages

The major disadvantages of the shared interface result from the lack of integration at the cache level. First, because the caches are not shared, there can be no dynamic load balancing: one core cannot borrow cache capacity that the other core is not using, which can lead to wasted resources and excessive cache pressure. Secondly, as illustrated in Figure 3, the bandwidth between the bus or network interface and each cache is shared by intra-chip and off-chip traffic. Lastly, a shared interface chip has all the manufacturing disadvantages of a shared cache design.
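The load balancing point can be made concrete with a small simulation. This is a sketch with invented numbers, not a model of any real chip: the `LRUCache` class, the `run` helper, the cache sizes (4 lines per private cache, 8 lines shared) and the working sets (6 lines for one core, 2 for the other) are all assumptions chosen to make the effect visible.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal LRU cache that only counts misses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.misses = 0

    def access(self, line):
        if line in self.lines:
            self.lines.move_to_end(line)  # mark as most recently used
            return
        self.misses += 1
        self.lines[line] = True
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict least recently used


def run(private, rounds=10):
    """Total misses for two cores with unequal working sets (assumed sizes)."""
    big = [("A", i) for i in range(6)]    # core 0 loops over 6 lines
    small = [("B", i) for i in range(2)]  # core 1 loops over 2 lines
    if private:
        caches = [LRUCache(4), LRUCache(4)]  # two separate 4-line caches
    else:
        shared = LRUCache(8)                 # one 8-line cache for both cores
        caches = [shared, shared]
    for _ in range(rounds):
        for line in big:
            caches[0].access(line)
        for line in small:
            caches[1].access(line)
    # set() avoids counting the shared cache twice.
    return sum(c.misses for c in set(caches))


private_misses = run(private=True)
shared_misses = run(private=False)
```

With these assumed sizes, the separate caches thrash on core 0's working set while half of core 1's cache sits idle; the same total capacity in a single shared cache holds both working sets and incurs only the initial cold misses.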

Shared Interface MPUs

Intel’s upcoming Itanium 2, code-named Montecito, will use a shared interface architecture. It features two separate 12MB L3 caches, and the bus interface will contain an arbiter that handles traffic between the cores and also queues and combines their traffic across the front side bus. AMD’s dual core Opteron offerings integrate two 1MB L2 caches and use a “System Request Interface” (SRI) that services intra-chip traffic. The SRI also connects to both the HyperTransport links and the memory controller, and similarly coordinates the external traffic.

