Even to the casual observer, it is apparent that the time of multicore computing is upon us. In fact, this shift began several years ago: the first general purpose CPU to feature Chip Multi-Processing (CMP) was the IBM POWER4, which debuted in 2001. Today, every high performance processor family has a shipping multicore design. Even our video game consoles are shifting in that direction. The Xbox 360 features 3-way symmetrical CMP, while the CELL processor uses up to 8 SIMD Processing Elements. With the shift towards multicore systems, it is more important than ever to understand the additional complexities of multi-processor systems over traditional uni-processor machines.
Multicore designs bring almost all the difficulties that previously belonged to high-end MP systems to our desktops, laptops and consoles. Before the shift to CMP, shared memory system design was an esoteric art. Producing high quality MP systems was so difficult that there were multimillion-dollar companies whose sole purpose was to design, build and support large CPU count systems using commercially available MPUs. One such example was Sequent (which was bought by IBM in 1999) with its Balance, Symmetry and NUMA-Q systems, but there were many others, including Pyramid, Encore and Alliant. These companies devoted their engineering resources to tackling three major problems: memory hierarchy, cache coherency and scalability. This article will cover all three topics in detail, and briefly discuss some additional considerations.