Intel’s Sandy Bridge Microarchitecture

System Architecture

As Moore’s Law has delivered more transistors with each generation, the level of achievable integration on microprocessors has steadily climbed. Intel came rather late to the integration game – it wasn’t until the 45nm Nehalem in 2008 that the memory controllers and coherent interconnects moved into the CPU silicon. However, as befits a company that lives by Moore’s Law, Intel has since pursued integration with a vengeance. The 32nm client versions of Westmere packaged the microprocessor with a companion die that included graphics, a memory controller and other features; the two chips were connected via QPI. Sandy Bridge takes this to its logical conclusion: it integrates almost everything found in the previous generation into a single 32nm die and eliminates the QPI link.

Sandy Bridge can be divided into three major areas – the CPU cores and L3 cache, the GPU, and the system agent (previously called the ‘uncore’), which holds everything else. The major blocks and external interfaces are shown below in Figure 1. The frequencies for Sandy Bridge in Figure 1 are estimates of the base clocks for a relatively high-end model. Due to the dynamic voltage and frequency scaling (DVFS) system, the CPU cores and GPU will typically operate at much higher frequencies. The actual clock speeds should firm up as Intel nears the release of products.

Sandy Bridge uses a new 1155-pin socket that is not compatible with the previous generation Arrandale and Clarkdale processors. One reason is that clock generation has changed. Rather than having a discrete clock generator on the motherboard, Intel’s 6-Series chipset feeds a single base clock through DMI to the processor, where it is multiplied and distributed across the die. This is another example of integration, although the clock generator is a smaller part of the system than the GPU or memory controllers.
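The clocking scheme described above, one base clock multiplied on-die, can be sketched in a few lines. This is only an illustration: the 100 MHz base clock and the per-domain multipliers are assumed values chosen for the example, not figures given in the text.

```python
# Illustrative sketch: one base clock enters the processor and each
# on-die domain derives its frequency from an integer multiplier.
# ASSUMPTION: the 100 MHz base clock and all multipliers below are
# hypothetical example values, not disclosed product specifications.

ASSUMED_BASE_CLOCK_MHZ = 100.0


def domain_frequency_mhz(base_clock_mhz: float, multiplier: int) -> float:
    """Return the frequency of a domain clocked at base clock x multiplier."""
    if multiplier <= 0:
        raise ValueError("multiplier must be a positive integer")
    return base_clock_mhz * multiplier


# Hypothetical multipliers for two clock domains.
example_multipliers = {"cpu_cores": 34, "gpu": 8}

for domain, mult in example_multipliers.items():
    freq = domain_frequency_mhz(ASSUMED_BASE_CLOCK_MHZ, mult)
    print(f"{domain}: {mult} x {ASSUMED_BASE_CLOCK_MHZ:.0f} MHz = {freq:.0f} MHz")
```

Because every domain derives from the same base clock, changing that one input would shift all domains together, which is one practical consequence of moving clock generation off the motherboard.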

Figure 1 – Sandy Bridge System Architecture and Comparison

Figure 1 above compares the Sandy Bridge client with Westmere and with one node of the first Bulldozer implementation (Interlagos). While Interlagos is solidly aimed at servers, at least one variant will appear in the high-end desktop market. It is unclear whether this version will forgo the two-node MCM in favor of higher frequencies. When comparing Interlagos or Bulldozer to other designs, it is important to remember that the front end, floating point units and L2 cache are shared between the two cores in a module.

Sandy Bridge’s memory controller should be a modest upgrade over the previous generation. Arrandale topped out at 1.066GT/s, and the desktop Clarkdale could reach 1.33GT/s. While Intel did not disclose any speeds, it seems reasonable to assume that Sandy Bridge will reach at least 1.86GT/s, and quite possibly 2.13GT/s in more extreme systems. On the low-power front for notebooks, DDR3L (which operates at 1.35V rather than 1.5V) is now available at 1.33GT/s and will probably be an option. Otherwise, the major interfaces will remain the same.
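To put those transfer rates in perspective, the sketch below converts them to theoretical peak bandwidth. It assumes a standard 64-bit (8-byte) DDR3 channel and takes the channel count as a parameter; the default of two channels is an assumption about the client parts, and the function name is purely illustrative. Sustained bandwidth in real systems will be lower than these peaks.

```python
# Theoretical peak DRAM bandwidth = transfer rate x bytes per transfer x channels.
# ASSUMPTIONS: 64-bit (8-byte) DDR3 channels; two channels by default.
# The 1.86 and 2.13 GT/s entries are the article's speculation, not
# disclosed specifications.

BYTES_PER_TRANSFER = 8  # one 64-bit DDR3 channel moves 8 bytes per transfer


def peak_bandwidth_gbs(transfer_rate_gts: float, channels: int = 2) -> float:
    """Return theoretical peak bandwidth in GB/s for the given GT/s rate."""
    if transfer_rate_gts <= 0 or channels <= 0:
        raise ValueError("transfer rate and channel count must be positive")
    return transfer_rate_gts * BYTES_PER_TRANSFER * channels


rates_gts = {
    "Arrandale (1.066GT/s)": 1.066,
    "Clarkdale (1.33GT/s)": 1.33,
    "Sandy Bridge, speculated (1.86GT/s)": 1.86,
    "Sandy Bridge, speculated (2.13GT/s)": 2.13,
}

for label, rate in rates_gts.items():
    print(f"{label}: {peak_bandwidth_gbs(rate):.2f} GB/s peak")
```

Under these assumptions, the step from 1.33GT/s to 1.86GT/s would raise peak bandwidth by roughly 40%. Passing a different `channels` value gives a comparable estimate for configurations with more memory channels.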

The Sandy Bridge clients will go into production later this year, with products arriving in 1Q11. This is substantially ahead of AMD’s competing Llano processor, which also integrates graphics on-die; Llano shipments have been delayed until 1H11, with products likely a quarter later. Llano is still based on a microarchitecture derived from Barcelona and can be expected to lag significantly in CPU performance. Of course, given AMD’s graphics expertise, the GPU in Llano should outperform the Gen 6 graphics in Sandy Bridge. At the low end, AMD will be shipping derivatives of Bobcat at the same time as Sandy Bridge, but the two serve entirely different segments of the market.

Sandy Bridge-EP, the mainstream server version, will not arrive until 2H11 – most likely in Q3. Sandy Bridge-EP will retain the same microarchitecture, but with a different system agent. It is expected to feature 8 cores, 16MB of L3 cache (although some rumors put this at 20MB), 4 DDR3 memory controllers, 2 QuickPath 1.1 links and 32 lanes of PCI-Express 3.0. Naturally, all the extra I/Os require a new socket, LGA2011, which presumably also adds more pins for power and ground. Consumer-oriented features like the GPU and display engine will be removed to make room for additional cores, memory controllers and I/O. Sandy Bridge-EP should ship in roughly the same time frame as Interlagos, so server buyers will have the choice between two new microarchitectures. Interlagos will have up to twice as many cores as Sandy Bridge-EP; however, actual product performance depends strongly on many unknown factors, such as frequency and L3 cache design.
