The Sandy Bridge system agent contains everything outside the CPU cores, L3 cache and graphics. In previous generations, it was described as the ‘uncore’, which does little to indicate the function and also seems to have a slightly negative connotation. AMD’s terminology for previous designs was the ‘northbridge’, while IBM calls it the ‘nest’. Hopefully, the terminology will synchronize to avoid the sort of rampant confusion caused by linguistic differences.
The system agent for client variants of Sandy Bridge incorporates the memory controller, power control unit (PCU), PCI-Express 2.0, DMI and the display engine. The system agent sits on a fixed voltage and frequency plane and is connected to the ring interconnect, giving it a high bandwidth interface to the rest of the system. Rather than exhaustively describe the system agent, we will focus on the PCU, with a brief mention of the display engine.
One of the most important parts of the system agent is the Power Control Unit (PCU). The PCU is a patchable microcontroller responsible for chip-wide power and thermal management. That encompasses “Turbo-mode”, the dynamic voltage and frequency scaling (DVFS) for the cores/caches and graphics, as well as the memory and I/O interfaces. Nehalem features an aggressive DVFS scheme that repeatedly measures voltage, power consumption, temperature and other environmental factors. These measurements are used to adjust the voltage and frequency of the cores, using available thermal and power headroom to increase frequency and single threaded performance. In addition, Nehalem uses PFET based power gates to eliminate leakage for idle cores, thereby freeing up more headroom for frequency increases.
The DVFS for Sandy Bridge is more aggressive than its predecessor and shares some characteristics with Moorestown. In reality, the temperature of the chip, heat spreader and heat sink lags behind the power dissipated: when a chip goes from idle to fully active, temperature rises over a period of time rather than instantaneously. This is critical, since a cold heat sink absorbs heat faster than a warm one, and leakage also increases with junction temperature.
The Sandy Bridge PCU uses a more sophisticated model of the chip that factors in this dynamic thermal capacitance, rather than assuming an instant temperature change (as the PCU for Nehalem and Westmere does). While the heat sink temperature is low but rising, it absorbs heat from the chip faster. Sandy Bridge can take advantage of the extra heat transfer to safely dissipate more power than the sustained TDP limit. Depending on system conditions, Sandy Bridge can exceed the TDP for up to 25 seconds before falling back to a sustainable power level. This capability is particularly useful for workloads with highly varied power usage, as found in client systems. It will also undoubtedly be exploited by savvy overclockers with exotic heat sinks, although this may require special BIOS support to extend the 25 second window. It is very likely that Moorestown uses similar algorithms, given that it seems to have a similar ability to exceed TDP briefly and is exclusively focused on workloads where the chip is idle most of the time, interspersed with short bursts of activity.
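The effect of thermal capacitance can be illustrated with a first-order lumped RC model. This is only a minimal sketch and not the algorithm the PCU actually runs: the function name, the thermal resistance and capacitance, the ambient temperature and the power levels are all hypothetical values chosen for illustration.

```python
import math

def seconds_above_limit(power_w, tdp_w, r_th, c_th, t_ambient, t_start):
    """Time for which a single-node RC thermal model tolerates `power_w`.

    The temperature limit is taken to be the steady-state temperature
    reached when dissipating exactly `tdp_w` (an assumption of this sketch).
    Returns math.inf if `power_w` can be sustained indefinitely.
    """
    t_limit = t_ambient + tdp_w * r_th    # steady state at sustained TDP
    t_final = t_ambient + power_w * r_th  # steady state at the burst power
    if t_final <= t_limit:
        return math.inf                   # never reaches the limit
    if t_start >= t_limit:
        return 0.0                        # already at the limit
    tau = r_th * c_th                     # thermal time constant in seconds
    # T(t) = t_final - (t_final - t_start) * exp(-t / tau); solve T(t) = t_limit
    return tau * math.log((t_final - t_start) / (t_final - t_limit))

# Hypothetical cooling solution: 0.5 K/W and 60 J/K give tau = 30 s.
R_TH = 0.5
C_TH = 60.0

# Same 120 W burst on a 95 W TDP part, starting from a cold and a warm heat sink.
cold = seconds_above_limit(120.0, 95.0, R_TH, C_TH, t_ambient=35.0, t_start=45.0)
warm = seconds_above_limit(120.0, 95.0, R_TH, C_TH, t_ambient=35.0, t_start=75.0)
print(f"cold start: {cold:.1f} s, warm start: {warm:.1f} s")
```

In this toy model a cold start allows a noticeably longer burst than a warm one, which is the behavior described above. A real package has several thermal time constants (die, heat spreader, heat sink), so a single RC node is a simplification.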
This new capability relies on precise thermal modeling algorithms in the PCU. These more refined thermal estimates also increase the number of extra frequency bins available with multiple active cores. In previous products, the frequency limit for a single active core was often much higher than the limit with two or four cores active. For example, the Core i5 750 has a base clock of 2.66GHz and can reach 2.8GHz (+5%) with 3-4 cores active or 3.2GHz (+20%) with 1-2 cores active. While the actual binning for Turbo-mode is a product level decision, Sandy Bridge should improve flexibility for some models.
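The percentages in the Core i5 750 example follow from the clock multipliers. The sketch below assumes the 133.33MHz reference clock (BCLK) used by that processor generation, which is not stated in the text above; the helper name `turbo_bins` is purely illustrative.

```python
BCLK_MHZ = 400 / 3  # assumed 133.33 MHz reference clock; one bin = one multiplier step

def turbo_bins(base_ghz, turbo_ghz, bclk_mhz=BCLK_MHZ):
    """Return (bins above base, percentage gain) for a turbo frequency."""
    base_mult = round(base_ghz * 1000 / bclk_mhz)
    turbo_mult = round(turbo_ghz * 1000 / bclk_mhz)
    bins = turbo_mult - base_mult
    gain_pct = 100 * bins / base_mult
    return bins, gain_pct

# Core i5 750 figures quoted in the text.
for label, turbo_ghz in (("3-4 cores active", 2.8), ("1-2 cores active", 3.2)):
    bins, gain = turbo_bins(2.66, turbo_ghz)
    print(f"{label}: +{bins} bin(s), +{gain:.0f}%")
```

Under that assumption, 2.66GHz corresponds to a 20x multiplier, so 2.8GHz is one bin above base and 3.2GHz is four bins above base.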
Previously, the GPU was on a separate die and was power managed by the driver. Since Sandy Bridge integrates graphics and CPUs into a single chip, the PCU can flexibly manage the power and thermal budget between the two components with much higher precision and lower latency. Sandy Bridge can provide higher performance for most applications by sharing thermal and power resources between the graphics and CPUs, rather than statically allocating a power and thermal budget to each. For example, when running a CPU-intensive workload where graphics is lightly used, the PCU can allocate more power to the CPU cores, which then translates into higher frequency and performance. In high-end products, each core can reportedly reach 3.8GHz, while the GPU can hit 1.35GHz.
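The difference between a static allocation and a shared budget can be sketched in a few lines. This is a toy model and not Intel's policy: the function names, the 75/25 static split, the proportional scaling rule and all wattages are assumptions made for illustration.

```python
def static_split(total_w, cpu_share, cpu_demand_w, gpu_demand_w):
    """Cap each unit at a fixed fraction of the package budget."""
    cpu_cap = total_w * cpu_share
    gpu_cap = total_w - cpu_cap
    return min(cpu_demand_w, cpu_cap), min(gpu_demand_w, gpu_cap)

def shared_budget(total_w, cpu_demand_w, gpu_demand_w):
    """Let both units draw from one budget.

    Demands are granted in full when they fit; otherwise both are scaled
    back proportionally (a hypothetical arbitration rule).
    """
    demand = cpu_demand_w + gpu_demand_w
    if demand <= total_w:
        return cpu_demand_w, gpu_demand_w
    scale = total_w / demand
    return cpu_demand_w * scale, gpu_demand_w * scale

# CPU-heavy workload with a nearly idle GPU on a hypothetical 95 W package.
print(static_split(95, 0.75, cpu_demand_w=85, gpu_demand_w=5))
print(shared_budget(95, cpu_demand_w=85, gpu_demand_w=5))
```

With the static split, the CPU is held to its fixed cap even though the GPU leaves most of its allocation unused. With the shared budget, the CPU receives everything it asks for, which is the headroom that translates into higher frequency.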
Each core in Sandy Bridge can be power gated off. Since the graphics block has its own power plane, it can be controlled with traditional techniques, rather than using on-die power gating. The cache and ring interconnect cannot be shut down or power gated, since they are shared by all components. However, Intel’s L3 caches are designed with sleep transistors and other techniques to reduce active and idle power. It is also likely that there are ways to reduce the idle power of the ring interconnect when it is not in use, but power gating was not considered an attractive option for Sandy Bridge. Naturally, the system agent must always stay active, since it includes the PCU, and it also receives the base clock over DMI.
The display and media engine in Sandy Bridge has also been substantially reworked. High definition MPEG2, VC1 or AVC streams are decoded in hardware, and two streams can be decoded simultaneously. Unlike previous generations, decoding does not involve the GPU. The fixed function decoding block is more power efficient than the programmable GPU, and also avoids sending data across the chip and waking up the GPU, reportedly achieving a 2X reduction in power consumption for playback. Encoding re-uses many of the decoding fixed function blocks and works with the GPU shader cores through the L3 cache. Demonstrations at IDF suggest a 2X performance gain, although further testing should show a more complete picture. Intel has also released a software development kit so that programmers will be able to take advantage of these new features.