Theoretically, the only advantages that discrete GPUs have over their integrated brethren are dedicated memory and a vastly larger power budget. Power consumption is the limiting factor for modern CPUs and GPUs, especially for notebooks, tablets, phones and other mobile devices. This means that increasing performance is directly tied to reducing power consumption.
GPUs are parallel by nature and can readily use additional transistors for performance, rather than relying on frequency. The Ivy Bridge GPU uses a substantially different mix of transistors as a result. According to Intel’s architects, half of the transistors are high Vt, about 45% are mid Vt and 5% are nominal. Transistors with a higher threshold voltage leak substantially less, but also switch more slowly, which limits frequency. The Ivy Bridge GPU is targeted for 0.65-1.1V operation and frequencies up to 1.2GHz. Compared to Sandy Bridge on 32nm (0.65-1.25V, up to 1.3GHz), Intel’s 22nm FinFET process delivers 8% lower peak frequency, but at a 12% lower peak voltage, for substantial power savings. The gains from 22nm are much larger at low voltage ranges, so Ivy Bridge should run much faster than Sandy Bridge at 0.65-0.8V.
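To put the frequency and voltage figures above in perspective, here is a minimal back-of-the-envelope sketch. It assumes the textbook approximation that dynamic power scales with C·V²·f and, as a simplification that does not come from Intel, that the switched capacitance is the same for both designs; leakage is ignored entirely. The function names are illustrative only.

```python
# Rough comparison of the peak operating points quoted in the text.
# Assumption: dynamic power ~ C * V^2 * f with equal switched capacitance C
# for both GPUs (a simplification; leakage is ignored).

def relative_change(new: float, old: float) -> float:
    """Fractional change from old to new (negative means a reduction)."""
    return (new - old) / old


def relative_dynamic_power(v_new: float, v_old: float,
                           f_new: float, f_old: float) -> float:
    """Ratio of dynamic power (new / old) under the C * V^2 * f model."""
    return (v_new / v_old) ** 2 * (f_new / f_old)


# Peak operating points from the text: (voltage in V, frequency in GHz).
IVY_BRIDGE_PEAK = (1.10, 1.2)
SANDY_BRIDGE_PEAK = (1.25, 1.3)

freq_change = relative_change(IVY_BRIDGE_PEAK[1], SANDY_BRIDGE_PEAK[1])
volt_change = relative_change(IVY_BRIDGE_PEAK[0], SANDY_BRIDGE_PEAK[0])
power_ratio = relative_dynamic_power(
    IVY_BRIDGE_PEAK[0], SANDY_BRIDGE_PEAK[0],
    IVY_BRIDGE_PEAK[1], SANDY_BRIDGE_PEAK[1],
)

print(f"peak frequency change: {freq_change:.1%}")
print(f"peak voltage change:   {volt_change:.1%}")
print(f"dynamic power ratio:   {power_ratio:.2f}")
```

Under these assumptions, the peak operating point draws roughly 71% of the dynamic power per unit of switched capacitance, for a frequency loss of about 8%. Real silicon differs in transistor count and capacitance, so this illustrates the trade-off rather than predicting actual power.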
Power management is an area where Intel in particular seems to have a significant advantage. Since the media and graphics pipelines share a considerable amount of logic, there is a single power gating region for the whole GPU. The latency associated with power gating is on the order of tens of microseconds, which is substantially faster than many discrete GPUs. The dynamic voltage and frequency scaling (DVFS) for the Ivy Bridge GPU selects a new operating point roughly every 10 frames. This is fairly slow compared to a CPU, but the variation in the workload is also much smaller.
One problem that Intel found with the Sandy Bridge GPU and power management was image degradation. Specifically, certain software-based image quality techniques were only used in notebooks plugged into the wall. When a notebook was running on battery, power management would throttle the GPU to conserve energy and those techniques were skipped. In Ivy Bridge, many image quality techniques were moved into fixed function hardware (e.g. contrast enhancement in the ROPs) that is extremely efficient. Due to the higher efficiency, the image quality techniques can be used in any mode (either AC or DC), with minimal impact on battery life.
Since the power and thermal budget for Ivy Bridge is shared between all components, there is no single number for power consumption. However, Intel’s architects estimated that the GPU is unlikely to draw more than 25W in desktop parts, while mobile products will not exceed 12W. This is roughly comparable to the previous generation, but the raw compute power has more than doubled. The fastest Ivy Bridge GPU achieves roughly 307GFLOP/s compared to 130GFLOP/s for Sandy Bridge. In practice the graphics performance will increase less, perhaps in the range of 1.5× to 2×.
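The relationship between the peak compute figures and the expected graphics gains can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the "realized fraction" metric (graphics speedup divided by peak compute speedup) is an illustrative construct, not something Intel reports.

```python
# Peak single-precision throughput quoted in the text, in GFLOP/s.
IVY_BRIDGE_GFLOPS = 307.0
SANDY_BRIDGE_GFLOPS = 130.0

# Expected real-world graphics speedup range quoted in the text.
GRAPHICS_SPEEDUP_LOW = 1.5
GRAPHICS_SPEEDUP_HIGH = 2.0


def realized_fraction(graphics_speedup: float, peak_speedup: float) -> float:
    """Share of the peak compute scaling that shows up as graphics speedup.

    Illustrative metric only: simply the ratio of the two speedups.
    """
    return graphics_speedup / peak_speedup


peak_ratio = IVY_BRIDGE_GFLOPS / SANDY_BRIDGE_GFLOPS
low = realized_fraction(GRAPHICS_SPEEDUP_LOW, peak_ratio)
high = realized_fraction(GRAPHICS_SPEEDUP_HIGH, peak_ratio)

print(f"peak compute ratio: {peak_ratio:.2f}x")
print(f"realized fraction:  {low:.0%} to {high:.0%}")
```

The peak ratio works out to about 2.36×, so a 1.5× to 2× graphics gain corresponds to roughly 64% to 85% of the raw compute scaling. The remainder is plausibly lost to limits that peak FLOP/s does not capture, such as memory bandwidth and fixed function throughput.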
Intel’s transformation from a graphics laggard to a viable competitor has been a long journey. The strategy has been one of consistent and measured improvement, rather than a dramatic revolution. With Sandy Bridge, Intel doubled the graphics performance of the previous-generation Ironlake, and brought acceptable integrated graphics to the PC ecosystem in early 2011. Ivy Bridge tackles GPU programmability and will nearly double performance yet again.
Competitively, the Sandy Bridge GPU was eclipsed 6 months after release by AMD’s Llano GPU, which was both higher performance and programmable. The saving grace was Intel’s industry-leading media encoding and decoding. Based on reviews and benchmarks, Llano’s lead over Sandy Bridge in graphics performance varies from 1.3× to 2×. AMD’s next integrated offering, Trinity, is reportedly 1.3× to 1.5× faster than Llano and should have competitive media processing.
Putting this all together, Intel will substantially narrow the gap with AMD in integrated graphics during 2012. Actual product-level performance depends on pricing, binning and the market. For instance, Intel has an edge in very low power designs due to process technology. The 22nm FinFETs are exceptionally efficient at low voltage and it is likely that Ivy Bridge will match Trinity in 17W designs. At 25-35W for conventional notebooks, Intel should trail by around 20%, which is close enough to be competitive. Looking to desktops though, AMD will have a substantial advantage and the performance gap may be much larger.
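The estimates quoted so far can be chained to see how wide the plausible Trinity versus Ivy Bridge gap is. The following is a minimal sketch using naive interval arithmetic. How the ranges are paired is an assumption of this sketch: 1.3× to 2× is taken as Llano over Sandy Bridge, 1.3× to 1.5× as Trinity over Llano, and 1.5× to 2× as Ivy Bridge over Sandy Bridge. The endpoints are treated as independent, which overstates the spread, and the `Range` type is purely illustrative.

```python
from typing import NamedTuple


class Range(NamedTuple):
    """A closed interval of positive speedup factors."""
    low: float
    high: float

    def times(self, other: "Range") -> "Range":
        # Product of two positive intervals: the endpoints multiply directly.
        return Range(self.low * other.low, self.high * other.high)

    def over(self, other: "Range") -> "Range":
        # Quotient of two positive intervals: smallest / largest up to
        # largest / smallest.
        return Range(self.low / other.high, self.high / other.low)


# Speedup ranges taken from the text (pairing is an assumption, see above).
llano_vs_sandy_bridge = Range(1.3, 2.0)
trinity_vs_llano = Range(1.3, 1.5)
ivy_bridge_vs_sandy_bridge = Range(1.5, 2.0)

# Chain the estimates so everything is relative to Sandy Bridge, then compare.
trinity_vs_sandy_bridge = llano_vs_sandy_bridge.times(trinity_vs_llano)
trinity_vs_ivy_bridge = trinity_vs_sandy_bridge.over(ivy_bridge_vs_sandy_bridge)
llano_vs_ivy_bridge = llano_vs_sandy_bridge.over(ivy_bridge_vs_sandy_bridge)

print("Trinity vs Sandy Bridge:", trinity_vs_sandy_bridge)
print("Trinity vs Ivy Bridge:  ", trinity_vs_ivy_bridge)
print("Llano vs Ivy Bridge:    ", llano_vs_ivy_bridge)
```

With these inputs, Trinity lands somewhere between roughly 0.85× and 2× of Ivy Bridge. That spans everything from near parity, as suggested for low power designs, to a substantial AMD lead, as suggested for desktops.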
Based on these estimates, the Ivy Bridge GPU will be Intel’s first truly competitive integrated graphics solution and a significant milestone for the company. AMD has the benefit of over a decade of experience with high performance graphics and can leverage tremendous investments in discrete GPUs. While Intel might lack the same experience, it appears that a full process node advantage (22nm vs. 32nm) makes up for this deficiency. Ivy Bridge should exceed Llano in most workloads and, compared to Trinity, narrow the performance gap to reasonable levels for most markets.
Looking forward, there are some features that simply did not make it into Ivy Bridge and are obvious candidates for Haswell. In particular, the programmability is still somewhat nascent. Features such as DX11.1 and OpenCL 1.2 were omitted either due to project timing or for risk and complexity reasons, and are expected in the next generation. System level integration was mostly unchanged in Ivy Bridge, but will probably be updated for Haswell. That would present an opportunity for more elegant sharing of the virtual address space, coherent communication between the CPU and GPU, and greater efficiency overall. Performance should also increase, particularly if the rumors are correct and there are three different variants of Haswell. That would give Intel the freedom to aggressively push performance for an expensive, high-end version, without compromising cost for the ‘free’ or mainstream flavors.