Insights into Efficiency
The data in the prior charts suggests that while GPUs can be more efficient than CPUs, this is not always the case. It’s clear that the efficiency of GPUs varies substantially based on architectural and implementation decisions. The difference between Nvidia and AMD GPUs demonstrates the impact of architecture, and comparing AMD’s RV770 to the RV670 shows the effects of implementation details such as GDDR5, even on the same process node. The gap between Nvidia and AMD GPUs also highlights the room for Nvidia to improve with their next generation products, particularly as they move to GDDR5 and tweak the shaders for better double precision performance (the best case is a 2:1 ratio of SP:DP performance, as in x86 CPUs, but even 4:1 would be a big step forward). The gap between the two families should narrow, but it will still persist, and the overall spread between GPUs may ultimately widen as Larrabee becomes a third horse in the race.
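To make the SP:DP ratios above concrete, here is a minimal sketch of the arithmetic; the 1000 GFLOP/s single precision figure is purely hypothetical, not a number from the charts:

```python
# Rough sketch of how an SP:DP ratio translates into peak double
# precision throughput. All figures are hypothetical, for illustration.

def dp_peak(sp_peak_gflops, sp_dp_ratio):
    """Peak DP GFLOP/s given the SP peak and an SP:DP ratio (e.g. 2 for 2:1)."""
    return sp_peak_gflops / sp_dp_ratio

# A hypothetical GPU with a 1000 GFLOP/s single precision peak:
for ratio in (2, 4, 8):
    print(f"{ratio}:1 SP:DP -> {dp_peak(1000, ratio):.0f} DP GFLOP/s")
```

Moving a hypothetical part from an 8:1 to a 4:1 ratio doubles its peak DP throughput (and hence our efficiency measures) with no change in process, power or area budget, which is why the shader tweaks matter so much here.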
Based on the data, GPUs tend to be at least as efficient (in both area and power) as high performance CPUs, and in most cases they are substantially more efficient. Over time, GPUs will improve their DP capabilities, boosting our measure of performance and hence efficiency. Meanwhile, CPUs will add more cores and wider vector units (e.g. single cycle throughput AVX) that boost performance and efficiency further. Both CPUs and GPUs benefit from new process nodes and memory interfaces, although GPUs will see those changes sooner (GDDR5 and 40nm).
Simpler and less aggressive CPUs, such as Atom, have demonstrated the potential to be competitive with almost any current GPU from a power efficiency standpoint. There are still process node differences which suggest that GPUs may be moderately more efficient, but by the same token, Atom currently does not use packed SSE operations, a rather obvious piece of low hanging fruit. These results are quite counter-intuitive and encouraging for CPU vendors: they imply that CPU architects can freely tune different microarchitectures to achieve levels of performance/watt covering a range similar to modern GPUs. Put another way, x86 compatibility is not a particularly challenging barrier to power efficiency – something that Intel hopes will hold true for Larrabee as well.
Area efficiency seems to be more difficult for x86 CPUs – even for a very compact core like Atom. This is not surprising, since CPUs need more control logic than GPUs, and x86 compatibility consumes extra area in most implementations. For instance, microcode and x87 do not burn much power when idle, but they do occupy area. This overhead can be minimized by sharing legacy hardware between cores, or by pushing it into microcode or software altogether (e.g. Transmeta), but most x86 vendors have not taken those steps yet. Similarly, wider vector units better amortize control logic and legacy hardware over greater computational capabilities – watch as AVX and other vector extensions propagate across all x86 CPUs. Future variants of Atom (or other low power microarchitectures) will be very interesting to watch in this respect, as their vector execution capabilities are currently very weak. Another factor is the use of caches and prefetchers, which are particularly common (and power efficient) in CPUs; they improve programmability by supporting complex data structures and by reducing memory traffic and stalls.

One factor that could change the outlook for CPUs is the availability of an SOC process that sacrifices raw frequency headroom for better density or lower power. AMD will assuredly have such an option from Global Foundries should the need arise, and TSMC already provides similar options to GPU designers. Nonetheless, CPUs lag behind in area efficiency, and they will have to improve to keep pace with GPUs, should that be a design goal.
The bottom line of this rough analysis is that the gap between CPUs and GPUs isn’t quite as big as some have claimed, once power and die area are taken into account. On that basis, GPUs do have a clear performance/mm2 advantage. However, performance/watt is the more important metric, and there CPUs can come much closer (at the cost of single threaded performance). GPUs are still very effective for certain workloads and clearly hold many advantages in terms of raw performance and bandwidth, but these advantages are not necessarily unassailable.
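For readers who want to reproduce this kind of comparison, a minimal sketch of how the two efficiency metrics are derived from theoretical peak FLOP/s, TDP and die area; the CPU and GPU figures below are hypothetical placeholders, not the parts from the charts:

```python
# Sketch of the rough efficiency metrics used in this analysis:
# performance/watt and performance/mm^2, computed from theoretical
# peak FLOP/s, TDP and die area. All numbers are hypothetical.

def efficiency(peak_gflops, tdp_watts, area_mm2):
    """Return (GFLOP/s per watt, GFLOP/s per mm^2)."""
    return peak_gflops / tdp_watts, peak_gflops / area_mm2

# Hypothetical CPU vs GPU comparison (illustrative values only):
cpu = efficiency(peak_gflops=50, tdp_watts=65, area_mm2=110)
gpu = efficiency(peak_gflops=240, tdp_watts=160, area_mm2=260)
print(f"CPU: {cpu[0]:.2f} GFLOP/s/W, {cpu[1]:.2f} GFLOP/s/mm^2")
print(f"GPU: {gpu[0]:.2f} GFLOP/s/W, {gpu[1]:.2f} GFLOP/s/mm^2")
```

Note how the two metrics can rank the same pair of chips differently: a part can lead comfortably in performance/mm2 while holding only a modest edge (or none) in performance/watt, which is exactly the pattern described above.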
Over the course of the next few days, AMD will reveal their next generation architecture and Nvidia is likely to do so at the end of the month or later in the fall. As both are moving from 55nm to 40nm, they will substantially improve their efficiency, and Nvidia is likely to get a further boost from GDDR5 and more robust double precision support. This will undoubtedly change the situation and make GPUs more attractive than the current analysis suggests – prompting a second look at the data. Perhaps we will even have the luxury of using more accurate metrics than theoretical peak FLOP/s, TDP and die area. However, the bigger questions concern the long term trends for GPUs and CPUs, rather than a snapshot at a single point in time. And the evolution of GPUs and CPUs is a fascinating avenue of discussion, one with rather interesting historical precedents.