Over the last several years, AMD has encountered a number of challenges. On the manufacturing and CPU side, the competition with Intel has been brutal. In 2007, Intel delivered a decisive combination of a 45nm manufacturing process and a novel high-k, metal gate stack that dramatically boosted transistor performance and efficiency, followed by the new Nehalem microarchitecture in 2008. Meanwhile, AMD was struggling to consistently deliver competitive products while grappling with the spin-out of their manufacturing to Global Foundries (which will eventually have a high-k, metal gate stack ready at 32nm in 2011).
In contrast, the graphics business at AMD has been a bright spot for the company. The graphics division has consistently improved their product line with every generation and took a major step forward with the 55nm RV770. The results have been quite spectacular: AMD leapt from roughly a third of the discrete graphics card market in 2008 to a hair over 50% by the middle of 2010. Certainly the graphics group has had a few missteps, and their performance in the workstation market is still quite poor, but the overall trend is a very positive one.
There are several factors at play here. The first is that AMD has embraced general purpose computing on GPUs slowly and cautiously, focusing their resources first and foremost on graphics. In contrast, Nvidia has blazed a trail in GPU computing with Fermi, carving out a leadership position in a new and lucrative market, but at a cost to their core graphics business.

The second factor is that as GPUs have moved from the 65nm node down to 40nm, physical design has become increasingly challenging. At each new generation, foundries and device manufacturers must continue Moore's Law while improving performance and power efficiency, and some of these challenges ultimately manifest as problems in circuit design, power consumption and yield. In terms of physical design prowess, AMD is simply ahead of Nvidia. Part of this is cultural – Nvidia has always had far better software, while ArtX (which was acquired by ATI) was a top notch chip design company. Moreover, the graphics group benefited from the considerable physical design expertise of AMD's CPU and manufacturing division. As a result, AMD simply had a smoother ride to advanced CMOS technology than Nvidia.
Of course, there were considerable bumps along the way. One of the challenges for AMD (and specifically for Cayman and Barts) came from their manufacturing partner TSMC. Historically, TSMC has offered both a mainline logic node (e.g. 65nm) and a half-node shrink (e.g. 55nm), which was good for a mild boost in density and transistor performance. In late November of 2009, TSMC announced that it had cancelled the 32nm node, citing relatively small expected volumes, and would instead focus all its efforts on 28nm. For many customers this was not an issue; AMD, however, was quite far along on a next generation GPU that targeted 32nm. Rather than cancel the project, AMD retargeted it for the proven and mature 40nm process, moved the schedule up to the end of 2010, and renamed it Cayman. To accomplish this rescheduling, though, AMD had to rely extensively on design re-use from previous generations.
After a highly compressed development cycle, AMD is launching the 40nm Cayman-based GPUs. Cayman is the biggest microarchitectural change since the RV770. The headline change is a move from a 5-wide VLIW to a 4-wide symmetric VLIW, but there are also a number of incremental enhancements intended for both graphics and more general workloads. Cayman is a step towards GPU computing, but a modest and evolutionary improvement in programmability rather than a wholesale revolution. In this report, we will explore the Cayman architecture with an eye towards compute oriented workloads.
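To see why the VLIW width matters, recall that a VLIW machine relies on the compiler to statically pack independent operations into each issue bundle; any slot it cannot fill is simply wasted. The toy greedy packer below is a hypothetical sketch (it is not AMD's actual shader compiler or scheduler) that illustrates the trade-off: with a stream of eight independent operations, a 5-wide machine needs two bundles and strands two slots, while a 4-wide machine also needs two bundles but fills every slot.

```python
def pack_bundles(ops, deps, width):
    """Greedily pack ops into VLIW bundles of `width` slots.

    ops:   list of op names, in program order
    deps:  dict mapping an op to the set of ops it depends on
    width: slots per bundle (5 for a VLIW5 design, 4 for Cayman's VLIW4)
    """
    bundles = []
    done = set()          # ops already issued in earlier bundles
    remaining = list(ops)
    while remaining:
        bundle = []
        for op in list(remaining):
            if len(bundle) == width:
                break
            # An op may issue only if all of its dependences completed
            # in an earlier bundle (no same-bundle forwarding).
            if deps.get(op, set()) <= done:
                bundle.append(op)
                remaining.remove(op)
        done.update(bundle)
        bundles.append(bundle)
    return bundles

# Eight independent multiply-adds (hypothetical op names):
ops = [f"mad{i}" for i in range(8)]
print(len(pack_bundles(ops, {}, 5)))  # 2 bundles: 5 ops + 3 ops, 2 slots idle
print(len(pack_bundles(ops, {}, 4)))  # 2 bundles: 4 ops + 4 ops, fully packed
```

Real shader code rarely exposes five independent operations per instruction group, which is part of the motivation for the narrower, symmetric design: utilization of the available slots tends to be higher even though peak width is lower.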