In the last several years, the landscape for computing has become increasingly interesting and diverse. Perhaps the first sign was the alternative architecture of the Cell processor, which was a radical departure from the CPU efforts of AMD, Fujitsu, IBM, Intel and Sun – throwing out several decades of programmability advances in favor of higher performance and efficiency. Unfortunately, radical shifts tend to fare poorly compared to the gradual pressures of evolution and refinement – Cell proved this point, as it is largely unused outside of the PlayStation 3 and does not appear to have much of a future in mainstream use.
GPUs have gradually evolved to be less application specific and slightly more generalized than their fixed function ancestors. The changes started in the DirectX 9 time frame, with real floating point (FP) data types, but still separate, dedicated vertex and pixel processing. DX10 hardware was the real turning point, with unified shaders, relatively complete data types (i.e. integers were added) and slightly more flexible control flow. Today the high-end is a four horse race between AMD (née ATI) on the discrete side, Intel's and AMD's integrated graphics, Intel's Larrabee, and Nvidia. All four face different goals and constraints, and hence have taken slightly different paths.
AMD’s efforts have focused primarily on graphics excellence and improving their market share and presence in that arena – they are content to let others bear the standard for general purpose computation until a market truly exists. AMD is cost conscious, pursuing a ‘sweet spot’ strategy that optimizes for the heart of the market and eschews the burdens of higher programmability – but it is not cost constrained.
On the other hand, Intel’s and AMD’s integrated graphics (IGPs) are severely cost constrained; allocated only a slender sliver of silicon in a northbridge, and soon in the same package or die as the CPU. They rarely have a large power budget or high speed memory. Despite the resulting low performance, they are the titans of the four, with about 50-60% of the market, and accelerating growth. Nvidia is the odd man out here: IGPs are increasingly being integrated with CPUs, and Nvidia has no CPU of its own.
Larrabee is a different kettle of fish entirely. It is a new initiative from Intel, promising substantial advances in programmability that will put GPUs on par with CPUs, offering programmers limitless potential. This vision is compelling, but no products will arrive until next year.
Coming last alphabetically but at the forefront of programmability is Nvidia. Historically, it is Nvidia, rather than AMD, that has pushed GPU programmability forward. They hope to tread an enlightened middle path, striving towards complete programmability without surrendering their graphical heritage to a combination of AMD’s focused discrete products and IGPs. Nvidia’s programmable products are aimed at the high performance computing (HPC) market, where margins are quite high compared to consumer graphics cards.
Nvidia’s last generation product, the GT200, struck a fine balance. The architecture certainly pushed the envelope of programmability – adding double precision support, atomic operations and a fledgling software ecosystem – while holding the highest performance for a single GPU product. At the moment, that crown actually belongs to AMD’s Radeon 5870, which launched last week. AMD’s focused optimization is showing gains in many segments of the graphics market, particularly those in the mainstream, below the ‘extreme’ price points.
It is in this context that Nvidia has announced a next generation architecture, which aims for even greater performance, reliability and programmability, unlocking even more software capabilities. This new architecture goes by several names to keep the unwary on their toes: Fermi or GF100, although some in the press are mistakenly bandying about GT300. Nvidia has chosen to primarily discuss architecture and not to disclose most microarchitecture or implementation details in this announcement. Where possible, our educated speculation fills these gaps and will be clearly noted as such. The lack of details is partially due to the fact that products based on Fermi will not be out for several months – and even this timeline is unclear.
Curiously, Nvidia is also not discussing the graphical capabilities of this chip, focusing instead only on compute. Hence our discussion treats the GPU primarily as a compute device. Accordingly, we will try to use standard terminology and point out where and how GPU terminology differs.