Larrabee 1 Defers Graphics, Bins Rendering
Several years ago, Intel saw the coming convergence of latency-optimized processors (i.e. CPUs) and throughput-optimized processors (i.e. GPUs) and embarked down the path of developing its own high performance discrete GPU – codenamed Larrabee. Larrabee’s architecture has been disclosed previously at both Hot Chips and Intel’s Developer Forum; it uses a two-issue, in-order x86 core, reminiscent of the P54C, with one pipeline dedicated to a 512-bit wide vector unit that executes the new Larrabee instructions. A group of 16 cores sits on a coherent ring bus, and multiple ring buses may be used for higher core counts (32 cores for the first implementation – Larrabee 1). Unlike GPUs from ATI and Nvidia, Larrabee (the architecture) is fully cache coherent, supports the basic x86 instruction set and has no dedicated hardware for rasterization, instead relying on a software rasterizer that is part of the driver.
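To make the binned rendering of the article’s title concrete, here is a minimal sketch of the core idea behind a tile-based software rasterizer: sort triangles into screen-space tiles (“bins”) up front, then let each core rasterize one tile’s bin independently. The tile size, function name and data layout here are illustrative assumptions, not details Intel has disclosed.

```python
# Minimal sketch (not Intel's actual pipeline) of binned rendering:
# assign each triangle to the screen-space tiles its bounding box
# overlaps, so each core can later rasterize one tile independently.
TILE = 64  # hypothetical tile size in pixels

def bin_triangles(triangles, width, height):
    """triangles: list of ((x0,y0), (x1,y1), (x2,y2)) in pixel coords."""
    tiles_x = (width + TILE - 1) // TILE
    tiles_y = (height + TILE - 1) // TILE
    bins = {(tx, ty): [] for tx in range(tiles_x) for ty in range(tiles_y)}
    for tri in triangles:
        xs = [p[0] for p in tri]
        ys = [p[1] for p in tri]
        # Clamp the bounding box to the screen and mark every tile it
        # touches; a real rasterizer would add exact triangle/tile
        # tests and hierarchical culling on top of this.
        x_lo = max(int(min(xs)) // TILE, 0)
        x_hi = min(int(max(xs)) // TILE, tiles_x - 1)
        y_lo = max(int(min(ys)) // TILE, 0)
        y_hi = min(int(max(ys)) // TILE, tiles_y - 1)
        for ty in range(y_lo, y_hi + 1):
            for tx in range(x_lo, x_hi + 1):
                bins[(tx, ty)].append(tri)
    return bins
```

The appeal of binning on a cache-coherent, many-core design like this is that once the bins are built, each core can work through a single tile’s triangles largely out of its local slice of cache, rather than streaming the whole framebuffer.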
The first generation of Larrabee was to be manufactured on Intel’s 45nm high performance process and productized as a high-end discrete graphics card. We have recently learned that the graphics products based on Larrabee 1 have been cancelled. The first generation will instead be used as a software development vehicle for ISVs and the high performance computing (HPC) community. Intel has not laid off any staff and is continuing development of the Larrabee architecture and product families – although those plans will not be discussed until next year.
Reading between the lines, the rationale for canceling graphics cards based on Larrabee 1 comes down to performance, time to market and the competition. Intel will not enter a new market with an uncompetitive product, and to be competitive in graphics, the performance of the combined hardware and software stack would need to be in line with contemporary ATI and Nvidia discrete GPUs. Time to market matters because it determines which ATI and Nvidia GPUs count as contemporary – so delays are quite an issue. Moore’s Law delivers roughly twice the transistors every 18 months through process shrinks, and for GPUs that translates almost directly into performance. Since 2^(1/18) ≈ 1.039, every month of delay is conceptually equivalent to giving up about 3.9% of relative performance.
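As a quick sanity check on that figure, here is the back-of-the-envelope arithmetic; the only input is the 18-month doubling period assumed in the paragraph above.

```python
# If competitors' performance doubles every 18 months, the implied
# monthly growth factor is 2**(1/18). The 18-month doubling period
# is the assumption carried over from the text above.
monthly_factor = 2 ** (1 / 18)
print(f"cost of one month of delay: {(monthly_factor - 1) * 100:.1f}%")
# -> cost of one month of delay: 3.9%
```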
A Soft and Complex Problem
Larrabee had quite a few challenges – I’ve said for a while that if Intel were able to get within 20-30% of the performance of Nvidia or ATI, it would have been a tremendous accomplishment. It was Intel’s first high performance GPU in a long time, and Intel did not have the same depth of homegrown graphics talent that Nvidia and ATI do. Finding that talent is not impossible, but it does not happen overnight, and it is not easy to get a new design team executing at 100%. The hardware is cache coherent and at a scale that Intel has not attempted before – 32 cores, when the largest product from Intel is the 8-core Nehalem-EX. Finally, the software stack is incredibly complex – and since Larrabee has no dedicated hardware for rasterization, the drivers matter even more than they do for ATI or Nvidia. Unlike the existing players, Intel does not have a decade of experience developing high performance DirectX and OpenGL drivers.
By all reports, the hardware for Larrabee 1 is in relatively good shape – especially for the first generation of a new architecture. The graphics drivers and software stack were likely the limiting factor that led to the cancellation of Larrabee 1 graphics products. Larrabee 1 hardware had been delayed a bit, and hardware delays on a new architecture always translate into software delays – in this case, the software delays were larger than the hardware ones. Factoring in the delays and the performance of the graphics drivers, Larrabee 1 was simply not competitive with the graphics offerings from ATI and Nvidia. That would have meant low prices on a huge chip, an undesirable combination for sure.
Another sign of difficulties on the software side is that Intel will release Larrabee 1 as a development vehicle for the HPC community, which needs neither a graphics driver nor a software rasterizer. HPC applications obviously need compilers, performance tuning utilities and so forth, but those are much simpler to develop and can leverage Intel’s existing software assets (e.g. ICC and VTune).