Introduction to OpenCL
Using a GPU for computational workloads is not a new concept. The first work in this area dates back to academic research in 2003, but it took the advent of unified shaders in the DX10 generation for GPU computing to become a plausible future. Around that time, Nvidia and ATI began releasing proprietary compute APIs for their graphics processors, and a number of companies were working on tools to leverage GPUs and other alternative architectures. The landscape back then was incredibly fragmented, and almost every option required a proprietary solution – software, hardware or both. Some of the engineers at Apple looked at the situation and decided that GPU computing had potential – but they wanted a standard API that would let them write code once and run it on many different hardware platforms. It was clear that Microsoft would eventually create one for Windows (ultimately DirectCompute), but what about Linux and OS X? Thus an internal project was born that would eventually become OpenCL.
The goals for OpenCL are deceptively simple: a cross-platform API and ecosystem that lets applications take advantage of heterogeneous computing resources for parallel workloads. The name also makes clear that OpenCL is the compute analogue of OpenGL and is intended to fill a similar role. While GPUs were explicitly targeted, a number of other devices have considerable potential but lack a suitable programming model, including IBM’s Cell processor and various FPGAs. Multi-core CPUs are also candidates for OpenCL, especially given the difficulty inherent in parallel programming models, with the added benefit of integration with other devices.
OpenCL has a broad and inclusive approach to parallelism, both in software and hardware. The initial incarnations focus on data-parallel programming models, partially because of the existing work in the area. However, task-level parallelism is certainly anticipated and on the road map. In fact, one of the most interesting areas will be the interplay between the two.
The cross-platform aspect ensures that applications will be portable between different hardware platforms, from a functionality and correctness standpoint. Performance will naturally vary across platforms and vendors, and improve over time as hardware evolves to exploit ever more parallelism. This means that OpenCL embraces multiple cores and vectorization as equally valid approaches and enables software to readily exploit both.
OpenCL kernels are written in a C-like language, but with a number of restrictions to improve parallel execution (e.g. no recursion and limited use of pointers). For most implementations, the compiler back-end is based on LLVM, an open-source project out of UIUC. LLVM was a natural choice, as it is extensively used within Apple; it has a more permissive license than the GNU compiler suite, and many of the key contributors are employed by Apple.
The first widely supported, programmable GPUs were the DX10 generation from Nvidia, accompanied by a proprietary API, CUDA, and a fledgling software ecosystem. To take advantage of this, Apple worked closely with Nvidia on their early efforts. The result is that OpenCL was heavily influenced by CUDA. In essence, CUDA served as a starting point and Apple then incorporated their own vision and a great deal of input from AMD, Imagination Technologies (which is responsible for nearly all cell phone graphics solutions) and Intel. Once the project was in good enough shape, Apple put OpenCL into the hands of the Khronos Group, the standards body behind OpenGL.
The lion’s share of the early OpenCL work was done by Apple and Nvidia. The first software implementation of OpenCL was a key feature of Mac OS X 10.6, which was released in August of 2009. In order to promote the burgeoning standard, Apple mandated hardware support across their entire Mac line, from the humble Mac Mini to the Mac Pro. Since Nvidia was the only compatible hardware solution early on, this gave them a virtual monopoly on Apple’s chipsets and graphics cards for the first several years. The rest of the industry signed onto OpenCL in fairly short order; however, actual hardware and software have only just begun to catch up and take shape.
The progress in the PC ecosystem has just started. Nvidia supports OpenCL across their full product line, as they have from inception. AMD took a slightly indirect route, first releasing OpenCL for CPUs (and GPUs under OS X) in August of 2009 and adding GPU support for Windows and Linux in December 2009. S3’s embedded graphics added OpenCL 1.0 in late 2009, as did VIA for the video processors in their chipsets. IBM also has a version of OpenCL for PowerPC and Cell processors. Of all the major players, Intel is taking the longest to release OpenCL compatible products. Their first CPU implementation will arrive in early 2011 with Sandy Bridge. Unfortunately, the Sandy Bridge GPU lacks certain required functionality, so the first GPU implementation of OpenCL will be on Ivy Bridge, the following year. Of all the different vendors, Nvidia’s support is by far the most full featured and robust, since it leverages their existing investment in CUDA. On the software side, things are moving somewhat more slowly, with only a handful of early adopters – partially because the hardware support has just started to move beyond Nvidia.
Just as OpenGL is used in both the PC and embedded worlds, OpenCL has also generated substantial interest within the mobile and embedded ecosystem. Imagination Technologies, which is responsible for the vast majority of cell phone GPUs, announced OpenCL 1.0 support for the SGX545 graphics core. Samsung has a compatible solution for cell phones, based on an ARM Cortex A9 microprocessor. Perhaps more importantly, Khronos has released an ‘Embedded Profile’ for OpenCL that relaxes some of the requirements to improve power efficiency and cost. Outside of the mobile world, it is conceivable (albeit unlikely) that FPGA vendors may use OpenCL as a programmer-friendly interface (compared to Verilog) for their hardware, at the cost of some efficiency.