A Look Inside Apple’s Custom GPU for the iPhone

From the first models, Apple’s iPhones and iPads have licensed and used PowerVR GPUs from Imagination Technologies for graphics. Apple even owns around 10% of Imagination and is Imagination’s biggest customer, accounting for about 30% of the company’s revenue. Just as Apple began by licensing standard ARM CPU cores but now designs its own, we believe the company has similarly shifted from licensing PowerVR to designing a custom GPU. This new GPU first shipped in the A8 processor that is in the iPhone 6, and its descendants are also in the A9 and A10 Fusion processors in the iPhone 6S and 7.

A modern GPU, like the ones inside Apple’s iPhone and iPad, has three major components that must work together to render a scene. The first is the fixed-function graphics hardware, which is responsible for tasks like processing API commands, triangle rasterization, and raster output. The second is the shader core, which is the heart of the GPU and executes programmable shaders (e.g., vertex, geometry, pixel, and compute shaders). Last, the graphics driver is the software that runs on the CPU and ties everything together, coordinating the activities of the GPU. The driver transforms graphics applications written in the Metal or OpenGL ES APIs into a series of commands for the fixed-function hardware and programmable shaders that execute on the shader cores. One of the driver’s largest components is the compiler, which generates machine code to run on the shader cores.

In older generations, the fixed-function hardware, shader cores, and driver were all licensed by Apple from Imagination Technologies. However, over the last 6-7 years Apple has been aggressively hiring graphics architects and driver and compiler engineers from places like AMD, Intel, Google, and Nvidia to design a custom GPU. For example, Mike Wuerthele at AppleInsider wrote that about 25 people from Imagination Technologies were hired away by Apple earlier this year. Apple’s GPU appears to still use some of the PowerVR fixed-function graphics hardware. However, based on a variety of public evidence, it is clear that Apple has replaced the programmable shader cores with its own more efficient and higher-performance design. To take advantage of the custom shader cores, Apple also developed its own driver and compiler to emit code for its architecture. The overall result is that while Apple’s GPU shares some heritage with PowerVR, it is a unique and proprietary design. It is a world-class design with impressive performance and power efficiency; the A9 processor has the best score on nearly every mobile graphics benchmark, and the A10 Fusion is 40-50% faster still.

The architecture of Apple’s GPU has never been publicly documented. In order for developers to take advantage of the GPU, they have to understand how to write shader programs for the Metal and OpenGL compilers. At WWDC 2016, Apple engineers gave a presentation, “Advanced Metal Shader Optimization”, that contains the most detailed tuning guidelines and architectural details to date on the custom GPU. The architecture of the PowerVR Series 6 GPU is also poorly documented, but Imagination Technologies has shared some basic compiler and optimization manuals. Comparing the available details for the two makes it clear that they are quite different. In particular, Apple’s register file and data conversion functions are better suited for performance and power efficiency and are an easier compiler target.

Apple Boosts Performance, Power Efficiency with Smaller Registers

The OpenGL ES mobile graphics API and Apple’s proprietary Metal API support the 16-bit half-precision floating-point format for image data and calculations. Half-precision calculations consume less energy than 32-bit single-precision calculations, but they can lose accuracy faster in some circumstances. However, for many graphics, image processing, and machine learning workloads, half-precision is accurate enough to give correct results, especially since most displays only have 8 to 12 bits of dynamic range per pixel.
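As a rough illustration of why half-precision is often "accurate enough", the sketch below checks whether every code of an 8-bit display channel survives being stored as a half-precision value in the range 0 to 1. It is only a sketch: the linear 0-to-1 encoding and the helper names `to_half` and `survives_half` are assumptions made for this example, and Python's `struct` format `"e"` stands in for GPU half-precision hardware.

```python
import struct

def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def survives_half(bits: int) -> bool:
    """True if every display code of this bit depth round-trips through a half."""
    max_code = (1 << bits) - 1
    for code in range(max_code + 1):
        # Hypothetical encoding: normalize the code to [0, 1] and store it as a half.
        normalized = to_half(code / max_code)
        if round(normalized * max_code) != code:
            return False
    return True

print(survives_half(8))  # prints True
```

For an 8-bit channel, the rounding error introduced by the half-precision format stays well below half a display step, so no code is lost.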

The register file for Apple’s GPU is composed of 16-bit registers, an ideal fit for half-precision data, based on public presentations from the company (see [1]). Single-precision floating-point values and other 32-bit data consume two registers. As a result, the register file can store twice as many 16-bit variables as 32-bit variables. Apple’s engineers emphasized that using half-precision offers much better performance and power efficiency than single-precision, making it clear that their architecture is focused on half-precision as a primary design point.

In contrast, the PowerVR Series 6 and 7 GPUs use 32-bit registers and are designed for single-precision calculations, based on tuning guidelines from Imagination Technologies (see [2]). In Series 6, the most common instructions such as FMAD, FMUL, and FADD can operate on half-precision data, but simply flush many of the bits in the source and destination registers to zero. Some instructions can operate on two 16-bit SIMD elements within a single register (and Series 7 extends this capability to more instructions), but SIMD execution is quite distinct from scalar execution using 16-bit registers. For PowerVR, storing data in a 16-bit format wastes some of the register space and does not automatically double the number of variables that can be held in the register file. So using 16-bit data should reduce the memory bandwidth and energy consumed, but will not intrinsically increase performance or power efficiency as much as it does for Apple’s GPU.
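The register arithmetic behind this comparison can be sketched in a few lines. The numbers are purely illustrative: the 4096-bit per-thread register budget and the function name `values_that_fit` are assumptions, not figures from Apple or Imagination, and the model covers only scalar storage (it ignores the PowerVR instructions that pack two 16-bit SIMD elements into one register).

```python
def values_that_fit(file_bits: int, register_bits: int, value_bits: int) -> int:
    """Number of scalar values of value_bits that fit in a register file."""
    registers = file_bits // register_bits
    # A value wider than one register spans several; a narrower one still takes a whole register.
    registers_per_value = -(-value_bits // register_bits)  # ceiling division
    return registers // registers_per_value

FILE_BITS = 4096  # hypothetical per-thread register budget, for illustration only

for name, reg_bits in (("16-bit registers", 16), ("32-bit registers", 32)):
    halves = values_that_fit(FILE_BITS, reg_bits, 16)
    singles = values_that_fit(FILE_BITS, reg_bits, 32)
    print(f"{name}: {halves} half values or {singles} single values")
```

Under these assumptions, the 16-bit register file holds 256 half-precision values or 128 single-precision values, while the 32-bit register file holds 128 of either, which is the "does not automatically double" effect described above.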

Free Conversion Unlocks Half-Precision for Programmers

One common challenge with 16-bit data is that while most calculations are fine with reduced precision, some parts need greater precision. For example, a shader that calculates the color of a large block of pixels and then calculates the average can probably use 16 bits for each individual pixel, but may need to use 32 bits when summing the values so that the running total stays accurate. If converting the pixel data from 16-bit to 32-bit is too expensive, the whole shader will use 32-bit math to generate the correct answer.
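The averaging example can be reproduced on a CPU with a small sketch. This is not shader code. It emulates half- and single-precision rounding with Python's `struct` module, and the helper names (`to_half`, `to_single`, `mean_with_accumulator`), the block size of 4096 pixels, and the mid-gray value of 0.5 are all assumptions chosen for illustration.

```python
import struct

def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (16-bit) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_single(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision (32-bit) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def mean_with_accumulator(pixels, round_sum) -> float:
    """Average half-precision pixels, rounding the running sum with round_sum."""
    total = 0.0
    for p in pixels:
        # Each pixel is stored as a half; only the accumulator precision varies.
        total = round_sum(total + to_half(p))
    return total / len(pixels)

pixels = [0.5] * 4096  # hypothetical block of mid-gray pixels

print(mean_with_accumulator(pixels, to_half))    # prints 0.25
print(mean_with_accumulator(pixels, to_single))  # prints 0.5
```

With a 16-bit accumulator, the running sum stops growing once it reaches 1024, because adding 0.5 no longer changes the nearest representable half-precision value, so the reported average is wrong. Keeping the pixels in 16 bits and widening only the accumulator to 32 bits gives the correct result.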

Apple’s GPU offers superior conversion between different data types to encourage mixing precision, thereby creating more opportunities for higher-performance and lower-power 16-bit calculations. According to Apple’s presentation, data type conversion is free, which suggests that the conversion hardware sits in the standard data path. While this approach is more expensive in terms of hardware, it also dramatically simplifies the compiler and makes it much easier for programmers to write good software.

The PowerVR Series 6 and 7 can convert between different precision data types, but it is definitely not free. The optimization manual specifically states that each data conversion (e.g., from higher to lower precision or lower to higher precision) has a cost, and recommends that developers write shader programs that minimize the number of conversions (see [3]).

Technical Differences Reveal Apple’s Custom GPU

The contrast between the register file and data conversion in Apple’s and Imagination’s GPUs is tremendous. The register file organization is fundamental to the shader core and impacts the design of nearly everything, from the instruction set architecture of the shader cores to the execution units and scheduling logic. As one example, the register size determines the data path and wiring that runs throughout most of the shader core. The data conversion is not quite as substantial, but makes a very big difference to the compiler and to developers. The PowerVR Series 7 GPU is fairly similar to the previous generation Series 6, and crucially also uses 32-bit registers. Based on these differences, the only logical conclusion is that Apple’s GPU uses a proprietary shader core that was internally designed. By extension, this means that Apple also developed its own shader compiler for the OpenGL ES and Metal APIs and, most likely, its own graphics driver.

Even some benchmarks have caught on to the differences. An older GFXBench submission describes the GPU in the iPhone 7 as the G9.

Figure 1. Apple G9 GFXBench Submission

The original submission in the public GFXBench database has been altered, and any mention of the G9 has been scrubbed. However, the submission has been preserved by Back to the Mac, a Korean-language website.

There are many other differences between the Apple GPU and PowerVR that could probably be detected by running directed tests through Metal shaders and comparing the results to similar OpenGL ES shaders on PowerVR GPUs. It is also likely that Metal has unique features that are not possible on PowerVR GPUs. However, some of the differences may not reflect unique hardware choices. For example, the Apple GPU supports up to OpenGL ES 3.0, whereas the PowerVR GPUs can run later versions. This difference could easily be due to software and drivers (and to encouraging developers to use Metal), rather than hardware.
