For me, SC19 was about the fusion of machine learning and scientific computing. I learned about new technologies from Nvidia, Graphcore, and Cerebras Systems and spoke on a panel about the role of MLPerf in benchmarking HPC systems for machine learning and the many lessons learned.
Previously, Apple’s iPhones and iPads used PowerVR GPUs from Imagination Technologies for graphics. Based on our analysis, Apple has created a custom GPU that powers the A8, A9, and 10 processors, shipping in the iPhone 6 and later models, and some iPads. Using public documents, we demonstrate that the programmable shader cores inside Apple’s GPU are different from Imagination Technologies’ PowerVR and offer superior 16-bit floating-point performance and data conversion functions. We further believe that Apple has also developed a custom shader compiler and graphics driver. The proprietary design enables Apple to deliver best-in-class performance for graphics, and other tasks that use the GPU, such as image processing and machine learning.
Starting with the Maxwell GM20x architecture, Nvidia high-performance GPUs have borrowed techniques from low-power mobile graphics architectures. Specifically, Maxwell and Pascal use tile-based immediate-mode rasterizers that buffer pixel output, instead of conventional full-screen immediate-mode rasterizers. Using simple DirectX shaders, we demonstrate the tile-based rasterization in Nvidia’s Maxwell and Pascal GPUs and contrast this behavior to the immediate-mode rasterizer used by AMD.
My favorite paper from the ISSCC processor session describes an adaptive clocking technique implemented in AMD’s 28nm Steamroller core that compensates for power supply noise. Initial results show a 10-20% decrease in power consumption from reducing the voltage, with no loss in performance. This elegant technique is likely to be adopted across AMD’s entire product line including GPUs, x86 CPUs, ARM-based CPUs, and other critical blocks in highly integrated SoCs.
Graphics is a focal point of the upcoming Haswell platform, necessitating a high bandwidth memory solution. To deliver high performance Intel is returning to the DRAM market, which it exited in 1985. The memory that ships with Haswell will be a custom embedded DRAM mounted in the package and manufactured on a variant of Intel’s 22nm process. By avoiding the commodity memory market, Intel will preserve high margins by cannibalizing discrete GPUs and dedicated graphics memory.
The iPad 3 was an influential and successful tablet, but an excellent example of an unbalanced system. In particular, the superb Retina display was not adequately matched by the GPU of the A5X, and represented a step backwards in terms of graphics capabilities. This article explores the challenges of designing innovative products given the underlying technical constraints, through the lens of the iPad 3 and its successors.
Near-threshold voltage computing extends the voltage scaling associated with Moore’s Law and dramatically improves power and energy efficiency. The technology is superb for throughput, at the cost of latency, and best suited to Intel’s products for HPC and mobile graphics.
New compute efficiency data shows GPUs with a clear edge over CPUs, but the gap is narrowing as CPUs adopt wide vectors (e.g. AVX). Surprisingly, a throughput CPU is the most energy efficient processor, offering hope for future architectures. Our data also shows some advantages of AMD’s Bulldozer, and the overhead associated with highly scalable server CPUs.
The Ivy Bridge GPU takes advantage of Intel’s 22nm FinFET process to nearly double performance and enhance programmability with DX11 and OpenCL 1.1 support. The new scalable architecture features more powerful shader cores, distributed sampling pipelines, a high bandwidth L3 cache, tesselation and 4K resolution displays. Overall, Ivy Bridge should be the highest performance integrated GPU at launch and Intel’s first competitive graphics offering.
Our first look at Kepler focuses on architectural changes to the shader core that emphasize graphics performance and the enhanced power management. Based on our analysis of Nvidia’s 28nm GPU strategy, we project a new shader core for throughput computing products and discuss the expected features.