For me, SC19 was about the fusion of machine learning and scientific computing. I learned about new technologies from Nvidia, Graphcore, and Cerebras Systems and spoke on a panel about the role of MLPerf in benchmarking HPC systems for machine learning and the many lessons learned.
At VLSI 2018, researchers from TDK and TSMC described advances in magnetoresistive memory (MRAM). TDK focused on new materials to improve writing for low-voltage MRAM cells at small geometries. A team from TSMC showcased circuit techniques to improve read performance of MRAM arrays despite process variability and a small read window.
IBM presented a neural network accelerator at VLSI 2018 showcasing a variety of architectural techniques for machine learning, including a regular 2D array of small processing elements optimized for dataflow computation, reduced precision arithmetic, and explicitly addressed memories.
Intel will offer 3DXP-based DIMMs (previously codenamed Apache Pass) that use the DDR4 interface on the next-generation Cascade Lake server processor. The first DIMMs will be available in 128GB, 256GB, and 512GB capacities and work with a new software architecture for persistent memory. Intel and its partners have enabled the new persistent memory programming model for Java, Linux, VMware, and Windows, and many customers are eagerly awaiting the non-volatile, high-capacity memory for in-memory databases and other applications.
Intel’s 22FFL (FinFET Low-power) is a variant of its existing 22nm process that is aimed at low-cost, extremely low-power, and analog/RF applications. 22FFL relaxes the ground rules to reduce the need for double patterning, thereby cutting costs. At the same time, Intel’s engineers essentially backported the second and third generation FinFETs from the 14nm and 10nm processes, respectively, to 22FFL, improving performance and power efficiency with superior fin geometry and workfunction metals. Intel also created a large library of digital and analog transistors and passive components.
Previously, Apple’s iPhones and iPads used PowerVR GPUs from Imagination Technologies for graphics. Based on our analysis, Apple has created a custom GPU that powers the A8, A9, and A10 processors, shipping in the iPhone 6 and later models, and some iPads. Using public documents, we demonstrate that the programmable shader cores inside Apple’s GPU are different from Imagination Technologies’ PowerVR and offer superior 16-bit floating-point performance and data conversion functions. We further believe that Apple has also developed a custom shader compiler and graphics driver. The proprietary design enables Apple to deliver best-in-class performance for graphics and for other tasks that use the GPU, such as image processing and machine learning.
On the eve of the 50th anniversary of Moore’s Law, the future of silicon CMOS is an open question. With rising costs and uncertain benefits, some semiconductor companies have questioned the wisdom of pursuing further scaling. I predict that Intel’s 10nm process technology will use Quantum Well FETs (QWFETs) with a 3D fin geometry, InGaAs for the NFET channel, and strained Germanium for the PFET channel, enabling lower-voltage and more energy-efficient transistors in 2016, and that the rest of the industry will follow suit at the 7nm node.
My favorite paper from the ISSCC processor session describes an adaptive clocking technique implemented in AMD’s 28nm Steamroller core that compensates for power supply noise. Initial results show a 10-20% decrease in power consumption from reducing the voltage, with no loss in performance. This elegant technique is likely to be adopted across AMD’s entire product line including GPUs, x86 CPUs, ARM-based CPUs, and other critical blocks in highly integrated SoCs.
The 14nm Knights Landing leverages Intel’s resources with a laser-like focus on HPC to deliver a massive improvement over the previous generation. The building block of this architecture is a pair of Silvermont-inspired CPUs with wide vector units and most importantly, a brand new cache hierarchy, on-die fabric, and system infrastructure that is shared with Skylake. This article is an in-depth analysis and prediction of the Knights Landing architecture.
Knights Landing is Intel’s first clean-sheet redesign of the Larrabee family, targeted at throughput computing and manufactured on a 14nm process with products expected in late 2014 or early 2015. Intel has disclosed the adoption of AVX3, on-package embedded DRAM, and bootable products, but most details are unknown. This article analyzes the options available for the Knights Landing CPU core and explains why Intel’s existing cores are a poor fit for the target workloads, concluding that the most likely outcome is a new custom core for Knights Landing.