Transistor count and transistor density are often portrayed as technical achievements and milestones. Many vendors brag about the complexity of their designs, as measured by transistor count. In reality, transistor count and density vary considerably based on the type of chip and especially the type of circuitry within the chip, and there is no standard way of counting. The net result is that transistor count and density are only approximate metrics, and focusing on those particular numbers risks losing sight of the bigger picture.
Power delivery is one of the most significant challenges in modern processors. The power delivery network (PDN) must meet the demanding requirements of modern CMOS technology, supply power with excellent efficiency, and swiftly respond to changes in power draw.
For me, SC19 was about the fusion of machine learning and scientific computing. I learned about new technologies from Nvidia, Graphcore, and Cerebras Systems and spoke on a panel about the role of MLPerf in benchmarking HPC systems for machine learning and the many lessons learned.
At VLSI 2018, researchers from TDK and TSMC described advances in magnetoresistive memory (MRAM). TDK focused on new materials to improve writing for low-voltage MRAM cells at small geometries. A team from TSMC showcased circuit techniques to improve read performance of MRAM arrays despite process variability and a small read window.
IBM presented a neural network accelerator at VLSI 2018 showcasing a variety of architectural techniques for machine learning, including a regular 2D array of small processing elements optimized for dataflow computation, reduced-precision arithmetic, and explicitly addressed memories.
Intel will offer 3DXP-based DIMMs (previously codenamed Apache Pass) that use the DDR4 interface on the next-generation Cascade Lake server processor. The first DIMMs will be available in 128GB, 256GB, and 512GB capacities and work with a new software architecture for persistent memory. Intel and its partners have enabled the new persistent memory programming model for Java, Linux, VMware, and Windows, and many customers are eagerly awaiting the non-volatile, high-capacity memory for in-memory databases and other applications.
Intel’s 22FFL (FinFET Low-power) is a variant of its existing 22nm process that is aimed at low-cost, extremely low-power, and analog/RF applications. 22FFL relaxes the ground rules to reduce the need for double patterning, thereby cutting costs. At the same time, Intel’s engineers essentially backported the second- and third-generation FinFETs from the 14nm and 10nm processes to 22FFL, improving performance and power efficiency with superior fin geometry and workfunction metals. Intel also created a large library of digital and analog transistors and passive components.
Previously, Apple’s iPhones and iPads used PowerVR GPUs from Imagination Technologies for graphics. Based on our analysis, Apple has created a custom GPU that powers the A8, A9, and A10 processors, shipping in the iPhone 6 and later models, and some iPads. Using public documents, we demonstrate that the programmable shader cores inside Apple’s GPU are different from Imagination Technologies’ PowerVR and offer superior 16-bit floating-point performance and data conversion functions. We further believe that Apple has also developed a custom shader compiler and graphics driver. The proprietary design enables Apple to deliver best-in-class performance for graphics and for other tasks that use the GPU, such as image processing and machine learning.
On the eve of the 50th anniversary of Moore’s Law, the future of silicon CMOS is an open question. With rising costs and uncertain benefits, some semiconductor companies have questioned the wisdom of pursuing further scaling. I predict that Intel’s 10nm process technology will use Quantum Well FETs (QWFETs) with a 3D fin geometry, InGaAs for the NFET channel, and strained Germanium for the PFET channel, enabling lower-voltage and more energy-efficient transistors in 2016, and that the rest of the industry will follow suit at the 7nm node.
My favorite paper from the ISSCC processor session describes an adaptive clocking technique implemented in AMD’s 28nm Steamroller core that compensates for power supply noise. Initial results show a 10-20% decrease in power consumption from reducing the voltage, with no loss in performance. This elegant technique is likely to be adopted across AMD’s entire product line including GPUs, x86 CPUs, ARM-based CPUs, and other critical blocks in highly integrated SoCs.