For me, SC19 was about the fusion of machine learning and scientific computing. I learned about new technologies from Nvidia, Graphcore, and Cerebras Systems, and spoke on a panel about the role of MLPerf in benchmarking HPC systems for machine learning, including the many lessons learned.
Intel will offer 3DXP-based DIMMs (previously codenamed Apache Pass) that use the DDR4 interface on the next-generation Cascade Lake server processor. The first DIMMs will be available in 128GB, 256GB, and 512GB capacities and work with a new software architecture for persistent memory. Intel and its partners have enabled the new persistent memory programming model for Java, Linux, VMware, and Windows, and many customers are eagerly awaiting the non-volatile, high-capacity memory for in-memory databases and other applications.
My favorite paper from the ISSCC processor session describes an adaptive clocking technique implemented in AMD’s 28nm Steamroller core that compensates for power supply noise. Initial results show a 10-20% decrease in power consumption from reducing the voltage, with no loss in performance. This elegant technique is likely to be adopted across AMD’s entire product line including GPUs, x86 CPUs, ARM-based CPUs, and other critical blocks in highly integrated SoCs.
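As a rough sanity check on the 10-20% figure, the sketch below relates a power saving to the voltage reduction that would produce it. It assumes the textbook dynamic-power relation P ≈ C·V²·f with frequency held constant (consistent with "no loss in performance") and ignores leakage; the function names are hypothetical and this is not the model used in the paper.

```python
import math


def power_saving(voltage_reduction: float) -> float:
    """Fractional dynamic-power saving at fixed frequency.

    Assumes P is proportional to V^2 (P ~ C * V^2 * f); leakage is ignored.
    """
    return 1.0 - (1.0 - voltage_reduction) ** 2


def voltage_reduction_for(saving: float) -> float:
    """Inverse of power_saving: fractional voltage drop needed for a given saving."""
    return 1.0 - math.sqrt(1.0 - saving)


if __name__ == "__main__":
    for saving in (0.10, 0.20):
        drop = voltage_reduction_for(saving)
        print(f"{saving:.0%} power saving needs about a {drop:.1%} voltage reduction")
```

Under these assumptions, a 10-20% power reduction corresponds to lowering the supply voltage by roughly 5-11%, which is the kind of guard-band margin a noise-compensating clock could plausibly recover.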
Jaguar is AMD’s first 28nm processor, a compact 3.1mm² design that targets 2-25W devices. It is a derivative of the earlier 40nm Bobcat, a fully out-of-order, two-issue design, with significant improvements in instruction set architecture and implementation. Some of the highlights include support for AVX, wider 128-bit datapaths, and a higher-performance L2 cache. Jaguar is already shipping in several AMD SoCs targeted at tablets, notebooks, microservers, and desktops. However, it is far more prominent as the CPU powering the Sony PlayStation 4 and Microsoft Xbox One.
The 14nm Knights Landing leverages Intel’s resources with a laser-like focus on HPC to deliver a massive improvement over the previous generation. The building block of this architecture is a pair of Silvermont-inspired CPUs with wide vector units and, most importantly, a brand new cache hierarchy, on-die fabric, and system infrastructure that is shared with Skylake. This article is an in-depth analysis and prediction of the Knights Landing architecture.
Knights Landing is Intel’s first clean-sheet redesign of the Larrabee family, targeted at throughput computing and manufactured on a 14nm process with products expected in late 2014 or early 2015. Intel has disclosed the adoption of AVX3, on-package embedded DRAM, and bootable products, but most details are unknown. This article analyzes the options available for the Knights Landing CPU core and explains why Intel’s existing cores are a poor fit for the target workloads, concluding that the most likely outcome is a new custom core for Knights Landing.
Silvermont is Intel’s first CPU core tailored for power-efficient applications such as smartphones, tablets, and microservers. The 22nm microarchitecture features updated instruction set extensions, full out-of-order execution with a tightly coupled L2 cache, aggressive power management, and a new high-performance SoC fabric. These enhancements deliver tremendous performance and frequency gains over the aging Atom core, putting Intel’s mobile strategy in a more competitive position.
The server market is at a potential inflection point, with a new breed of ARM-based microserver vendors challenging the status quo, particularly for cloud computing. We survey 20 modern processors to understand the options for alternative architectures. To achieve disruptive performance, microserver vendors must deeply specialize in particular workloads. However, there is a trade-off between differentiation and market breadth. As the handful of microserver startups is culled to one or two viable vendors, only the companies that deliver compelling advantages to significant markets will survive.
Intel’s Haswell CPU is the first core optimized for 22nm and includes a huge number of innovations for developers and users. New instructions for transactional memory, bit manipulation, full 256-bit integer SIMD, and floating-point multiply-accumulate are combined in a microarchitecture that essentially doubles computational throughput and cache bandwidth. Most importantly, the microarchitecture was designed for efficiency and extends Intel’s offerings down to 10W tablets, while maintaining leadership for notebooks, desktops, servers, and workstations.
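To illustrate where a doubling of computational throughput can come from, here is a minimal back-of-the-envelope sketch of peak single-precision FLOPs per cycle per core. It assumes two 256-bit multiply-accumulate pipes on Haswell, and one 256-bit multiply pipe plus one 256-bit add pipe on the preceding generation; those unit counts are not stated in the summary above and are inputs to the sketch, and the function name is hypothetical.

```python
def peak_flops_per_cycle(vector_bits: int, element_bits: int,
                         fp_units: int, flops_per_lane: int) -> int:
    """Peak floating-point operations per cycle for one core.

    lanes          = how many elements fit in one vector register
    fp_units       = vector pipes that can issue every cycle (assumed)
    flops_per_lane = 2 for a fused multiply-accumulate, 1 for a plain add or multiply
    """
    lanes = vector_bits // element_bits
    return lanes * fp_units * flops_per_lane


# Assumed configuration: two 256-bit multiply-accumulate pipes, 32-bit floats.
haswell = peak_flops_per_cycle(256, 32, fp_units=2, flops_per_lane=2)

# Assumed configuration: one multiply pipe and one add pipe, both 256-bit.
previous = peak_flops_per_cycle(256, 32, fp_units=2, flops_per_lane=1)

if __name__ == "__main__":
    print(f"Haswell: {haswell}, previous: {previous}, ratio: {haswell / previous:.1f}x")
```

With the same vector width and the same number of pipes, the gain in this sketch comes entirely from each multiply-accumulate instruction counting as two operations per lane. Real code only approaches that peak when it can keep both pipes fed with multiply-accumulate work.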
Near-threshold voltage computing extends the voltage scaling associated with Moore’s Law and dramatically improves power and energy efficiency. The technology is superb for throughput, at the cost of latency, and best suited to Intel’s products for HPC and mobile graphics.