Starting with the Maxwell GM20x architecture, Nvidia’s high-performance GPUs have borrowed techniques from low-power mobile graphics architectures. Specifically, Maxwell and Pascal use tile-based immediate-mode rasterizers that buffer pixel output, instead of conventional full-screen immediate-mode rasterizers. Using simple DirectX shaders, we demonstrate the tile-based rasterization in Nvidia’s Maxwell and Pascal GPUs and contrast this behavior with the immediate-mode rasterizer used by AMD.
The new ARMv8 architecture is classically British: a clean and elegant 64-bit instruction set that retains compatibility with 32-bit software. The 64-bit mode eliminates many complicated and awkward features and will foster a larger and more diverse ARM ecosystem with new licensees and applications.
New compute efficiency data shows GPUs with a clear edge over CPUs, but the gap is narrowing as CPUs adopt wide vectors (e.g. AVX). Surprisingly, a throughput CPU is the most energy-efficient processor, offering hope for future architectures. Our data also shows some advantages of AMD’s Bulldozer, and the overhead associated with highly scalable server CPUs.
Our first look at Kepler focuses on the enhanced power management and on architectural changes to the shader core that emphasize graphics performance. Based on our analysis of Nvidia’s 28nm GPU strategy, we project a new shader core for throughput computing products and discuss the expected features.
Nvidia’s Kal-El sports a novel fifth ‘companion’ core to lower idle power. We look at the trade-offs and benefits of this approach and explain why it will be a strong tablet SoC, but only an incremental gain for smartphones.
Modern graphics processors are incredibly complex, but understanding their performance is essential as they become an increasingly important component of computer systems. In this report, we use a set of benchmark results to build accurate performance models for AMD and Nvidia GPUs. We verify that our models can predict performance within roughly 6-8% for many desktop graphics cards and show how Nvidia’s microarchitecture and drivers achieve roughly 2X higher utilization than AMD’s VLIW5 design.
PhysX is a key application that Nvidia uses to showcase the advantages of GPU computing (GPGPU) for consumers. PhysX executing on an Nvidia GPU can improve performance by 2-4X compared to running on a CPU from Intel or AMD. We investigated and discovered that CPU PhysX exclusively uses x87 rather than the faster SSE instructions. This hobbles the performance of CPUs, calling into question the real benefits of PhysX on a GPU.
In the last several years, the landscape for computing has become increasingly interesting and diverse. GPUs have gradually evolved to be less application specific and slightly more generalized than their fixed-function ancestors. The changes started in the DirectX 9 time frame, with real floating point (FP) data types, but still fixed vertex, geometry and pixel processing. DX10 hardware was really the turning point, with unified shaders, relatively complete data types (i.e. integers were added) and slightly more flexible control flow. Today the high end is a four-horse race among AMD (née ATI), Intel’s and AMD’s integrated graphics, Larrabee, and Nvidia. All four face different goals and constraints, and hence have taken slightly different paths. It is in this context that Nvidia has announced a next generation architecture, Fermi, which aims for even greater performance, reliability and programmability, unlocking even more software capabilities.
The computer industry is on the cusp of yet another turn of the Wheel of Reincarnation, with the graphics processing unit (GPU) cast as the heir apparent of the floating point co-processors of days long gone. Modern GPUs are ostensibly higher performance and more power efficient than CPUs for their target workloads, and many companies and media outlets claim they are leaving CPUs in the dust. Is this really the case though? This article explores the quantitative basis for these claims, with some surprising results.
Nvidia’s corporate strategy firmly rests on expanding the market for GPUs beyond graphics to include certain types of computation. Specifically, Nvidia’s efforts with CUDA are aimed at moving GPUs into the high performance computing (HPC) market, where the substantial compute capabilities and memory bandwidth directly translate into performance. Nvidia’s Tesla products (GPUs designed for computation instead of graphics) have made a bit of a splash, but at the moment adoption is extremely limited. GPU clusters are basically non-existent, at least in part due to the lack of error detection and correction, a gap we believe will be addressed in the next product release from Nvidia.