Inside Fermi: Nvidia’s HPC Push


In the last several years, the computing landscape has become increasingly interesting and diverse. GPUs have gradually evolved to be less application-specific and somewhat more general than their fixed-function ancestors. The changes started in the DirectX 9 time frame, with real floating point (FP) data types but still fixed vertex, geometry and pixel processing. DX10 hardware was the real turning point, with unified shaders, relatively complete data types (i.e., integers were added) and slightly more flexible control flow. Today the high end is a four-horse race between AMD (née ATI), Intel's and AMD's integrated graphics, Intel's Larrabee, and Nvidia. Each faces different goals and constraints, and hence each has taken a slightly different path. It is in this context that Nvidia has announced its next-generation architecture, Fermi, which aims for even greater performance, reliability and programmability, unlocking even more software capabilities.

Read More (11 pages) | Discuss (281 comments)

Computational Efficiency in Modern Processors


The computer industry is on the cusp of yet another turn of the Wheel of Reincarnation, with the graphics processing unit (GPU) cast as the heir apparent to the floating point co-processors of days long gone. Modern GPUs ostensibly offer higher performance and better power efficiency than CPUs for their target workloads, and many companies and media outlets claim they are leaving CPUs in the dust. Is this really the case, though? This article explores the quantitative basis for these claims, with some surprising results.

Read More (3 pages) | Discuss (60 comments)

The Case for ECC Memory in Nvidia’s Next GPU

Nvidia's corporate strategy rests firmly on expanding the market for GPUs beyond graphics to include certain types of computation. Specifically, Nvidia's efforts with CUDA are aimed at moving GPUs into the high performance computing (HPC) market, where their substantial compute capabilities and memory bandwidth translate directly into performance. Nvidia's Tesla products (GPUs designed for computation rather than graphics) have made a bit of a splash, but adoption is currently extremely limited. GPU clusters are basically non-existent, at least in part due to the lack of error detection and correction, a shortcoming we believe Nvidia will address in its next product release.

Read More | Discuss (45 comments)

NVIDIA’s GT200: Inside a Parallel Processor


Our analysis of NVIDIA's latest GPU, the G100 (also known as the GT200 or GTX 280).

Read More (12 pages) | Discuss (72 comments)

Rambus Sets the Bandwidth Bar at a Terabyte/Second


In this article, David Kanter covers Rambus' recent announcement of the Terabyte Bandwidth Initiative (TBI), which is likely to succeed the XDR and XDR2 memory interfaces. TBI is a high-speed interface that significantly improves the command/address architecture for better performance, and it is targeted at next-generation consoles and graphics applications.

Read More (3 pages) | Discuss (7 comments)

An Overview of High Frequency Processor-System Interconnects


David reports on Elastic I/O, IBM's system interconnect scheme, which was presented at Microprocessor Forum 2002.

Read More (7 pages) | Discuss (4 comments)