Llano at Hot Chips

AMD’s Hot Chips presentation delved into Llano, the first mainstream Fusion product, with details and results for power management. Previous disclosures painted a poor picture, which is far from the truth. Given the older CPU and GPU designs and time-to-market pressure, the results are quite good. Llano’s power management focuses on the most important aspects and is a solid foundation for future generations that will be much more power-aware and optimized.

Sandy Bridge for Servers

Intel’s Sandy Bridge-EP arrives late this year to take on AMD’s Bulldozer in 2- and 4-socket servers. It offers up to 8 cores with a new system architecture including 20MB of L3 cache, 4 DDR3 memory controllers and faster 8GT/s QPI 1.1 links. Sandy Bridge-EP is also the first server CPU to integrate PCI-E 3.0 on-die, with up to 40 lanes – a significant bandwidth and power efficiency advantage. This article compares the system architecture and design to previous approaches and shows that Sandy Bridge-EP will be a compelling upgrade for 2-socket servers and attractive for certain 4-socket systems, particularly those with large I/O needs.

Intel’s Quick Path Evolved

Intel’s Quick Path Interconnect (QPI) was a massive step forward over the front-side bus that was used from 1995 to 2008. QPI finally caught up with and exceeded AMD’s HyperTransport, helping Intel retake much of the server market. The next generation, QPI 1.1, was re-architected based on trends and changes in the computer industry. QPI 1.1 is an incremental improvement at the physical and logical layers, but a substantial change in the coherency protocol. Sandy Bridge-EP will be the first product to implement QPI 1.1, later this year.

What Do Overclockers and Supercomputers Have in Common?

Enthusiasts and engineers know cooling is vital; it raises frequency and dramatically lowers power by reducing CPU or GPU temperatures. The world’s fastest supercomputer shows that thermal management can increase CPU performance/watt by 20%, and that cooling is critical for 3D integration and Moore’s Law.

Poulson: The Future of Itanium Servers

Over a decade, Itanium scaled down to 65nm re-using the same basic design. The new 32nm Poulson architecture moves from static VLIW to a more conventional pipeline. It has a new core with dynamic scheduling, fine-grained multithreading and a shared L3 cache. The net result is a vastly more efficient microprocessor that should achieve 2.5-2.8X higher performance and power high-end servers for the next 10 years.

Sandy Bridge ISSCC Update

Intel’s Sandy Bridge ISSCC paper discusses a number of challenges that will eventually impact most vendors. The novel architectural choices and circuit design solutions that it describes give insight into current and future products from Intel, but also the general direction of the industry. The overarching theme is taking advantage of Moore’s Law at 32nm and beyond, which entails considerable attention to design complexity, process variation, power efficiency and validation.

Sandy Bridge SPECcpu2006 Estimates

Sandy Bridge SPECcpu2006 estimates are finally available. The data show per-core performance increased by 30% or more compared to the fastest Westmere design. We analyze the performance numbers for Intel’s newest microarchitecture and estimate gains of 12% for multi-threading on integer workloads. We also show that integer performance is highly sensitive to frequency, while floating point workloads respond much less. Last, we assess the implications for AMD to match Sandy Bridge’s performance for both throughput and single-threaded workloads.

Introduction to OpenCL

A critical question for GPU computing is how programmers will interface with the underlying hardware. Users have the choice between three APIs: Nvidia’s proprietary CUDA, Microsoft’s DirectCompute and OpenCL. Of the three, OpenCL has garnered the most enthusiasm across the PC ecosystem (e.g. AMD, IBM, Intel and Nvidia) and the mobile and embedded market (e.g. ARM and Imagination Technologies). While still a nascent technology, OpenCL is very popular because it is an open industry standard that promises compatibility on a huge variety of hardware. This article explores aspects of OpenCL, from the early development efforts at Apple to the standard itself, including the execution and memory models.

Intel’s Sandy Bridge Microarchitecture

At IDF, Intel revealed the future Sandy Bridge microprocessor. It is an entirely new design – a synthesis of Nehalem, ideas from the Pentium 4 and a new Gen 6 graphics architecture. The result is a novel microprocessor, GPU and system infrastructure tightly integrated into a 32nm chip. This report details Sandy Bridge’s microarchitecture including the uop cache, AVX, memory pipelines, ring-based L3 cache and Turbo Boost, concluding with the expected performance relative to AMD’s Bulldozer.

AMD’s Bulldozer Microarchitecture

At Hot Chips 2010, AMD released details on their upcoming Bulldozer microarchitecture, intended for server and high-end desktop CPUs. Bulldozer is a high frequency design that is also tailored for multi-core throughput by sharing between cores. Interlagos, the first implementation, will feature 16 cores and debut in mid to late 2011 on a 32nm manufacturing process. This article explores Bulldozer’s novel design trade-offs and AMD’s new approach to multi-core efficiency.
