For four years, Intel has struggled to move into the market for mobile devices. Conventional wisdom holds that x86 is too inefficient for smartphones. The recently announced 32nm Medfield proves that x86 is a viable option and that Intel can design smartphone products. We explore the Medfield SoC and analyze its impact on Intel’s mobile strategy.
Nvidia’s Kal-El sports a novel fifth ‘companion’ core to lower idle power. We look at the trade-offs and benefits of this approach and explain why Kal-El will be a strong tablet SoC, but only an incremental gain for smartphones.
AMD’s Hot Chips presentation delved into Llano, the first mainstream Fusion product, with details and results for power management. Previous disclosures painted a poor picture that is far from the truth. Given the older CPU and GPU designs and the time-to-market pressure, the results are quite good. Llano’s power management focuses on the most important aspects and is a solid foundation for future generations that will be much more power aware and optimized.
Sandy Bridge is the first design to tightly integrate a GPU with x86 cores through a shared L3 cache. Graphics performance has doubled, thanks to new shader cores and more powerful fixed-function hardware. Sadly, there is no OpenCL or DirectX 11 support until Ivy Bridge. Multimedia is superb, with full hardware decoding and accelerated encoding exposed through an API. The new design is a huge advance, but much work remains for future generations.
AMD has a grand vision for software and physical integration of CPUs and GPUs. The first Fusion generation focused on time to market, but created a solid foundation. Llano is a surprisingly attractive mid-range and value notebook product, due to the vastly enhanced power management. Future Fusion products will upgrade the CPU, GPU and media hardware and move towards a more tightly integrated computing model.
Enthusiasts and engineers know cooling is vital: lowering CPU or GPU temperatures enables higher frequencies and dramatically reduces power. The world’s fastest supercomputer shows that thermal management can increase CPU performance/watt by 20%, and that cooling is critical for 3D integration and Moore’s Law.
Memory bandwidth is critical to feeding the shader arrays in programmable GPUs. We show that memory is an integral part of a good performance model and can affect graphics performance by 40% or more. The implications are important for upcoming integrated graphics, such as AMD’s Llano and Intel’s Ivy Bridge, where bandwidth constraints will play a key role in determining overall performance.
Intel’s Sandy Bridge ISSCC paper discusses a number of challenges that will eventually affect most vendors. The novel architectural choices and circuit design solutions it describes give insight not only into Intel’s current and future products, but also into the general direction of the industry. The overarching theme is taking advantage of Moore’s Law at 32nm and beyond, which demands considerable attention to design complexity, process variation, power efficiency and validation.
Modern graphics processors are incredibly complex, but understanding their performance is essential, as they become an increasingly important component of computer systems. In this report, we use a set of benchmark results to build accurate performance models for AMD and Nvidia GPUs. We verify that our model can predict performance within roughly 6-8% for many desktop graphics cards and show how Nvidia’s microarchitecture and drivers achieve roughly 2X higher utilization than AMD’s VLIW5 design.
The major trend in graphics is programmability and targeting highly parallel, general-purpose workloads. Historically, AMD has focused on gaming performance. However, DirectCompute and OpenCL are beginning to take hold and create the seeds of a software ecosystem. AMD’s new Cayman architecture is a gradual, evolutionary step towards more general-purpose hardware and a cautious embrace of GPU computing. While primarily a graphics processor, Cayman makes some fundamental microarchitecture changes to improve programmability and performance. In this article, we explore the Cayman architecture, including the new VLIW4 SIMD, dynamic power management and other enhancements. Our report concludes with a preliminary assessment of the Radeon HD 6970 and 6950 graphics cards and projections for the frequency, power and performance of future compute products.