Silvermont is Intel’s first CPU core tailored for power efficient applications such as smartphones, tablets, and microservers. The 22nm microarchitecture features updated instruction set extensions, full out-of-order execution with a tightly coupled L2 cache, aggressive power management, and a new high performance SoC fabric. These enhancements deliver tremendous performance and frequency gains over the aging Atom core, putting Intel’s mobile strategy in a more competitive position.
Intel’s Haswell CPU is the first core optimized for 22nm and includes a huge number of innovations for developers and users. New instructions for transactional memory, bit-manipulation, full 256-bit integer SIMD and floating point multiply-accumulate are combined in a microarchitecture that essentially doubles computational throughput and cache bandwidth. Most importantly, the microarchitecture was designed for efficiency and extends Intel’s offerings down to 10W tablets, while maintaining leadership for notebooks, desktops, servers and workstations.
Sandy Bridge is the first GPU tightly integrated with an x86 CPU through a shared L3 cache. Graphics performance has doubled, thanks to new shader cores and more powerful fixed-function hardware. Sadly, there is no OpenCL or DirectX 11 support until Ivy Bridge. Multimedia is superb, with full hardware decoding and accelerated encoding exposed through an API. The new design is a huge advance, but much work remains for future generations.
Rumors aside, Apple will not switch their laptops to ARM any time soon. Despite Apple’s previous migrations, there are too many technical and business challenges and too few benefits. Moreover, Apple’s chip designers are better suited to enhancing the iPhone and iPad to fend off commodity Android systems. We look at the reasons Apple will stay with x86 notebooks for now, and how they might consider using ARM in the future.
The major trend in graphics is programmability and targeting highly parallel, general-purpose workloads. Historically, AMD has focused on gaming performance. However, DirectCompute and OpenCL are beginning to take hold and create the seeds of a software ecosystem. AMD’s new Cayman architecture is a gradual and evolutionary step towards more general-purpose hardware and a cautious embrace of GPU computing. While primarily a graphics processor, Cayman has made some fundamental microarchitecture changes to improve programmability and performance. In this article, we explore the Cayman architecture including the new VLIW4 SIMD, dynamic power management and other enhancements. Our report concludes with a preliminary assessment of the Radeon 6970 and 6950 graphics cards and projections for frequency, power and performance of future compute products.
At IDF, Intel revealed the future Sandy Bridge microprocessor. It is an entirely new design – a synthesis of Nehalem, ideas from the Pentium 4 and a new Gen 6 graphics architecture. The result is a novel microprocessor, GPU and system infrastructure tightly integrated into a 32nm chip. This report details Sandy Bridge’s microarchitecture including the uop cache, AVX, memory pipelines, ring-based L3 cache and Turbo Boost, concluding with the expected performance relative to AMD’s Bulldozer.
At Hot Chips 2010, AMD released details on their upcoming Bulldozer microarchitecture, intended for server and high-end desktop CPUs. Bulldozer is a high-frequency design that is also tailored for multi-core throughput by sharing resources between cores. Interlagos, the first implementation, will feature 16 cores and debut in mid to late 2011 on a 32nm manufacturing process. This article explores Bulldozer’s novel design trade-offs and AMD’s new approach to multi-core efficiency.
PhysX is a key application that Nvidia uses to showcase the advantages of GPU computing (GPGPU) for consumers. PhysX executing on an Nvidia GPU can improve performance by 2-4X compared to running on a CPU from Intel or AMD. We investigated and discovered that CPU PhysX exclusively uses x87 rather than the faster SSE instructions. This hobbles the performance of CPUs, calling into question the real benefits of PhysX on a GPU.
CSI, Common System Interface, Coherent Interconnect, Intel, Nehalem, Tukwila, QuickPath Interconnect
The continuing pace of chip-level feature miniaturization – Moore’s Law – has doubled the number of transistors per unit area approximately every two years. Chip designers now have a plethora of transistor options to choose from when optimizing for a given constraint. New materials with higher dielectric constants, such as hafnium-based high-k gate oxides, along with metal gate electrodes, decrease leakage and boost drive current. Strained silicon engineering enables higher transistor switching speeds. Different transistor designs featuring multiple threshold voltages optimize for low-power or high-performance applications.
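As a rough illustration of the doubling cadence described above, the sketch below computes how much transistor density grows over a given number of years. The function name `density_multiplier` and the default two-year doubling period are assumptions made for this example; the real cadence has varied from one process generation to the next.

```python
def density_multiplier(years: float, doubling_period_years: float = 2.0) -> float:
    """Return the factor by which transistor density grows over `years`.

    Assumes a fixed doubling period (hypothetical default: 2 years),
    which is an idealization of Moore's Law.
    """
    if doubling_period_years <= 0:
        raise ValueError("doubling_period_years must be positive")
    # One doubling per period, so growth compounds as a power of two.
    return 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    for years in (2, 4, 10):
        factor = density_multiplier(years)
        print(f"After {years} years: {factor:g}x the transistors per unit area")
```

Under this idealized assumption, a decade corresponds to five doublings, which is a 32-fold increase in transistors per unit area.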