For over 40 years, the planar transistor has been the keystone of the semiconductor industry. Intel’s new 22nm tri-gate transistor is revolutionary, moving transistors into a three-dimensional world. After 10 years of research, this novel structure is the next step for Moore’s Law and promises to substantially improve performance and power efficiency.
Memory bandwidth is critical to feeding the shader arrays in programmable GPUs. We show that memory is an integral part of a good performance model and can impact graphics performance by 40% or more. The implications are important for upcoming integrated graphics, such as AMD’s Llano and Intel’s Ivy Bridge, where bandwidth constraints will play a key role in determining overall performance.
Intel’s Sandy Bridge ISSCC paper discusses a number of challenges that will eventually impact most vendors. The novel architectural choices and circuit design solutions that it describes give insight not only into current and future products from Intel, but also into the general direction of the industry. The overarching theme is taking advantage of Moore’s Law at 32nm and beyond, which entails considerable attention to design complexity, process variation, power efficiency and validation.
Sandy Bridge SPECcpu2006 estimates are finally available. The data show per-core performance increased by 30% or more compared to the fastest Westmere design. We analyze the performance numbers for Intel’s newest microarchitecture and estimate gains of 12% for multi-threading on integer workloads. We also show that integer performance is highly sensitive to frequency, while floating point workloads respond much less. Finally, we assess what AMD needs to match Sandy Bridge’s performance for both throughput and single-threaded workloads.
As Moore’s Law continues, each new generation of semiconductor manufacturing is ushered in by new challenges, hurdles and solutions. At ISSCC 2011, a panel with speakers from GlobalFoundries, IBM, Intel, Renesas and TSMC discussed manufacturing and circuit design interactions at the upcoming 22nm node. Industry leaders have reached a broad technical consensus, although with several subtle differences. This report explores the key challenges and solutions at 22nm, focusing on variation and co-optimization between design and manufacturing. Because of the collaboration this requires, an understanding of physical design and manufacturing is even more critical to cutting-edge chip development and to achieving good performance, power and yields.
The integration predicted by Moore’s Law is fundamentally driven by advances in semiconductor manufacturing. One of the key challenges is scaling to ever finer and denser geometries, while improving the performance of transistors. IEDM and the VLSI Symposium are the premier venues to discuss the challenges and opportunities for future process technologies. No commercial 22nm process technologies were presented at IEDM 2010, but in the last two years a number of advances have been disclosed, both for high performance and low power applications. This article describes several 32nm and 28nm nodes from Intel, IBM’s Common Platform and TSMC, plus novel applications such as IBM’s 32nm eDRAM that have been disclosed at IEDM and VLSI.
At IDF, Intel revealed the future Sandy Bridge microprocessor. It is an entirely new design – a synthesis of Nehalem, ideas from the Pentium 4 and a new Gen 6 graphics architecture. The result is a novel microprocessor, GPU and system infrastructure tightly integrated into a 32nm chip. This report details Sandy Bridge’s microarchitecture including the uop cache, AVX, memory pipelines, ring-based L3 cache and Turbo Boost, concluding with the expected performance relative to AMD’s Bulldozer.
PhysX is a key application that Nvidia uses to showcase the advantages of GPU computing (GPGPU) for consumers. PhysX executing on an Nvidia GPU can improve performance by 2-4X compared to running on a CPU from Intel or AMD. We investigated and discovered that CPU PhysX exclusively uses x87 rather than the faster SSE instructions. This hobbles the performance of CPUs, calling into question the real benefits of PhysX on a GPU.
We analyze the performance of a 2.9GHz 65nm Core 2 Duo (aka Conroe) vs. a 2.8GHz 90nm K8, using performance tools from Intel and AMD. With VTune and CodeAnalyst we are able to extract performance counter information such as IPC, uop density, cache miss rates, branch mispredictions, memory accesses and other data that we use to explain the difference in performance between the two CPUs.
The first part focuses on characteristics which are common across both CPUs, while later parts will focus on microarchitecture specific counters.
This article presents a preview of ISSCC 2008, including discussion of Intel’s Itanium processor, codenamed Tukwila, and an ultra-low power x86 MPU codenamed Silverthorne. Other presentations include Sun’s Rock and Niagara 3 processors, the 45nm CELL processor and assorted DRAM and SRAM presentations.