Sandy Bridge was once known by the codename Gesher, meaning "bridge" in Hebrew. While most Intel codenames are geographic in nature, in this case the name can be interpreted as a rather apt metaphor. Sandy Bridge is a synthesis of three separate worlds within Intel – blending the microarchitectures of the Pentium Pro and the Pentium 4 and a new implementation of the GenX graphics architecture. The result is tightly integrated with a novel system infrastructure into a single chip manufactured on Intel’s 32nm process. This article is the first in a series and focuses on the microarchitecture of the Sandy Bridge microprocessor; subsequent articles will deal with the graphics architecture and productizations.
Intel’s first out-of-order microprocessor was the Pentium Pro (P6). It was conceived and designed in Oregon and was also the first Intel CPU solidly targeted at the server market, with glueless SMP support for up to 4 sockets. At the time, servers and workstations were dominated by proprietary RISC architectures, but the Pentium Pro ultimately signaled the decline of these competitors and sounded their eventual death knell. Several RISC architectures survive today (notably PPC and SPARC), but they are in a constant race to stay ahead of the x86 ecosystem; they no longer enjoy any real growth and are instead confined to a few lucrative niches.
The P6 first arrived in 1995 on 0.5um and 0.35um BiCMOS processes and was subsequently refined in the consumer-oriented Pentium II and Pentium III. At one point, that appeared to be the end of the line for the P6. The successors to the P6 were to be IA64 (for servers) and the Pentium 4 (for client systems). The Pentium 4 was a highly innovative microarchitecture and a radical departure from the previous generation – even more aggressively out-of-order than the P6 and tailored for extreme frequencies. The P4 featured many novel architectural choices such as a trace cache, simultaneous multi-threading and first-class SIMD support with SSE2. Prescott, the second generation of the Pentium 4 microarchitecture, eventually hit 3.73GHz in 90nm.
The insatiable desire for high frequencies could not overcome the underlying physics – Intel’s single-minded pursuit of high frequency led to high power operation, and there is a firm economic limit of 130W for a mass market product. It is possible to power and cool chips that operate at 250W or even 300W, but the solutions are very expensive and cannot be produced in high volume. Unfortunately, the P4 relied on reaching astronomical frequencies to achieve high performance. AMD took advantage of this weakness (and of Intel’s delay in embracing a 64-bit x86 extension) and gained substantial market share; for the first time, AMD was a credible and even preferred vendor for server microprocessors. Intel’s management eventually realized that the high frequency and high power approach of the Pentium 4 was a dead end. They turned back to the P6 – which had been refined for low-power notebook applications by the Israeli design team in Haifa over 3 generations (130nm Banias, 90nm Dothan and 65nm Yonah).
The 65nm Merom was the first P6 derivative since 2000 that was intended to span the entire product spectrum, from clients to servers. Merom was billed as a merger of the P4 and P6 microarchitectures, but in reality, there was relatively little taken from the P4. Nonetheless, it was a success and Intel began to claw back market share from AMD. The 45nm Nehalem microarchitecture was a further step in the right direction, building on Merom and incorporating simultaneous multi-threading and a new and vastly more efficient system architecture. Nehalem’s integrated memory controllers and on-die QuickPath Interconnect were especially critical for the server market. But Westmere, the 32nm shrink of Nehalem, will be the last P6 derivative from Intel. After 15 years, the P6 is finally being replaced by a new microarchitecture: Sandy Bridge.
The Sandy Bridge CPU cores can truly be described as a brand new microarchitecture that is a synthesis of the P6 and some elements of the P4. Although Sandy Bridge most strongly resembles the P6 line, it is an utterly different microarchitecture. Nearly every aspect of the core has been substantially improved over the previous generation Nehalem. Many of these changes, such as the uop cache or physical register files, are drawn from aspects of, or concepts behind, the P4 microarchitecture. While the P4 was ultimately a flawed implementation, it embodied many good ideas – ideas that are reappearing across the industry, and in Sandy Bridge. The underlying philosophy of Intel’s approach to CPU design is to focus on maximizing per-core performance and efficiency. This contrasts with AMD’s Bulldozer, which backs off slightly from per-core performance to emphasize aggregate throughput. This article will explore the microarchitecture of Sandy Bridge – a 64-bit, quad-core, dual-threaded, 4-issue, out-of-order microprocessor with the new 3- and 4-operand AVX instruction set extension, implemented in Intel’s 32nm process.