During the early part of the decade, the paths of Intel’s and AMD’s CPU architectures diverged rather remarkably. Intel focused on the Pentium 4, a high clock speed design that relied on proper coding practices by programmers to achieve high performance. AMD, in contrast, focused on extending the K7 architecture by overhauling the system and memory interfaces and refining the microarchitecture. Along the way, Intel discovered that high-frequency designs are anathema to mobile devices and evolved the venerable Pentium Pro (P6) design into a line of extremely successful notebook-oriented CPUs that culminated with the 65nm Core Duo. Over the last two years, Intel’s CPU line-up has experienced a remarkable renaissance, as the seeds of the infamous right-hand turn began to bear fruit and designs based on the Core 2 Duo displaced the last remnants of the Pentium 4.
In many ways, the Core 2 is the clear descendant of a mobile CPU; consider, for instance, the focus on dual-core implementations (at 45nm) or the simple clock distribution. As it turns out, the Core 2 is a well-balanced design that is excellent for desktops and servers, but its feature set focuses on mobile. This reflects the strengths of Intel’s Haifa design team, which has specialized in mobile designs since the Timna project but only recently began working on server designs. In comparison, Intel’s Hillsboro design group has a long history of server design and validation, dating back to the original P6.
Currently, almost all of Intel’s product portfolio is based on the microarchitecture of the Core 2 and uses the aging front-side bus as a system interface. In fact, this is one of the largest technical distinctions between Intel’s and AMD’s offerings. AMD’s Barcelona, unlike the Core 2, is clearly aimed at servers first and foremost: it takes full advantage of an integrated memory controller and HyperTransport, and integrates four cores on a single die. Unfortunately, Barcelona has not yet met expectations. It was released later than expected and at lower frequencies, and it suffered from a functional bug in the translation look-aside buffer (TLB) that required a microcode workaround. The latest revision of Barcelona, the B3 stepping released last month, fixes the most serious of these problems (the TLB bug) and will likely herald frequency increases that bring Barcelona’s performance into a range that is competitive with Intel’s current generation of Core 2 based designs.
At this IDF, Intel is announcing the details of Nehalem, a second-generation 45nm microprocessor and the next step in the evolution of its flagship line. Nehalem differs from the previous generation in that it was explicitly designed not only to scale across all the different product lines, but also to be optimized for each product segment, from mobile to MP server. This implies a level of flexibility above and beyond the Core 2. Nehalem refines almost every aspect of the microprocessor, although the most substantial changes were to the system architecture and the memory hierarchy. This article describes in detail the architecture and pipeline of Nehalem, a quad-core, eight-threaded, 64-bit, 4-issue superscalar, out-of-order MPU with a 16-stage pipeline and 48-bit virtual and 40-bit physical addressing, implemented in Intel’s high-performance 45nm process, which uses high-k gate dielectrics and metal gate stacks.
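To put the addressing and threading figures above in concrete terms, here is a small back-of-the-envelope sketch. It only restates the numbers quoted in the text (48-bit virtual, 40-bit physical, four cores, eight threads). The constant and function names are illustrative and do not come from any Intel documentation.

```python
# Figures quoted in the text; the names are illustrative only.
VIRTUAL_ADDRESS_BITS = 48
PHYSICAL_ADDRESS_BITS = 40
CORES = 4
HARDWARE_THREADS = 8

TIB = 1 << 40  # bytes in one tebibyte


def address_space_bytes(bits: int) -> int:
    """Number of distinct byte addresses reachable with `bits` address bits."""
    return 1 << bits


# Size of each address space, expressed in TiB.
virtual_tib = address_space_bytes(VIRTUAL_ADDRESS_BITS) // TIB
physical_tib = address_space_bytes(PHYSICAL_ADDRESS_BITS) // TIB

# Hardware threads sharing each core.
threads_per_core = HARDWARE_THREADS // CORES

print(f"virtual address space:  {virtual_tib} TiB")
print(f"physical address space: {physical_tib} TiB")
print(f"threads per core:       {threads_per_core}")
```

In other words, each process can address up to 256 TiB of virtual memory, the chip can address up to 1 TiB of physical memory, and each of the four cores runs two hardware threads.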