Deja Vu? AMD’s Last Gambit
In 2002, AMD’s K7 was lagging behind the high-frequency 130nm Pentium 4 (Northwood). The 2003 launch of the K8 was a bet-the-company move for AMD, but by 2005 it had clearly hit the jackpot with the K8 microprocessor. Three of the four major OEMs were pitching AMD-based server products, and Intel’s products barely kept pace in the UP (1 socket) and DP (2 socket) server markets. In the MP (4 socket) server market, AMD was the best choice by a wide margin.
This success was due to a confluence of favorable factors, both political and technical. Technically, the K8 was a solid, conservative design that built on the previous generation K7 and was ideally suited for the server market. In contrast, Intel’s competing Pentium 4 was designed for consumer and media workloads, and was decidedly sub-optimal for servers and notebooks. Worse yet, power and thermal issues at 90nm and beyond prevented the P4 from scaling up in frequency, which was the key idea behind the microarchitecture. The P4’s design goals had been dictated by internal Intel politics at the highest level. Intel’s corporate strategy was to push Itanium processors for the server market, and x86 for the desktop and notebook space. Unfortunately, the Itanium project alienated Sun and IBM, and did not get as much traction as many had expected because of price, compatibility and availability issues. This left a hole in Intel’s plans, which is exactly where the K8 was aimed, and the K8 was enthusiastically embraced by the Linux community and Microsoft. By any measure, AMD’s bet-the-company K8 strategy had worked out quite well, finally bringing the company a measure of financial success.
Unfortunately, this got Intel’s attention rather quickly. Intel scrapped several P4 follow-on projects in various states of completion and changed direction rapidly. One strength of Intel’s culture and employees is the ability not only to deliver, but to excel and exceed expectations under pressure. Intel’s Israeli design team delivered in spades with the Core microarchitecture (based on the P6), which shipped in the first half of last year and easily took the lead in performance and efficiency across almost every market. The effects were dramatic, and certainly felt at AMD, which experienced an erosion of profitability, average selling price and market share in the first quarter of the year, a trend that will continue in the second quarter. Of course, this is a familiar position for AMD; it is precisely where they were in late 2002 and early 2003, before the Opteron launch. Yet again, AMD’s fate depends on a product that will be released in the near future.
Barcelona – The Proverbial Ace in the Hole
Over the course of the last year, AMD has slowly been revealing more details about their next generation processor, codenamed Barcelona. The first information came out in a keynote address from Senior Fellow Chuck Moore at the Spring Processor Forum in 2006. At the following Fall Processor Forum, Ben Sander gave a much more detailed outline of the microarchitecture for Barcelona. More recently, Shawn Searles gave a presentation at ISSCC ’07 which described the physical implementation challenges of Barcelona and some of the design choices.
Barcelona is the first major architectural alteration to the K8 since it debuted in 2003. The K8 built on the very capable microarchitecture of the K7, and added 64 bit operation, an integrated dual-channel DDR memory controller and 3 HyperTransport links. These features were not novel; AMD’s architects followed in the footsteps of the Alpha EV7, which was the first MPU to integrate memory controllers (8 channels of DRDRAM), on-die routing (4 inter-processor links) and directories. However, the K8 advanced the state of the art by bringing 64 bits and higher levels of integration to x86, the mainstream instruction set architecture. The K8 was the first AMD product to meet with any success in the server world, a clear testament to the wisdom of evolutionary and conservative design choices.
In many ways, Barcelona continues down this conservative path of evolution. There are no radical changes. In fact, Barcelona has the same basic 12 stage pipeline as the K8, and many of the microarchitectural improvements in Barcelona have been successfully demonstrated elsewhere. This in no way detracts from the efforts of AMD’s architects and engineers; high-risk features are inappropriate for a company that cannot afford a product failure.
This article brings together all the existing information on Barcelona in a single place, discussing the system aspects and microarchitecture of Barcelona as well as the circuit design challenges and performance. It also presents a wonderful opportunity to examine which areas AMD focused on, in comparison to where Intel spent much of their effort enhancing the P6 core to produce the Pentium M and Core 2 line.
Barcelona is a 283mm² design that uses 463M transistors to implement four cores and a shared 2MB L3 cache in AMD’s 65nm process. The SOI process uses 11 layers of copper interconnect with a low-k dielectric, dual stress liners and embedded SiGe for PMOS transistors. The device described at ISSCC was targeted at 2.2-2.8GHz at 1.15V, while operating within a 95W maximum thermal envelope. AMD claims that their 65nm process has a 15ps FO4 inverter delay, which suggests that each of Barcelona’s pipeline stages is just a little less than 24 FO4 delays at the top 2.8GHz target. Later sections of this article will delve into seven major areas: the system architecture, the five major sections of the microarchitecture, and lastly circuit level improvements and other features.
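The FO4 estimate above can be checked with simple arithmetic. The short Python sketch below divides the clock period by the quoted 15ps FO4 delay. It assumes that the "just under 24 FO4" figure refers to one clock cycle at the top 2.8GHz target; the function names are illustrative and not taken from any AMD material.

```python
# Figures quoted in the text: 15 ps FO4 delay, 2.2-2.8 GHz target range.
FO4_DELAY_PS = 15.0


def cycle_time_ps(freq_ghz: float) -> float:
    """Clock period in picoseconds for a frequency given in GHz."""
    return 1000.0 / freq_ghz


def fo4_per_cycle(freq_ghz: float, fo4_ps: float = FO4_DELAY_PS) -> float:
    """Number of FO4 delays that fit in one clock period."""
    return cycle_time_ps(freq_ghz) / fo4_ps


for f in (2.2, 2.8):
    print(f"{f:.1f} GHz: {cycle_time_ps(f):.0f} ps/cycle, "
          f"{fo4_per_cycle(f):.1f} FO4")
# 2.2 GHz: 455 ps/cycle, 30.3 FO4
# 2.8 GHz: 357 ps/cycle, 23.8 FO4
```

Under that assumption, the 2.8GHz end of the range matches the "a little less than 24 FO4" estimate, while the 2.2GHz end leaves roughly 30 FO4 per cycle.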