AMD: Borrowing Intel’s Formula for Success
Designing and selling generation after generation of backwardly compatible x86 processors has been a tried and true formula for success that has propelled Intel to clear supremacy in the semiconductor industry. Yet Intel seems so smitten by the siren call of VLIW/EPIC architecture that it has abandoned its legacy and instead pushed into the 64 bit world with IA64. AMD has quite rightly recognized that Intel’s decision has left in its wake AMD’s best, and perhaps only, opportunity to enter the market for general purpose 64 bit microprocessors: pick up where Intel left off.
To take advantage of this opportunity AMD has designed a backwardly compatible 64 bit extension to the x86 instruction set architecture called x86-64. In addition to support for 64 bit integer operations and 64 bit flat logical addressing, x86-64 doubles the number of general purpose and SSE registers to 16. Traditional x86 is so register poor that x86-64 may be the first and only 64 bit extension of a 32 bit ISA where compiling an application that otherwise fits into a 4 GB address space into 64 bit code can in theory produce a faster executable. Whether this actually occurs in practice will depend on how close x86-64 compilers come to existing x86 compilers in code quality. AMD also claims that 64 bit coding increases the average length of instructions from about 3.4 bytes to 3.8 bytes, but that the dynamic instruction count falls by 10%.
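The two figures AMD quotes pull in opposite directions, so it is worth working out their combined effect on the volume of instruction bytes a program executes. The short Python sketch below does that arithmetic. It assumes the 3.4 and 3.8 byte averages apply to the dynamic instruction stream and that the 10% reduction is uniform; the function name is purely illustrative.

```python
def relative_instruction_bytes(old_avg_len, new_avg_len, count_change):
    """Ratio of instruction bytes executed (new / old).

    old_avg_len, new_avg_len: average instruction length in bytes.
    count_change: fractional change in dynamic instruction count
                  (for example -0.10 for a 10% reduction).
    """
    return (new_avg_len / old_avg_len) * (1.0 + count_change)


# Figures quoted by AMD: 3.4 -> 3.8 bytes, instruction count down 10%.
ratio = relative_instruction_bytes(3.4, 3.8, -0.10)
print(f"instruction bytes executed, x86-64 relative to x86: {ratio:.3f}")
```

Under these assumptions the longer encodings and the lower instruction count nearly cancel: 64 bit code executes well under 1% more instruction bytes, so instruction fetch bandwidth should be roughly unchanged.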
The first implementation of x86-64 is a family of 0.13 µm SOI devices known as K8 or Hammer, which will be introduced by AMD later this year. Hammer MPUs are also the first AMD processors to support SSE2, the x86 instruction set extension Intel introduced in the Pentium 4. SSE2 includes enhanced support for scalar and 2-way SIMD double precision FP operations and is an important step in deprecating the x87 programming model, which has hindered x86 FP performance from its inception. AMD originally envisioned a RISC-like FP model for Hammer called TFP but wisely decided it was easier and safer to ride Intel’s slipstream instead. The programmer’s model of the x86-64 processor state is shown in Figure 2.
Figure 2. Programmer’s Model of x86-64 Processor State
In a decision reminiscent of DEC’s strategy for the Alpha EV7, AMD decided to focus much of its K8 development efforts on the architecture surrounding the processor. To improve performance and reduce system level costs, important functions that have traditionally been implemented at the northbridge chipset level have been brought onto the Hammer processor device. These include the main memory controller and high speed interfaces for direct connection to southbridge style I/O bridges and for linking processors together in multiprocessor systems. The downside of this strategy is that the extra circuitry brings with it higher power consumption and a higher thermal load on the MPU package. In this case the extra power is partially offset by the use of an SOI process, which allows a given level of circuit performance to be achieved with somewhat reduced power consumption compared to bulk CMOS. The basic organization of the Opteron processor is shown in Figure 3.
Figure 3. Organization of the Opteron Processor
The Opteron, the high end version of the Hammer family for workstations and servers, includes three so-called HyperTransport high speed links and can be easily configured into systems with up to 8 CPUs. System level cache coherency is implemented using a broadcast style protocol. This scheme is simpler and less expensive to implement than the distributed directory system used by the EV7, but doesn’t scale nearly as well, a reflection of the class of system the two processors were designed to address. The Opteron directly supports a 128 bit wide DDR memory system. With support for memory speeds as high as PC2700, this gives a peak memory bandwidth of 5.3 GB/s although in practice it will often fall well short of this due to the inefficiency of using DDR to service short burst length transactions.
The desktop version of the Hammer family, known as Athlon 64, will be differentiated from the Opteron by a narrower 64 bit memory interface and fewer high speed links. Although the peak memory bandwidth of the Athlon 64 is half that of the Opteron, the Athlon 64’s DDR memory system will work at a higher efficiency, so the difference in effective bandwidth will likely be significantly less than 2x. AMD’s original plan for the Hammer family likely revolved around two different mask sets: the Athlon 64 with 256 KB of on-chip L2 cache and the Opteron with 1 MB of L2 cache. This would give the Athlon 64 a comparatively modest die size of 104 mm² while maximizing differentiation with the workstation and server oriented Opteron.
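The bandwidth figures for the two memory systems follow directly from bus width and transfer rate, and can be reproduced with the small Python sketch below. PC2700 is DDR333, roughly 333 million transfers per second. The two efficiency values are hypothetical placeholders chosen only to show how a higher efficiency on the narrower bus shrinks the gap; they are not measurements and do not come from AMD.

```python
def peak_bandwidth_gb_s(bus_width_bits, transfers_per_second):
    """Peak bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * transfers_per_second / 1e9


# PC2700 (DDR333): about 166.67 MHz clock, two transfers per clock.
PC2700_TRANSFERS = 333.33e6

opteron_peak = peak_bandwidth_gb_s(128, PC2700_TRANSFERS)   # 128 bit interface
athlon64_peak = peak_bandwidth_gb_s(64, PC2700_TRANSFERS)   # 64 bit interface

# HYPOTHETICAL efficiencies, for illustration only (not measured values).
OPTERON_EFFICIENCY = 0.60
ATHLON64_EFFICIENCY = 0.80

effective_ratio = (opteron_peak * OPTERON_EFFICIENCY) / (
    athlon64_peak * ATHLON64_EFFICIENCY
)

print(f"Opteron peak:    {opteron_peak:.2f} GB/s")
print(f"Athlon 64 peak:  {athlon64_peak:.2f} GB/s")
print(f"Effective ratio: {effective_ratio:.2f}x (with the assumed efficiencies)")
```

The peak ratio is exactly 2x because only the bus width differs. The effective ratio depends entirely on the efficiencies assumed, which is why the real gap can only be settled by measurement on shipping systems.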
But with Intel intending to introduce mainstream desktop chipsets supporting 128 bit wide DDR memory systems and higher speed system busses for its Pentium 4 desktop processor line this year, AMD will probably be forced to release a version of Athlon 64 with 1 MB of L2 cache for the top end of its desktop product line. Such a part will likely be a hybrid, with an Opteron die bonded out in the Athlon 64 package. The upside in doing this is that AMD can potentially drive up wafer volumes and drive down cost for Opteron, and achieve greater flexibility and lower risk in planning Opteron wafer starts. The drawback is that manufacturing cost will rise well beyond that of the 256 KB version, which will already be more expensive than K7 Athlons due to the SOI processing.
In terms of performance the Opteron should prove to be a top performer in integer and commercial workloads. This will be in large part due to its efficient and well balanced processor core, relatively high clock rate (for a server class processor), and low latency memory system. For FP intensive applications Opteron will likely be competitive but nowhere near the leaders, Itanium 2 and Alpha. The biggest challenge Opteron will face isn’t the credibility of the silicon but rather that of the company behind it. AMD made very little headway in establishing Athlon within the corporate world, despite its high performance and low cost. This is a reflection of AMD’s low profile, uncertain future, and inability to sustain interest among the major computer OEMs. Unfortunately for AMD, as the price of a computer system rises from thousands for a PC to hundreds of thousands for server class hardware, the conservatism of corporate IT decision makers and the OEMs that sell to them rises in proportion. AMD’s best hope for establishing Opteron is in academic and government research establishments, which are often more open to buying new and unproven computing hardware. Another possibility is that low cost Opteron based systems could infiltrate the business world via departmental level purchases beyond the direct control of corporate IT policy makers. This bottom up approach was highly successful in the early spread of the Linux operating system.