Q. Who is AMD and what is the K6?
A. AMD has a long history in the x86 microprocessor business. In the early 1980s, AMD supplied 8088 and 286 processors as an Intel-licensed second source. When the 386 came out, Intel decided to break off its agreement with AMD, and the two companies spent nearly a decade locked in bitter litigation. AMD eventually produced 386 and 486 microprocessors that were derived from Intel's designs through reverse engineering.
Starting with the Pentium generation, AMD shifted its design strategy. AMD began creating its own original processor design for the K5. The K5 runs the same software as Intel's Pentium and plugs into the same socket, but internally the design is entirely different from Intel's.
Because the K5 was AMD's first independent x86 design, the company took longer than planned to get it into production. AMD underestimated the difficulty of creating an independent design and made some bad design choices. As a result, the K5 was late to market, and when it did arrive, it was slower than expected. After a major revision, AMD was able to boost the chip's performance considerably, but Intel's mainstream chips had become much faster in the meantime.
AMD's future lies with the K6, which began shipping this spring. The K6, like the K5, is an independent design that plugs into a Pentium socket and runs the same software, including the MMX instruction-set extensions. It is not an enhancement of the K5, however; it comes from an entirely different design group.
The K6 started life as NexGen's 686. When the K5 turned out to be disappointing, AMD bought NexGen, along with the design team behind the 686. The K6 is a revised version of what would have been the NexGen 686.
The AMD K6 is a Socket 7, MMX-compatible microprocessor designed for high performance on both 16-bit and 32-bit software. It delivers performance competitive with an equivalent Intel processor running Windows operating systems and applications. To accelerate multimedia performance and earn MMX compatibility, AMD licensed the Intel MMX instruction set rather than create its own.
AMD designed the AMD K6 processor to fit the low-cost, high-volume Socket 7 architecture. This enables PC manufacturers and resellers to speed time to market, minimize redesign costs, and deliver systems with an easy upgrade path for the future.
Q. What are the architectural features of the AMD K6?
A. The AMD K6 processor's six-issue, RISC86 superscalar core combines efficient RISC throughput with x86 instruction set compatibility to deliver sixth-generation performance. In addition, the AMD K6 processor design incorporates a large split 64-Kbyte L1 cache (32 Kbytes for instructions and 32 Kbytes for data), multiple sophisticated decoders, specialized parallel execution units, a high-performance floating-point unit, and industry-standard MMX. Together, this microarchitecture and these design techniques yield a processor that delivers Pentium II-class performance.
The K6 CPU core uses a design approach similar in concept to the Pentium II, in which x86 instructions are converted to an internal instruction format. Intel calls this internal format "micro-ops"; AMD calls them RISC86 instructions. In either case, the idea is to decompose the more complex x86 instructions into simpler, RISC-like instructions. These RISC86 instructions are then queued up and can be executed speculatively and out of order.
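To make the decomposition idea concrete, here is a minimal sketch in Python. It is a teaching toy under stated assumptions: the op names (LOAD, STORE), the temporary registers t0 and t1, and the tuple format are all invented for illustration and are not AMD's actual RISC86 encoding or Intel's micro-op format. It handles only simple two-operand arithmetic instructions.

```python
# Toy illustration only: op names, temporaries, and the tuple layout are
# assumptions made for this sketch, not the real RISC86 format.

def is_mem(operand):
    """An operand written as "[name]" denotes a memory location."""
    return operand.startswith("[")

def decode(instr):
    """Split one simplified x86-style arithmetic instruction into
    simpler RISC-like ops. instr is a tuple (opcode, dest, src)."""
    opcode, dest, src = instr
    ops = []

    # A memory source is first loaded into a temporary register.
    if is_mem(src):
        ops.append(("LOAD", "t0", src))
        src = "t0"

    if is_mem(dest):
        # Read-modify-write: load, operate on a temporary, store back.
        ops.append(("LOAD", "t1", dest))
        ops.append((opcode, "t1", src))
        ops.append(("STORE", dest, "t1"))
    else:
        # Register-to-register: already simple, passes through as one op.
        ops.append((opcode, dest, src))
    return ops
```

In this sketch, decode(("ADD", "[x]", "eax")) yields three simple ops (a load, a register add, and a store), while decode(("ADD", "eax", "ebx")) passes through as a single op. Each resulting op touches memory or computes, never both, which is what makes the ops easier to queue and reorder.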
Speculative execution means that when a conditional branch instruction is executed, the processor guesses which way the branch will go so that it can continue executing instructions until the branch condition is resolved. This is essential in a high-performance microprocessor; otherwise, the processor would spend too much time stalled waiting for conditional branches to be resolved. Most of the time, the processor can guess correctly, so no time is wasted. If it guesses wrong, the processor discards the work performed since the conditional branch instruction was executed and starts over, following the other path.
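The trade-off described above can be put in rough numbers. The following is a back-of-the-envelope cost model, not a description of the K6's actual branch predictor. The function name and every parameter value used with it are assumptions chosen for illustration.

```python
def branch_cycles(n_branches, accuracy, resolve_cycles, flush_penalty):
    """Return (stall_cycles, speculative_cycles) lost to branches.

    stall_cycles: a non-speculative core waits resolve_cycles at
        every conditional branch.
    speculative_cycles: a correct guess costs nothing; a wrong guess
        costs flush_penalty cycles to discard work and restart on the
        other path.
    All inputs are hypothetical; none are measured K6 figures.
    """
    stall_cycles = n_branches * resolve_cycles
    speculative_cycles = n_branches * (1 - accuracy) * flush_penalty
    return stall_cycles, speculative_cycles
```

With invented inputs of 1,000 branches, 90% prediction accuracy, a 4-cycle wait to resolve a branch, and a 5-cycle flush penalty, the stalling core loses 4,000 cycles and the speculating core loses about 500. Speculation wins whenever the misprediction rate times the flush penalty is smaller than the resolve time.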
Out-of-order execution is another important technique for keeping a superscalar processor busy. A superscalar processor can execute more than one instruction at a time. If it has to execute instructions strictly in order, however, it will often get stuck waiting for one instruction to be completed. For example, one instruction might load a data item from memory. If this data item is not found in the cache, then many processor clock cycles will go by before the data is returned from memory. An out-of-order processor can move on to the next instruction in this circumstance, as long as that instruction doesn't use the data for which the load instruction is waiting.
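The cache-miss example can be simulated with a toy scheduler. This is a simplified sketch, not the K6's real scheduling logic. It assumes one instruction issues per cycle, each register is written at most once (as if already renamed), and registers never written by the program are inputs available at cycle 0. The program, register names, and latencies are invented.

```python
import math

# Each instruction: (name, dest_register, source_registers, latency).
# Hypothetical program: a load that misses the cache (20 cycles),
# one instruction that needs the loaded value, and two that do not.
PROGRAM = [
    ("load", "r1", ["addr"], 20),
    ("add",  "r2", ["r1", "r5"], 1),
    ("mul",  "r3", ["r6", "r7"], 2),
    ("sub",  "r4", ["r3", "r6"], 1),
]

def run_in_order(program):
    """Issue strictly in program order; return the finishing cycle."""
    ready_at = {}          # register -> cycle its value becomes available
    cycle = 0              # earliest cycle the next instruction may issue
    finish = 0
    for _name, dest, srcs, latency in program:
        # Wait until every source operand is available.
        start = max([cycle] + [ready_at.get(s, 0) for s in srcs])
        ready_at[dest] = start + latency
        finish = max(finish, start + latency)
        cycle = start + 1  # later instructions cannot pass this one
    return finish

def run_out_of_order(program):
    """Each cycle, issue the oldest instruction whose sources are ready."""
    # Results not yet produced are marked unavailable.
    ready_at = {dest: math.inf for _name, dest, _srcs, _lat in program}
    pending = list(program)
    cycle = 0
    finish = 0
    while pending:
        for instr in pending:
            _name, dest, srcs, latency = instr
            if all(ready_at.get(s, 0) <= cycle for s in srcs):
                ready_at[dest] = cycle + latency
                finish = max(finish, cycle + latency)
                pending.remove(instr)
                break      # at most one issue per cycle in this toy
        cycle += 1
    return finish
```

On this invented program, run_in_order(PROGRAM) finishes at cycle 24 because the mul and sub sit behind the stalled add, while run_out_of_order(PROGRAM) finishes at cycle 21 because they execute during the cache miss.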
Handling speculative and out-of-order execution is complex, since proper logical operation of the program must always be guaranteed. Breaking the sometimes complex x86 instructions into simpler RISC-like instructions makes speculative and out-of-order processing of the internal instructions much simpler.
The K6 is a far more sophisticated design than the Pentium and the MMX Pentium, which are strictly in-order, nonspeculative processors. Although these processors can, in theory, execute two instructions per clock cycle, many limitations restrict their ability to do so.
The execution resources in the K6 and Pentium II are similar. Both have two integer units, one floating-point unit, one load unit, and one store unit. Many subtle differences exist between the capabilities of each processor's units, but they are alike in general.
The two processors' MMX capabilities and floating-point execution are notably different. Because the execution of some instructions can be overlapped, there are two measures of instruction performance: the rate at which new calculations can be started (the throughput), and the total time to complete a single operation (the latency).
AMD's K6 has a shorter latency than the Pentium II for basic floating-point operations (two cycles vs. three cycles), but these instructions cannot be overlapped, so a floating-point instruction can be started only once every two clock cycles. The Pentium II takes one cycle longer to complete each operation, but can start a new operation every cycle. For a single isolated calculation, AMD's approach is faster, but most floating-point applications perform a long series of calculations, where the ability to start a new operation every cycle makes Intel's approach faster.
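The arithmetic behind this comparison is easy to check. The sketch below uses only the latency and issue figures quoted above. The function and constant names are invented for illustration, and the model assumes the operations are independent of one another, so that a pipelined unit can overlap them.

```python
def cycles_for_independent_ops(n, latency, issue_interval):
    """Cycles to finish n independent operations on a single unit.

    The first operation completes after `latency` cycles; each later
    one starts `issue_interval` cycles after its predecessor.
    Assumes no operation depends on another's result.
    """
    if n == 0:
        return 0
    return (n - 1) * issue_interval + latency

# Basic floating-point figures as quoted in the text above.
K6_FP = {"latency": 2, "issue_interval": 2}
PENTIUM_II_FP = {"latency": 3, "issue_interval": 1}
```

Under this model, one operation takes 2 cycles on the K6 and 3 on the Pentium II. The two chips tie at two operations (4 cycles each), and from three operations onward the Pentium II pulls ahead: ten operations take 20 cycles on the K6 but 12 on the Pentium II.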
As for MMX, both chips execute the same instructions, but they perform differently. Both can start a new MMX multiply instruction every clock cycle, but Intel's design has a three-cycle latency, while AMD's completes the instruction in a single cycle. This advantage for AMD is countered, however, by the fact that Intel has a dual MMX unit that can process two MMX instructions at a time (subject to certain restrictions). AMD's MMX unit is limited to a single instruction at a time. Another factor that works in Intel's favor is that programmers generally optimize their code for the performance characteristics of Intel's processors, which tends to give Intel's chips an advantage.