Altivec – Velocity Engine – Vector SIMD
In the PowerPC 970 processor, an Altivec-compatible SIMD vector unit was added so that the processor can execute vector instructions from the Altivec vector SIMD ISA extension. Prior to the announcement of the PowerPC 970 processor, there had been much speculation about the reason for the cryptic description of "162 specialized SIMD instructions". It was well known that Altivec had a similar number of SIMD instructions, yet the "Altivec" label was conspicuously absent. Furthermore, the presenter of the PowerPC 970 processor, Peter Sandon, had previously directed the design effort of a PowerPC compatible processor codenamed Gekko. Gekko had its own set of SIMD extensions, and this processor is now used by Nintendo in the Nintendo GameCube. As a result, it was unclear at the outset whether the 162 specialized SIMD instructions were Altivec-compatible or not. During the presentation of the PowerPC 970 processor, Peter Sandon revealed that the vector SIMD units were indeed Altivec-compatible. The vector SIMD ISA was co-developed by IBM and Motorola, but the "Altivec" name was trademarked by Motorola, so IBM could not use Motorola’s trademarked name to describe a functionally compatible implementation of the same vector SIMD unit.
The PowerPC 970 processor has relatively long pipelines compared with the previous generation PowerPC G3 and G4 processors. There are 9 pipeline stages devoted to instruction fetch and decode, and 5 to 13 pipeline stages are used by the out of order execution units. Simple integer instructions can execute in 5 stages, whereas more complex vector floating point instructions may take as many as 13 stages to complete execution. The somewhat surprising disclosure of the long 9 stage instruction fetch and decode pipeline was clarified by a description of the work performed in those 9 stages. As it turns out, some of the older PowerPC instructions are rather complex, and the solution that IBM adopted in the POWER4 and the PowerPC 970 processor was an instruction cracking process whereby some complex instructions are cracked into simpler, more RISC-like instructions. These simpler instructions are then sent to the dispatch and execution units in the processor core. Although this process superficially resembles the micro-op or risc-op decoding steps seen in various x86 processors, the PowerPC instruction set is not nearly as complex as the venerable x86 ISA, and only a few instructions in the PowerPC ISA need to be cracked into simpler instructions for execution.
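The cracking step described above can be pictured with a small toy model. The sketch below is illustrative only: it assumes that a load-with-update instruction (lwzu) is one of the cracked instructions and that it splits into a plain load plus an add. The tuple format and the crack function are hypothetical and do not describe IBM's actual decode logic.

```python
# Toy model of instruction cracking. The choice of lwzu and the internal-op
# format are illustrative assumptions, not the PowerPC 970's decode tables.

def crack(instr):
    """Return the list of simpler internal ops for one decoded instruction."""
    op, *args = instr
    if op == "lwzu":
        # Load-with-update: load from base+offset, then write the effective
        # address back into the base register. Modeled here as two simple ops.
        dest, base, offset = args
        return [("lwz", dest, base, offset), ("addi", base, base, offset)]
    # Everything else passes through unchanged as a single internal op.
    return [instr]

program = [
    ("add", "r3", "r4", "r5"),
    ("lwzu", "r6", "r1", 8),
    ("addi", "r7", "r7", 1),
]

# Flatten the per-instruction results into one stream of internal ops.
internal_ops = [iop for instr in program for iop in crack(instr)]
for iop in internal_ops:
    print(iop)
```

In this model only one of the three instructions expands, which mirrors the point that most PowerPC instructions already map to a single simple operation.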
The processor will be manufactured using IBM’s 0.13um SOI process with 8 layers of copper interconnects. The PowerPC 970 processor has already taped out, and parts exist in labs within IBM, where they are currently undergoing performance evaluation and system debugging. IBM expects that the processor will be mature enough to be released to manufacturers for sampling in Q2 2003, and if all goes well, volume production should commence in the second half of 2003.
The processor should achieve a clock frequency of 1.4 GHz to 1.8 GHz on the 0.13um SOI process. Average power consumption at 1.8 GHz is expected to be 42 Watts with 1.3V Vcc. IBM estimates that the PowerPC 970 processor would be able to achieve a SPEC CPU 2000 INT score of 937 and a SPEC CPU 2000 FP score of 1051 at 1.8 GHz. These performance numbers compare well against Intel’s current top end offering, the 2.8 GHz "Northwood" Pentium 4 processor. The 2.8 GHz Pentium 4 currently achieves a SPEC CPU 2000 INT score of 970 (base) and a SPEC CPU 2000 FP score of 976 (base). However, at the expected release date of 2H03, the PowerPC 970 would presumably have to compete against a 3.0+ GHz, Prescott based Pentium X processor. As a result, the PowerPC 970 may not have the performance lead as measured with the SPEC CPU suite upon release in the second half of 2003. Nonetheless, these preliminary SPEC CPU scores indicate that the performance of the PowerPC 970 processor will be far more than simply respectable, and that it represents a large leap beyond current PowerPC offerings from Motorola and IBM.
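To make the comparison concrete, the scores quoted above can be set side by side. This is a rough sketch using only the figures in the text. The per-GHz normalization is an illustrative assumption of this example, not a metric defined by SPEC, and IBM's numbers are estimates rather than published results.

```python
# Scores and clock speeds as quoted in the text. The per-GHz figure is an
# illustrative normalization only; SPEC does not define such a metric.
chips = {
    "PowerPC 970 (IBM estimate)": {"ghz": 1.8, "int": 937, "fp": 1051},
    "Pentium 4 Northwood (base)": {"ghz": 2.8, "int": 970, "fp": 976},
}

def per_ghz(score, ghz):
    """Score divided by clock frequency in GHz."""
    return score / ghz

for name, c in chips.items():
    print(f"{name}: INT/GHz = {per_ghz(c['int'], c['ghz']):.0f}, "
          f"FP/GHz = {per_ghz(c['fp'], c['ghz']):.0f}")

ppc = chips["PowerPC 970 (IBM estimate)"]
p4 = chips["Pentium 4 Northwood (base)"]
int_ratio = ppc["int"] / p4["int"]  # PowerPC 970 INT relative to Pentium 4
fp_ratio = ppc["fp"] / p4["fp"]     # PowerPC 970 FP relative to Pentium 4
print(f"INT ratio: {int_ratio:.2f}, FP ratio: {fp_ratio:.2f}")
```

On these figures the PowerPC 970 trails slightly on integer and leads modestly on floating point, while running at a much lower clock frequency.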
Packet Based System Interconnect
One of the more interesting aspects of the PowerPC 970 processor is the system interconnect. Unlike the bi-directional processor busses seen on Intel IA-32 and IA-64 processors, or even the bi-directional point to point interconnects used on Alpha EV6 and AMD Athlon processors, the system interconnect of the PowerPC 970 processor consists of uni-directional, point to point, source synchronous links that do not have to contend with bus loading factors or bus turn around times. The interconnect can also wave-pipeline multiple bits of data on each wire concurrently. The most difficult part of such a high frequency system interconnect may be the deskewing circuitry that it requires. In this case, the PowerPC 970 appears to have benefitted from its POWER4 lineage, where the deskewing circuitry for a wave-pipelined interconnect was previously disclosed by IBM.
The system interconnect on the PowerPC 970 has been designed to operate at an integer fraction of the CPU core frequency. At a CPU core frequency of 1.8 GHz, the system interconnect will operate at a frequency of 900 MHz. With two unidirectional 32 bit wide interconnects, one from the CPU to the companion system controller chip and the other from the companion system controller chip back to the CPU, the system interconnect can provide 3.6 GB/s of raw bandwidth in each direction, for an aggregate bandwidth of 7.2 GB/s. However, the unidirectional links must multiplex address and control information onto the same wires, and when this overhead is taken into consideration, IBM claims an effective peak data bandwidth of 6.4 GB per second.
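The bandwidth figures above follow from simple arithmetic, sketched below. The divider of 2 is inferred from the 1.8 GHz and 900 MHz figures in the text, GB is taken to mean 10^9 bytes, and the overhead fraction is derived here from IBM's 6.4 GB/s claim rather than stated by IBM.

```python
CORE_HZ = 1.8e9        # CPU core frequency from the text
BUS_DIVIDER = 2        # assumption: 900 MHz is the core clock divided by 2
LINK_WIDTH_BITS = 32   # width of each unidirectional link
LINKS = 2              # one link in each direction

bus_hz = CORE_HZ / BUS_DIVIDER
per_link = bus_hz * LINK_WIDTH_BITS / 8   # raw bytes per second, one direction
aggregate = per_link * LINKS              # raw bytes per second, both directions

EFFECTIVE = 6.4e9                         # IBM's claimed effective peak
overhead_fraction = 1 - EFFECTIVE / aggregate

print(f"per direction: {per_link / 1e9:.1f} GB/s")
print(f"aggregate:     {aggregate / 1e9:.1f} GB/s")
print(f"address/control overhead: {overhead_fraction:.1%}")
```

Under these assumptions roughly one ninth of the raw aggregate bandwidth goes to address and control traffic.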
32 Bit and 64 Bit Architecture
One question that has been asked is whether or not Apple would adopt the PowerPC 970 processor to replace the PowerPC 7455 based processors currently shipping in Apple Macintosh computers. The consensus presented by the analysts at MicroDesign Resources is that there are only a few minor hurdles for Apple to clear before it can use the PowerPC 970 as its main line desktop processor. One of the hurdles is the 32/64 bit question. Current PowerPC processors used in Apple Macintosh computers are 32 bit processors from Motorola, and the PowerPC 970 processor is a 64 bit processor. IBM addressed this point specifically by stating that the PowerPC 970 processor inherits the 32/64 bit mode switching mechanism from POWER4, which allows for a relatively painless transition between 32 bit mode and 64 bit mode operation. Although 32 bit operating systems cannot run on the PowerPC 970 processor without modification, the necessary modifications have been designed to be minimal. IBM also announced that the PowerPC 970 processor currently runs both 32 bit and 64 bit Linux in the testing labs within IBM. It is believed that with minimal modification and support from the operating system, user mode 32 bit PowerPC applications can run on the PowerPC 970 processor without modification. With these assurances from IBM, it is believed that should Apple adopt the PowerPC 970, current 32 bit software would be able to run in a seamless fashion, while a 64 bit environment would also be available to developers on the same system.
System Chip Support
One of the more troublesome hurdles for Apple to overcome in adopting the PowerPC 970 processor may be the system engineering aspect of the processor. As described previously, the 4 byte wide unidirectional serial links may provide upwards of 6 to 7 GB of raw bandwidth per second. However, the specification of ~900 MHz operation on the system board would require considerable investment in the system support chip. Moreover, the nature of the point to point interconnect means that to support a dual CPU system, the companion chip must be designed with dual CPU SMP in mind, with dedicated channels devoted to each CPU. Furthermore, to support the high bandwidth available on the system interconnect, a dual channel PC2700 DDR SDRAM memory system would appear to be a minimum requirement to support a single CPU. Unless Apple can also obtain a low cost support chip from IBM, the PowerPC 970 processor would likely force the Apple Macintosh product lines to become even more upscale, and Apple would likely retain the PowerPC G4 processors for the lower end iMac and eMac product lines.
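The dual channel memory estimate can be checked against the interconnect figures given earlier. The sketch below assumes the nominal peak of 2700 MB/s per channel implied by the PC2700 module name and treats GB as 10^9 bytes. Sustained memory bandwidth in a real system would be lower.

```python
# Nominal peak per PC2700 channel, taken from the module name (an assumption;
# real sustained bandwidth is lower).
PC2700_GB_S = 2.7

# Interconnect figures as given in the text for a 1.8 GHz PowerPC 970.
LINK_PER_DIRECTION_GB_S = 3.6
EFFECTIVE_PEAK_GB_S = 6.4

single_channel = 1 * PC2700_GB_S
dual_channel = 2 * PC2700_GB_S

for label, bw in (("single channel", single_channel),
                  ("dual channel", dual_channel)):
    print(f"{label}: {bw:.1f} GB/s peak, "
          f"{bw / LINK_PER_DIRECTION_GB_S:.0%} of one link direction, "
          f"{bw / EFFECTIVE_PEAK_GB_S:.0%} of the effective peak")
```

On these nominal numbers a single channel cannot fill even one direction of the interconnect, while two channels exceed one direction but still fall short of the 6.4 GB/s effective peak.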