Apple Also Lags in Chipset Performance
Apple likes to brag about how powerful the PowerPC G4 is at processing multimedia and other streaming data with its so-called Velocity Engine (Motorola and the rest of the world call it the AltiVec instruction set extension). One recent ad campaign revolved around the idea that the G4 reaches gigaFLOP/second performance, i.e. supercomputer levels, and that G4 Power Macs were subject to export restrictions. Besides bending the truth on export restrictions (the U.S. Department of Commerce imposes no such restrictions on the Mac), the GFLOP/s label is also quite dubious. Real supercomputers not only crunch numbers quickly but do so on huge data sets, which demands a great deal of memory bandwidth. Ironically, this is an area where Apple has generally been quite deficient. The system memory bandwidth of representative x86 PCs and Apple Macintoshes, measured using John McCalpin’s STREAM benchmark (COPY score), is shown in Figure 5.
Figure 5. x86 and Power Mac STREAM Performance, 1994 to Present
Apple’s systems have generally had only about 60 to 70% of the effective memory bandwidth of contemporary x86 systems. This is due to Power Mac configurations that run the system bus at lower clock rates than comparable x86 PCs, and to the simple fact that Apple’s system ASICs cannot match the technical excellence of the best x86 chipsets, such as the 440BX. A system architect in Apple’s Advanced Technology Group once said, “The 60x processors are extremely sensitive to memory bandwidth in terms of the effect on performance”. It doesn’t appear that Apple’s chipset group ever took his pronouncement to heart. Despite Apple’s bluster, the Power Mac falls far short of supercomputer performance: the best G4 STREAM score is about an order of magnitude lower than that of a twelve-year-old uniprocessor Cray Y-MP (a machine with a peak execution rate less than half the G4’s mythical 1 GFLOP/sec).
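To make the COPY score concrete, here is a minimal sketch of the kind of measurement it represents: copy one large array of doubles into another, time it, and divide the bytes moved by the elapsed time. This is an illustration in Python, not McCalpin’s benchmark code; the function names, the default array size, and the choice of 10**6 bytes per MB are assumptions made for the example, and interpreter overhead means the numbers it prints are not comparable to the figures discussed above.

```python
import time
from array import array


def bytes_moved_by_copy(n_elements, itemsize=8):
    """Bytes counted for one COPY pass: each element is read once and written once."""
    return 2 * itemsize * n_elements


def stream_copy_mb_per_s(n_elements=2_000_000, trials=5):
    """Return the best observed copy bandwidth in MB/s (1 MB taken as 10**6 bytes).

    Hypothetical helper for illustration only. The arrays should be much
    larger than the processor caches, otherwise cache bandwidth is measured
    instead of system memory bandwidth.
    """
    a = array("d", [1.0]) * n_elements  # source array of 8-byte doubles
    c = array("d", [0.0]) * n_elements  # destination array
    best = float("inf")
    for _ in range(trials):
        start = time.perf_counter()
        c[:] = a  # the copy kernel: c[i] = a[i] for every i
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)  # keep the fastest trial
    # Sanity check that the copy really happened.
    assert c[0] == 1.0 and c[-1] == 1.0
    return bytes_moved_by_copy(n_elements, a.itemsize) / best / 1e6


if __name__ == "__main__":
    print(f"COPY: {stream_copy_mb_per_s():.0f} MB/s")
```

Taking the best of several trials is a deliberate choice: it reduces the influence of one-off disturbances such as page faults on the first pass.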
Power Mac’s Future: Competing Only Gets Harder
It would be comforting for Macintosh aficionados to dismiss the current competitive failings of the PowerPC as a temporary aberration, a low point due to overlapping product schedules. But things will get a lot worse in the near future. Apple pins its hopes on a redesigned G4 processor announced at last October’s Microprocessor Forum. This processor, which I will refer to as the G4+, has three major improvements over the existing G4. First, the integer pipeline is stretched from 4 stages to 7 stages to improve clock frequency scaling. Second, it adds two extra simple integer units, for a total of one complex and three simple units (although the dispatch width only increases from 2 instructions plus a branch to 3 instructions plus a branch). Finally, the G4+ includes a 256 KB on-chip L2 cache, as well as on-chip tag support for an external L3 cache of up to 2 MB. Manufactured in a 0.18 um (0.13 um Leff) process, the G4+ is expected to eventually run at clock speeds exceeding 700 MHz. It is expected to start shipping in Power Macs in the second half of this year.
Unfortunately for Apple, the new G4+ only makes the Mac competitive, in both clock frequency and estimated SPEC performance, with x86 processors that have been shipping since late 1999, and leaves it far behind the well-publicized “gigaprocessors” from AMD and Intel. More ominously, Motorola’s neat and tidy little 14 Watt G4+ resembles a deer frozen in the headlights of a couple of onrushing x86 eighteen-wheelers named Willamette and Thunderbird. Both of these processors will not only operate well above 1 GHz but will also yield improved performance per clock: the Intel Willamette from its trace cache and double-frequency ALUs, and the AMD Thunderbird from the powerful K7 core finally mated to an on-chip L2 cache. The desktop MPU landscape predicted for late 2000 is shown in Table 1.
| | Motorola G4+ | AMD Thunderbird | Intel Willamette |
Table 1. Estimated Desktop MPU Competitive Landscape (Late 2000)
It would appear that, despite the best efforts of Motorola’s designers to improve the PowerPC line for the desktop market, the clock rate and performance gap that the leading x86 processors have opened in the last 12 months will continue to widen for at least the next 12 months.