The Battle in 64 bit Land: Merchant Chips on the Rise

Today’s 64 bit Players

The performance footprints of the current contestants on the 64 bit MPU battlefield are depicted below in Figure 1, with several 32 bit x86 chips included for reference. To reflect the growing use and importance of hardware based multithreading and chip level multiprocessing (CMP), I have split the performance graph into two parts. Figure 1A shows device execution speed (single thread performance) from published SPECint_base2000 and SPECfp_base2000 scores, the metric used in previous articles in this series. Figure 1B complements it with device level throughput, based on SPECint_rate_base2000 and SPECfp_rate_base2000 scores. The latter shows the benefit of having more than one CPU within the MPU device, a CPU with integrated multithreading capabilities, or both.

Figure 1A – Performance of Current 64 bit MPUs (execution speed)
Figure 1B – Performance of Current 64 bit MPUs (device throughput)

An interesting trend these two graphs reveal is that the high end RISC processor families running out their lifetimes with elderly microarchitectures (Alpha, MIPS, PA-RISC, and SPARC) have slipped so far behind the leading x86, PowerPC and IPF MPUs in execution speed that they don’t even show up in Figure 1A. Yet two of them, PA-RISC and SPARC, have been kept moderately competitive in throughput per socket by moving to two way CMP in recent implementations, and so appear in Figure 1B. Similarly, the POWER4+ is distinctly behind the Madison Itanium 2 in execution speed, but having two CPUs on the die puts it clearly in the lead in throughput per device, especially for integer workloads (SPECint_rate_base2000).
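
The arithmetic behind this split can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the two devices, their scores, and the `scaling` factor are hypothetical values chosen for the example, not SPEC results for any real chip, and the linear throughput model is a simplification that ignores shared cache and memory bandwidth effects.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical MPU used only to illustrate speed versus throughput."""
    name: str
    single_thread_score: float  # assumed per-core speed, arbitrary units
    cores: int
    scaling: float = 1.0  # assumed efficiency of each additional core (0..1)

    def throughput(self) -> float:
        # Simplified model: the first core runs at full speed and every
        # additional core contributes `scaling` times that speed.
        return self.single_thread_score * (1 + (self.cores - 1) * self.scaling)


# Both devices and all numbers below are made up for illustration.
fast_single = Device("fast single-core (hypothetical)", 100.0, cores=1)
slower_dual = Device("slower dual-core (hypothetical)", 80.0, cores=2, scaling=0.9)

for d in (fast_single, slower_dual):
    print(f"{d.name}: speed={d.single_thread_score:.0f}, "
          f"throughput={d.throughput():.0f}")
```

In this toy model the dual-core device loses on execution speed yet leads on throughput per device, which is the same shape of result the two graphs show for a two-CPU die against a faster single-CPU die.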

This second observation illustrates a major dilemma for contemporary microprocessor architects: how to allocate scarce on-chip resources between the ever more difficult exploitation of instruction level parallelism (ILP) and thread level parallelism (TLP), which is much easier to harness with multithreading or multiprocessing. Exploiting TLP is clearly the easier road to higher device throughput, but improvements from TLP don’t increase single thread execution speed and to some extent can even hurt it. Single thread execution speed is not only the primary figure of merit for general purpose MPU performance; it is also the primary concern for an important cross section of customers.

