From Russia, With Skepticism

Die Size on a Diet

The 126 mm² claimed for the E2k strains credibility. The EV68 design disclosed this spring was a hybrid device with 0.18 um process transistors and 0.25 um process metal design rules, and its die size was given as 192 mm². Several variants of the EV68 are known to be in the works, and a true 0.18 um version with a 150 mm² die size was listed in the early 1999 Compaq presentation ‘Alpha Microprocessor Futures’ by Bill Herrick. So how does the 28 million transistor E2k compare to the 15 million transistor EV68? The E2k has more than twice the on-chip cache, four times the effective number of floating point pipelines, and 50% more integer units. (Combining FP and integer functions in the same logical pipeline offers modest savings at best; more integrated approaches tend to compromise FP performance, and the MIPS R4200 is a case in point.) In addition, the E2k processor core duplicates the huge 20 port 256 x 64 bit register file within both clusters. An approximate area breakdown by function of a hypothetical 150 mm² EV68 is provided in Table 1. By scaling the EV68 numbers appropriately, I attempted to estimate the area of the E2k. My 235 mm² estimate is nearly twice as large as the 126 mm² figure from Elbrus.

Table 1. Estimated Area Breakdown of Alpha EV68 and Elbrus E2k

Left three columns: “Pure 0.18 um” EV68 (projected). Right three columns: 0.18 um E2k (estimated). All areas in mm² (w = ways, p = ports).

| EV68 Function | EV68 Comment | EV68 Area | E2k Function | E2k Comment | E2k Area |
|---|---|---|---|---|---|
| FP reg File | 72 physical regs, 8 ports | 2.4 | Unified RFs | 2 x (256 regs, 20 ports) | 46.1 |
| FP Units | 2 pipelines (MUL, ADD) | 8.1 | FP pipelines | 4 MUL + 4 ADD | 32.4 |
| Int. reg Files | 2 x (80 physical regs, 10 ports) | 3.6 | | | |
| Int. Units | Four pipelines | 7.4 | Int. Pipelines | Six pipelines | 11.1 |
| L1 D-cache | 64 KB, 2w, 2p, ECC | 21.1 | L1 D-cache | 2 x (8 KB, 1w, 2p) | 6.0 |
| L1 I-cache | 64 KB, 2w, 1p | 19.4 | L1 I-cache | 64 KB, 4w | 20.0 |
| L2 cache | not applicable | 0.0 | L2 cache | 256 KB, 2w | 40.0 |
| TLBs | (128 dup’d) + 128 | 4.2 | TLBs | (16 L1 + 512 L2) + 64 | 6.5 |
| Branch Pred. | Tournament predictor | 3.7 | BP + Predicates | ? | 4.0 |
| Memory I/F | reorder units, miss reg | 10.8 | Memory Interface | incl 4 KB prefetch buff. | 4.0 |
| Bus Interface | including dup data tag | 8.5 | Bus Interface | | 7.0 |
| Internal Busing | dedicated channels | 9.2 | Internal Busing | conservative estimate | 20.0 |
| Instr. Logic | Decode, sched, retire | 24.0 | Instruction Logic | VLIW but many formats | 10.0 |
| I/O Region | | 27.6 | I/O Region | | 27.6 |
| Total Area (mm²) | | 150.0 | Total Area (mm²) | | 234.7 |
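The totals in Table 1, and the gap to the Elbrus figure, can be checked with a few lines of Python. This is only a sketch: the dictionary keys are shortened labels chosen here, and the linear scaling of pipeline area with pipeline count is an assumption that happens to reproduce the table's FP and integer pipeline entries, not a method stated in the text.

```python
# Per-function area estimates in mm^2, copied from Table 1.
EV68_AREA = {
    "fp_reg_file": 2.4, "fp_units": 8.1, "int_reg_files": 3.6,
    "int_units": 7.4, "l1_dcache": 21.1, "l1_icache": 19.4,
    "l2_cache": 0.0, "tlbs": 4.2, "branch_pred": 3.7,
    "memory_if": 10.8, "bus_if": 8.5, "internal_busing": 9.2,
    "instr_logic": 24.0, "io_region": 27.6,
}
E2K_AREA = {
    "unified_rfs": 46.1, "fp_pipelines": 32.4, "int_pipelines": 11.1,
    "l1_dcache": 6.0, "l1_icache": 20.0, "l2_cache": 40.0,
    "tlbs": 6.5, "bp_predicates": 4.0, "memory_if": 4.0,
    "bus_if": 7.0, "internal_busing": 20.0, "instr_logic": 10.0,
    "io_region": 27.6,
}
CLAIMED_E2K_AREA = 126.0  # mm^2, the figure from Elbrus


def total_area(breakdown):
    """Sum a per-function breakdown, rounded to the table's precision."""
    return round(sum(breakdown.values()), 1)


ev68_total = total_area(EV68_AREA)
e2k_total = total_area(E2K_AREA)
ratio = e2k_total / CLAIMED_E2K_AREA

# Assumed linear scaling with pipeline count (2 -> 8 FP, 4 -> 6 integer).
fp_scaled = EV68_AREA["fp_units"] * (8 / 2)
int_scaled = EV68_AREA["int_units"] * (6 / 4)

print(f"EV68 total: {ev68_total} mm^2")
print(f"E2k total:  {e2k_total} mm^2 ({ratio:.2f}x the claimed 126 mm^2)")
print(f"Scaled FP pipelines:  {fp_scaled:.1f} mm^2")
print(f"Scaled int pipelines: {int_scaled:.1f} mm^2")
```

The two totals come out to the 150.0 and 234.7 mm² shown in the table, and the estimate is roughly 1.86 times the claimed die size.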

My experience in chip design suggests that this simplified area estimate is rather conservative. The huge number of buses in the processor core, especially the six inter-cluster buses, will likely make things worse than I have shown. Although the E2k lacks the EV68’s logic for out-of-order instruction execution, its 2048 bit wide path from the I-cache into the instruction decoding logic, its multipath decoding and speculative execution, and the extreme variability in the length and format of E2k instructions potentially make the instruction decoding logic much larger than I have shown. The most striking thing about the E2k’s area estimate is the huge area dedicated to the duplicated unified register file. Register file area per bit grows approximately as the square of the number of ports, because both the number of data lines and the number of control lines increase with the port count, and the two run orthogonally to each other, so each cell grows in both dimensions.
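The port-squared rule can be made concrete with a small sketch. The function name `regfile_area` and the choice of the EV68 integer register files as the reference point are assumptions made here for illustration; the text does not say which reference was used, although this choice does land on the 46.1 mm² entry in Table 1. Both designs are taken to use 64 bit registers, so bit width drops out.

```python
def regfile_area(ref_area_mm2, ref_regs, ref_ports, regs, ports):
    """Scale a reference register file to a new size.

    Area is assumed to grow linearly with the number of registers and
    with the square of the number of ports, at equal register width.
    """
    return ref_area_mm2 * (regs / ref_regs) * (ports / ref_ports) ** 2


# Reference: one of the two EV68 integer register files from Table 1
# (80 physical registers, 10 ports, 3.6 mm^2 for the pair).
one_ev68_int_rf = 3.6 / 2

# One E2k unified register file: 256 registers, 20 ports.
one_e2k_rf = regfile_area(one_ev68_int_rf, 80, 10, 256, 20)

# The E2k duplicates the register file in both clusters.
e2k_rfs = 2 * one_e2k_rf

print(f"One E2k register file: {one_e2k_rf:.2f} mm^2")
print(f"Both clusters:         {e2k_rfs:.1f} mm^2")  # 46.1
```

Doubling the port count from 10 to 20 alone quadruples the area per register; the larger register count accounts for the remaining factor of 3.2.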

Power Consumption Estimate Shocking

The Elbrus team has estimated that the E2k would dissipate 35 Watts at 1.2 GHz. That figure is about the same as the worst case power for Intel’s newly announced 1.133 GHz Pentium III. Looking at the current bleeding edge of MPU technology, the 35 Watt figure is a little over half the 65 Watts Compaq has estimated its 0.25/0.18 um hybrid version of the EV68 would consume at 1 GHz, and about one quarter of the rumored power consumption of the 800 MHz Merced. The E2k is described as remarkably power efficient due to 1) extensive use of self-resetting logic to minimize clock loading, and 2) low voltage swing (700 mV) signaling on internal buses. It must be pointed out that self-resetting logic is a well-known and commonly used MPU circuit design technique. Moreover, the much less ambitious Alpha EV6x design uses 200 mV signaling on its differential operand buses within its register files and execution units, yet still consumes far more than 35 Watts. Given the E2k’s four times greater FP resources, the multiplicity of long and heavily loaded 64 bit buses, and the very high activity factor needed to achieve the performance level of 350 SPECfp95 claimed by Elbrus, the 35 Watt figure is quite absurd.
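One rough way to put the two quoted wattages on a common footing is to normalize by clock rate. The sketch below uses only the E2k and EV68 estimates given above; watts per GHz is a crude metric chosen here for illustration, and it ignores differences in supply voltage, process, and activity factor.

```python
# (estimated power in watts, clock in GHz), as quoted in the text.
ESTIMATES = {
    "E2k": (35.0, 1.2),    # Elbrus estimate
    "EV68": (65.0, 1.0),   # Compaq estimate, 0.25/0.18 um hybrid
}

# Power per unit of clock frequency for each design.
watts_per_ghz = {
    name: watts / ghz for name, (watts, ghz) in ESTIMATES.items()
}

# Raw power ratio, ignoring the clock difference.
power_ratio = ESTIMATES["E2k"][0] / ESTIMATES["EV68"][0]

for name, value in watts_per_ghz.items():
    print(f"{name}: {value:.1f} W/GHz")
print(f"E2k / EV68 power: {power_ratio:.2f}")
```

On these numbers the E2k would need less than half the EV68's power per GHz while carrying four times the FP resources, which is the gap the paragraph above calls into question.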

