The Tale of the Tapeout
Figures 1 and 2 show the known floorplans of the Alpha 21264 and Merced, drawn to the same scale based on my die size estimates. I have labeled the major blocks and color-coded the chip regions into three basic categories: functional units (blue), caches and TLBs (green), and control functions (red). The final category includes instruction fetch and decode logic, chip interface logic, and routing resources.
The most striking observation about Figures 1 and 2 is how much smaller the Alpha EV68 is than the IA-64 Merced. Also, for an implementation of an ISA that is supposedly so streamlined, Merced is practically blushing with all its red control regions. The estimated die area breakdowns of the Alpha and Merced (based on Figure 1) are provided below in Table 1.
The values in Table 1 show that Merced devotes about 11% more of its die area to control-related functions than the EV68. The breakdowns also show that the total control function area of Merced is actually larger than the entire Alpha processor (179 mm² vs 156 mm²)! Remarkably, this is still true even if you discard the red block labeled “IA-32 Control”, which incorporates some, if not all, of the x86 compatibility logic.
The EV68 also has nearly twice as much of its die dedicated to caches as does Merced (29% vs 16%). From the absolute area of the Merced caches I estimate the L0 cache at 32 Kbytes and the L1 cache at 128 Kbytes. Note that Intel has changed its nomenclature since last year: the L0 in Figure 1 is now called an L1, the L1 is now the L2, and the L3 denotes the external cache within the Merced cartridge.
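For readers who want to check the arithmetic behind these comparisons, here is a minimal Python sketch. It uses only the figures quoted in the text above (179 mm², 156 mm², 29% and 16%). The variable names are my own, and the absolute EV68 cache area is a derived value that assumes the 29% share applies to the 156 mm² die.

```python
# Figures quoted in the article text; nothing here is read from Table 1 itself.
MERCED_CONTROL_MM2 = 179.0   # total Merced control-function area
ALPHA_DIE_MM2 = 156.0        # entire Alpha die
EV68_CACHE_SHARE = 0.29      # fraction of the EV68 die devoted to caches
MERCED_CACHE_SHARE = 0.16    # fraction of the Merced die devoted to caches

# How much larger Merced's control area is than the whole Alpha die.
control_vs_alpha = MERCED_CONTROL_MM2 / ALPHA_DIE_MM2

# Relative cache share of the two dies ("nearly twice").
cache_share_ratio = EV68_CACHE_SHARE / MERCED_CACHE_SHARE

# Derived, assumption-laden value: EV68 cache area if the 29% share
# applies to the 156 mm^2 die estimate.
ev68_cache_mm2 = EV68_CACHE_SHARE * ALPHA_DIE_MM2

print(f"Merced control area / Alpha die area: {control_vs_alpha:.2f}x")
print(f"EV68 cache share / Merced cache share: {cache_share_ratio:.2f}x")
print(f"Implied EV68 cache area: {ev68_cache_mm2:.1f} mm^2")
```

The first ratio comes out a little under 1.15, and the cache share ratio is about 1.8, which is where "nearly twice" comes from.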
Better Luck Next Time Intel
The size disparity between the two processors strongly suggests the EV68 could be manufactured at a much lower variable silicon cost than Merced. Then there is performance. Won’t the bigger chip have more functional units and thus higher performance? Well, from a clock frequency point of view, a smaller chip is better because it is easier to distribute a low-skew clock across the entire die and propagate time-critical global signals. And having lots of functional units will do precious little for performance if the memory hierarchy cannot support them. With only around 16% of its die area devoted to on-chip cache, yet more integer and floating point units than the EV68, the in-order execution Merced looks like it might spend a lot of time waiting for data and instructions to arrive from off-chip sources (L3 cache and system bus).
Today the Alpha EV67 provides performance levels up to 39 SPECint95 and 68 SPECfp95 at 700 MHz, with higher clock rates promised. Compaq estimates that the EV68 will achieve clock rates in excess of 1 GHz and yield around 65 SPECint95 and 100 SPECfp95. But the latest estimate from MDR is that Merced will clock at 750 to 800 MHz and achieve about 45 SPECint95 and 70 SPECfp95. For a high end 0.18 µm processor these results would be good but not outstanding, and certainly not enough to live up to the Intel and Hewlett Packard (Intard?, Hewtel?) hype about EPIC and IA-64. If you mention the MDR Merced performance estimates to some people in Silicon Valley you elicit a snicker and a reply along the lines of “yeah, when hell freezes over”.
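One rough way to put the numbers above on a common footing is to divide each SPEC score by its clock rate. The Python sketch below does this using only the figures quoted in the preceding paragraph. The labels and the `per_mhz` helper are my own, and the 1000 MHz value for the EV68 is an assumption standing in for "in excess of 1 GHz", so its per-MHz result is at best an upper bound.

```python
# Scores and clock rates as quoted in the text. All but the EV67 row are
# third-party estimates, not measurements.
ESTIMATES = {
    "Alpha EV67 (shipping)":    {"mhz": 700,  "specint95": 39, "specfp95": 68},
    # Assumption: 1000 MHz stands in for "in excess of 1 GHz".
    "Alpha EV68 (Compaq est.)": {"mhz": 1000, "specint95": 65, "specfp95": 100},
    "Merced @ 750 (MDR est.)":  {"mhz": 750,  "specint95": 45, "specfp95": 70},
    "Merced @ 800 (MDR est.)":  {"mhz": 800,  "specint95": 45, "specfp95": 70},
}


def per_mhz(score, mhz):
    """Return a benchmark score normalized by clock rate."""
    return score / mhz


for name, e in ESTIMATES.items():
    int_rate = per_mhz(e["specint95"], e["mhz"])
    fp_rate = per_mhz(e["specfp95"], e["mhz"])
    print(f"{name:26s} {int_rate:.4f} SPECint95/MHz  {fp_rate:.4f} SPECfp95/MHz")
```

Score per MHz is a crude yardstick, since it ignores memory system and compiler effects, but it makes it easier to see how much of each projected result comes from clock rate rather than from work done per cycle.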
So it seems that Intel’s first cut at IA-64 will be neither a simpler design to implement than a dynamically scheduled superscalar RISC processor like the Alpha EV68 nor capable of outperforming it in a similar process. Two years ago Intel’s Fred Pollack exclaimed “wait for McKinley – it will knock your socks off!”. He already knew that Merced would fall well short of that mark. Aside from its potential to remove footwear, McKinley will go far in demonstrating whether Merced is just a botched first effort or whether the IA-64 architects unwittingly exchanged one set of difficult design complexity problems for another.