Design By Committee
Unfortunately, things never stay simple for long. First of all, no one knows how well even the most brilliant compiler can accomplish, a priori, the work that the hardware of an out-of-order superscalar processor performs at run time. As an insurance policy, the designers of PlayDoh and IA-64 added special logic and instructions that let the processor run quickly by taking potentially invalid shortcuts. After every shortcut, the compiler adds a conditional branch that checks the special logic. When the special logic detects an invalid shortcut at run time (usually a very rare occurrence), the conditional branch diverts program flow to a recovery subroutine, generated by the compiler, that performs the failed operation again in a safe but slower fashion. This provides some of the adaptability to unforeseen circumstances that dynamically scheduled processors enjoy, while potentially requiring simpler control logic.
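IA-64 exposes these shortcuts as control speculation (ld.s checked by chk.s) and data speculation (ld.a checked by chk.a, backed by a hardware Advanced Load Address Table, or ALAT). The Python sketch below is a toy model of the data-speculation case only: the compiler hoists a load above a store that might write the same address, and a check afterward falls back to recovery code if it did. The class `ToyALAT`, its methods, and the register name `"r8"` are hypothetical stand-ins for illustration; real hardware tracks physical addresses, access sizes and a finite number of entries, none of which this sketch attempts.

```python
from typing import Dict


class ToyALAT:
    """Toy model of an IA-64-style Advanced Load Address Table (hypothetical API).

    Keeps only a register -> address map; real hardware also tracks access
    size and has a limited number of entries.
    """

    def __init__(self) -> None:
        self.entries: Dict[str, int] = {}

    def advanced_load(self, mem: Dict[int, int], reg: str, addr: int) -> int:
        # Record the address so a later conflicting store can be detected.
        self.entries[reg] = addr
        return mem[addr]

    def store(self, mem: Dict[int, int], addr: int, value: int) -> None:
        mem[addr] = value
        # A store to a recorded address invalidates every matching entry.
        for reg in [r for r, a in self.entries.items() if a == addr]:
            del self.entries[reg]

    def check(self, reg: str) -> bool:
        # True if the advanced load is still valid (no conflicting store seen).
        return reg in self.entries


def in_order(mem: Dict[int, int], p: int, q: int, v: int) -> int:
    """Original program order: store through p, then load through q."""
    mem[p] = v
    return mem[q] + 1


def speculated(mem: Dict[int, int], p: int, q: int, v: int,
               alat: ToyALAT, stats: Dict[str, int]) -> int:
    """Compiler-reordered version: the load is hoisted above the store."""
    x = alat.advanced_load(mem, "r8", q)  # shortcut: may read stale data
    alat.store(mem, p, v)                 # the store the load was moved past
    if not alat.check("r8"):              # compiler-inserted check, like chk.a
        stats["recoveries"] += 1
        x = mem[q]                        # recovery code: redo the load safely
    return x + 1
```

When `p` and `q` differ, the hoisted load is valid and no recovery runs; when they alias, the check fails, the load is redone, and the result still matches the in-order program, only more slowly.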
These shortcuts are a very important part of PlayDoh and IA-64. If the compiler relied only on assumptions that could be logically derived or proven from the source program at compile time, then both its code generation strategy and the processor design would have to be extremely conservative. As a result, the architecture would likely lose more performance than it could ever hope to gain through simpler processor control logic.
A major source of undesirable complexity was the institutional baggage that PlayDoh acquired as it evolved into IA-64. Probably the biggest and heaviest millstone placed around its neck is IA-32 (x86) compatibility. Intel is propelled by the most lucrative product franchise ever seen in the semiconductor industry: the x86-compatible processor. The Faustian bargain HP accepted when it partnered with Intel was that IA-64 had to run x86 code in hardware. To be fair, HP also added a few minor incongruities of its own in the form of the addp4, shladd, and shladdp4 instructions. Ostensibly these instructions ease the porting of PA-RISC applications to IA-64.
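To make those three instructions concrete, here is a minimal Python sketch of their register-level semantics as I read the IA-64 instruction set reference; treat it as an approximation, not a hardware model. shladd shifts one operand left by 1 to 4 bits and adds the other, much like PA-RISC's SH1ADD/SH2ADD/SH3ADD scaled-index instructions. addp4 and shladdp4 do the same arithmetic but treat the result as a 32-bit pointer: the upper 32 bits are cleared and bits 31:30 of the second source are copied into bits 62:61. Registers are plain Python integers here, and the helper `_ptr4` is a hypothetical name.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def _check_count(count: int) -> None:
    # The shift count is limited to 1..4 in these instructions.
    if not 1 <= count <= 4:
        raise ValueError("shift count must be in 1..4")


def _ptr4(result: int, r3: int) -> int:
    # Hypothetical helper: keep the low 32 bits of the result, then copy
    # bits 31:30 of r3 into bits 62:61 (upper 32 bits otherwise zero).
    return (result & MASK32) | (((r3 >> 30) & 0b11) << 61)


def shladd(r2: int, count: int, r3: int) -> int:
    """r1 = (r2 << count) + r3, wrapped to 64 bits."""
    _check_count(count)
    return ((r2 << count) + r3) & MASK64


def addp4(r2: int, r3: int) -> int:
    """r1 = r2 + r3, treated as a 32-bit pointer."""
    return _ptr4(r2 + r3, r3)


def shladdp4(r2: int, count: int, r3: int) -> int:
    """r1 = (r2 << count) + r3, treated as a 32-bit pointer."""
    _check_count(count)
    return _ptr4((r2 << count) + r3, r3)
```

For example, `shladd(3, 2, 100)` computes 3 × 4 + 100 = 112, the address arithmetic for element 3 of a 4-byte array based at 100.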
Does IA-64 Really Simplify the Silicon?
Intel and HP representatives never miss an opportunity to publicly promote the benefits of IA-64 for building the high performance microprocessors of the future. A joint white paper states “Intel’s IA-64 architecture is a unique combination of innovative features, which overcomes the performance limitations of traditional architectures”. Merced technical leader Harsh Sharangpani boasts that IA-64 allowed his design team to do away with dynamic scheduling structures such as register renaming lookup tables (RLUTs) and instruction reorder buffers (ROBs), freeing room for more functional units and cache.
I will attempt to assess the validity of some of these claims by comparing what is publicly known, or can reasonably be inferred, about Merced (a.k.a. Itanium, the first implementation of the IA-64 architecture) with a representative dynamically scheduled superscalar RISC microprocessor, the Compaq Alpha 21264.
Although Merced is known to be manufactured in Intel’s P858 0.18 µm CMOS process, its exact die size has not been made public. We do know it must be relatively large, because Intel has stated that Merced is too complex to be manufactured in its P856 0.25 µm process. I will estimate the Merced die at 17.5 mm on a side, or 306 mm² in area. This happens to be the die size target of Intel’s first P5 design (0.8 µm Pentium) and first P6 design (0.5 µm Pentium Pro). MicroDesign Resources (MDR) also estimates the Merced die size at around 300 mm².
The Alpha 21264 is currently available in a 0.35 µm version (EV6, 314 mm²) and a 0.28 µm version (EV67, 210 mm²). It is widely expected that a 0.18 µm version (EV68) will ship before Merced. For the purpose of comparison, I estimate the EV68 at 12 mm by 13 mm, or 156 mm² in die area. (Other estimates peg the EV68 at a slightly more svelte 150 mm².)
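The comparison rests on simple area arithmetic. This short Python sketch uses only the estimates quoted above (no new data) and shows that, on the same 0.18 µm generation, the estimated Merced die is roughly twice the area of the estimated EV68 die. The function name `die_area_mm2` is just an illustrative helper.

```python
def die_area_mm2(width_mm: float, height_mm: float) -> float:
    """Area of a rectangular die in square millimetres."""
    return width_mm * height_mm


merced = die_area_mm2(17.5, 17.5)  # estimate from the text: 306.25 mm^2
ev68 = die_area_mm2(12.0, 13.0)    # estimate from the text: 156 mm^2
ev68_alt = 150.0                   # the smaller outside estimate quoted above

ratio = merced / ev68          # about 1.96
ratio_alt = merced / ev68_alt  # about 2.04

print(f"Merced {merced:.1f} mm^2, EV68 {ev68:.0f} mm^2, ratio {ratio:.2f}")
```

Either EV68 estimate puts the ratio within a few percent of 2:1.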