Summary and Conclusion
The IA64 instruction set architecture was designed by a committee. Close examination of this complex edifice suggests that the committee was dominated by enthusiastic compiler and architecture theorists rather than by engineers experienced in the logic, circuit, and physical design of high-performance monolithic CMOS microprocessors. But the impressive distance that x86 processors have come over the last quarter century suggests that it is easy to underestimate the potential for well-funded teams of clever engineers to advance MPU performance and functionality in the face of harsh architectural adversity.
Given the modest starting point that Merced/Itanium represents, it isn't hard to suggest many ways the design could be improved. What is impressive about McKinley is the magnitude of its performance improvement. The only comparable example of such a dramatic gain through better microarchitecture, circuit, and physical design was the transition from the Alpha EV56 to the EV6 in the same 0.35 µm process. In both cases the system interface and memory system had to be vastly improved to support the higher level of performance achieved. By eliminating the cost and packaging complexity of Itanium's custom L3 SRAM devices, McKinley may actually be less expensive to manufacture while offering at least twice the performance.
Although McKinley was able to take advantage of Itanium's shortcomings to increase effective parallelism, reduce latency, and raise clock rates, there should be no mistaking the difficulty of pushing IA64 further down the road of ILP exploitation. McKinley may very well lie close to the "sweet spot" for IA64. The complexity and physical design difficulty of implementing an IA64 processor that issues four bundles per cycle are matched only by the exponentially harder task facing EPIC compiler designers: supplying such a machine with a sufficiently parallel instruction stream. This suggests that some, if not all, third-generation IA64 designs will pursue greater performance through greater exploitation of thread-level parallelism, using techniques such as chip-level multiprocessing and hardware-based multithreading. That conclusion would explain Intel's concerted effort to attract and retain as much of the former EV8 design team as possible.