Alpha’s Omega: The Fearsome Spider that got the Boot
If one tried to discern a single design strategy behind every Alpha processor development effort over the past dozen years, it might be summarized as follows: when faced with a choice between expediency and performance, always choose performance. The former designers of the EV8 (code named Araña, Spanish for spider) clearly adhered to this principle with almost religious devotion. They planned to build a processor whose microarchitecture went much further than anything attempted before in exploiting both instruction level parallelism and thread level parallelism to achieve absolute uniprocessor performance leadership. Add to that the glueless system interface features pioneered in the EV7, and hundreds of EV8s could be readily incorporated into large scale systems of nearly unimaginable processing capabilities. The basic organization of the Alpha EV8 is shown in Figure 1.
Figure 1 Block Diagram of the Alpha EV8
The EV8 is an 8 instruction issue wide, out-of-order execution superscalar RISC processor that also supports 4 way simultaneous multithreading (SMT). The front end of the EV8 is a 64 KB, 2-way set associative, pseudo dual ported instruction cache that supports the fetch of two separate 8 instruction cache lines each cycle. The EV8 branch predictor can predict up to two branches per cycle, and the indices of the two instruction cache lines fetched may be either sequential or disjoint depending on its prediction. The two groups of 8 instructions are buffered (per thread) and merged in the “collapser”, a functional block that chooses the 8 out of the 16 fetched instructions on the most likely path of execution, as predicted by the branch logic. These 8 instructions are register renamed and entered in a 128 entry out-of-order instruction issue queue. Each cycle, the queue selects up to eight data-ready entries for dispatch to the 8 integer units, 4 floating point units, 2 load units, and 2 store units that comprise the EV8’s execution resources. As is customary, the issue queue selects the data-ready instructions that have been in the queue the longest. Unlike the EV6 instruction queues, the EV8’s design does not attempt to maintain program order of entries by shifting instructions down to fill vacant entries. Instead, each instruction is given an age vector when it enters the queue. The age vectors can be combined with a common “bid” vector using simple and fast logic to select the oldest data-ready instructions for issue. This new scheme allows out-of-order issue queues to be built with many more entries than previously possible, yet be capable of faster operation.
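The age vector selection described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the EV8 circuit: the article does not give the encoding, so the model assumes each entry's age vector is a bitmask of the entries that were already in the queue when it arrived, and that an entry wins if fewer than the issue width of its older neighbors are also bidding. The names IssueQueue, insert, mark_ready, and select are hypothetical.

```python
class IssueQueue:
    """Toy model of an age-vector issue queue (encoding is an assumption)."""

    def __init__(self, size=128, width=8):
        self.size = size            # number of queue entries
        self.width = width          # maximum instructions issued per cycle
        self.valid = 0              # bitmask of occupied entries
        self.ready = 0              # bitmask of data-ready entries
        self.age = [0] * size       # age[i]: bitmask of entries older than entry i

    def insert(self):
        """Place an instruction in the first free entry; no shifting is needed."""
        for slot in range(self.size):
            if not (self.valid >> slot) & 1:
                break
        else:
            raise RuntimeError("issue queue full")
        self.age[slot] = self.valid  # everything already queued is older
        self.valid |= 1 << slot
        return slot

    def mark_ready(self, slot):
        """Record that the entry's source operands are available."""
        self.ready |= 1 << slot

    def select(self):
        """Grant up to `width` of the oldest data-ready entries."""
        bid = self.valid & self.ready   # the common bid vector
        granted = [
            s for s in range(self.size)
            if (bid >> s) & 1
            # Count older entries that are also bidding; win if fewer than width.
            and bin(self.age[s] & bid).count("1") < self.width
        ]
        for s in granted:
            keep = ~(1 << s)
            self.valid &= keep
            self.ready &= keep
            # Clear the departed entry's column so a reused slot is not seen as old.
            self.age = [vector & keep for vector in self.age]
        return granted


queue = IssueQueue(size=8, width=2)
a, b, c, d = (queue.insert() for _ in range(4))
for slot in (b, c, d):
    queue.mark_ready(slot)
print(queue.select())   # b and c issue; d is ready but has two older bidders
```

Note that a freed slot can be reused immediately by a younger instruction: relative age lives in the vectors rather than in queue position, which is the property that lets the queue avoid the shifting used in the EV6.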
Unlike the EV6, the EV8 uses a common instruction issue queue and a common physical register file for both integer and FP operations. The EV8 supports 4 SMT thread contexts, each of which has 32 integer and 32 FP registers, for a combined total of 256 architected state general-purpose registers. In addition, the unified register file also contains 256 renaming registers, one for each of 256 possible instructions in flight, for a grand total of 512 registers. The EV8 needs 16 logical read ports and 8 logical write ports in its register file to support the sustained execution of eight instructions per cycle. A 24 port, 512 entry register file is not a nice thing to have sitting on a critical path, so the EV8 designers resorted to the time honored solution of replication. The EV8 unified logical register file physically consists of two separate 512 entry physical register files, each with 8 read ports and 8 write ports. The overall total of 1024 64-bit registers in the EV8 is equivalent to the entire 8 KB data cache in the first Alpha processor, the EV4 (21064).
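The register file arithmetic above can be checked with a few lines of Python. The sketch assumes two source operands and one result per instruction, which is one way to arrive at the 16 read and 8 write ports quoted; it also assumes that each replicated copy serves half of the reads but must accept every write so that the two copies hold the same values. The function and variable names are illustrative only.

```python
def ev8_register_file_budget():
    """Recompute the register and port counts quoted in the text."""
    threads = 4
    int_regs_per_thread = 32
    fp_regs_per_thread = 32
    instructions_in_flight = 256    # one renaming register each
    issue_width = 8
    sources_per_instruction = 2     # assumption: two register source operands
    results_per_instruction = 1     # assumption: one register result
    copies = 2                      # replicated physical register files
    bytes_per_register = 8          # 64-bit registers

    architected = threads * (int_regs_per_thread + fp_regs_per_thread)
    logical_entries = architected + instructions_in_flight
    read_ports = issue_width * sources_per_instruction
    write_ports = issue_width * results_per_instruction

    return {
        "architected_registers": architected,
        "logical_entries": logical_entries,
        "logical_read_ports": read_ports,
        "logical_write_ports": write_ports,
        # Reads are split across the copies; writes go to every copy.
        "read_ports_per_copy": read_ports // copies,
        "write_ports_per_copy": write_ports,
        "physical_registers": copies * logical_entries,
        "storage_kb": copies * logical_entries * bytes_per_register // 1024,
    }


for name, value in ev8_register_file_budget().items():
    print(f"{name}: {value}")
```

Under these assumptions each copy needs 16 ports instead of 24, at the cost of doubling the storage to 1024 registers, or 8 KB.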