The proponents and popularizers of the “RISC and CISC have converged” school of thought are so caught up in comparing chip organization and micro-architecture that they miss the big picture. The benefit of RISC ISA-based processor design comes in two separate packages. The convergence school focuses only on the first package: the ease of designing simplified and fast hardware. The era of 10- and 15-million-transistor chips with three- and four-way superscalar issue and out-of-order execution has somewhat reduced (but not eliminated) this benefit, because in a sense all these chips, RISC and CISC alike, are damn complicated!
But the second benefit of RISC is the computational model – the ISA – it offers to the compiler. A RISC ISA offers a streamlined and simplified instruction set, and a generous set of general purpose registers. Most RISC designs do away with condition codes and instead rely on storing Boolean control information in general purpose registers, on atomically combining comparison and branch operations in single instructions, or on a combination of both. Figure 2 shows the programmer-visible register resources of the x86 and Alpha ISAs. The bottom line is that the x86 has 8 general purpose integer registers, while RISC processors have 32. Ironically, both modern x86 and RISC processors contain far more physical data registers than shown here. The extra registers are used for register renaming, a powerful design tool that eliminates the effect of false dependencies between instructions that would otherwise prevent out-of-order execution. However, it is the computational model seen by the compiler that is critical for the generation of ultra-fast code.
The modern compiler is, in many ways, as complex and fascinating as the processors it creates code for. But the vital ingredient that allows a sophisticated compiler to excel is a large and unencumbered register set. A large register set facilitates such powerful optimization techniques as local and global variable register assignment, register-based parameter passing and function result return, and re-use of intermediate computational results from the calculation of common sub-expressions. In addition, it is well known that because of loops, roughly 90% of program time is spent executing 10% of the code. RISC ISAs, with their large register sets, support powerful loop-based optimizations such as strength reduction of array index address calculations, software pipelining, and loop unrolling. Besides the large register sets, most RISC ISAs also incorporate three-address instructions, that is, instructions that specify three registers – two source and one destination. The x86 and nearly every other CISC ISA use only one- or two-address instructions, which means that extra move instructions are needed whenever it is necessary not to overwrite either of the two operand registers.
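To make two of these loop optimizations concrete, here is a minimal sketch in C of what a compiler does internally, written out by hand. The function names and the strided-sum task are illustrative assumptions, not taken from any figure in this article. The first version recomputes `i * stride` on every pass; the second replaces the multiply with a running addition (strength reduction); the third also unrolls the loop four ways, which only pays off if the four partial sums, the index, and the loop counter can all stay in registers.

```c
#include <stddef.h>

/* Naive form: every iteration recomputes the element index as i * stride. */
long sum_indexed(const int *base, size_t n, size_t stride)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += base[i * stride];
    return sum;
}

/* Strength-reduced form: the multiply becomes a running addition. */
long sum_strength_reduced(const int *base, size_t n, size_t stride)
{
    long sum = 0;
    size_t idx = 0;                 /* always equal to i * stride */
    for (size_t i = 0; i < n; i++) {
        sum += base[idx];
        idx += stride;
    }
    return sum;
}

/* Unrolled four ways: four independent partial sums, each of which
   wants its own register, plus a cleanup loop for leftover elements. */
long sum_unrolled(const int *base, size_t n, size_t stride)
{
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t idx = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += base[idx];
        s1 += base[idx + stride];
        s2 += base[idx + 2 * stride];
        s3 += base[idx + 3 * stride];
        idx += 4 * stride;
    }
    for (; i < n; i++) {            /* remaining 0 to 3 elements */
        s0 += base[idx];
        idx += stride;
    }
    return s0 + s1 + s2 + s3;
}
```

All three functions return the same result for the same arguments. The unrolled version alone keeps at least seven values live inside the loop, which is already most of the eight integer registers the x86 offers.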
The power and efficiency of the large register sets and three-address instructions found in RISC ISAs are most clearly demonstrated when it comes to floating point performance. The x86 ISA uses an antiquated eight-element register stack computational model for its floating point instructions, which is quite inferior to the familiar RISC ISA model with large floating point register sets and three-address floating point instructions. Because of these architectural differences, most high-end RISC processor families outperform even the best x86 processors by a factor of 2x, 3x, or more on floating point benchmark programs, despite both having fully pipelined floating point execution units with similar latencies. This effect is so great that Advanced Micro Devices (AMD) has announced that it is adding a RISC-style floating point computational model to its x86-64 architecture, a 64-bit extension of the existing x86 ISA, to help close the performance gap with RISC processors.
The importance of architectural differences between RISC and CISC ISAs to compiled code performance is demonstrated in Figure 3. A simple subroutine was written in the C programming language to find the element in a list of integers with the largest number of bits set. Shown alongside the source code is the equivalent assembly language code generated by compilers for the HP PA-RISC processor and the Intel x86. Both the PA-RISC and x86 compilers were run with maximum optimization for code speed.
(Note: in terms of relative code density this program is not representative; it is generally accepted that most RISC processors have program code sizes at least 30% larger than x86 on average.)
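For readers without the figure at hand, a subroutine of the kind described might look roughly like the following. This is a reconstruction under stated assumptions, not the source code from Figure 3: the names `count_bits` and `most_bits_index`, the use of `unsigned` elements, the choice to return an index rather than the value, and the tie-breaking rule are all hypothetical.

```c
#include <stddef.h>

/* Count the set bits in v by repeatedly clearing the lowest set bit. */
unsigned count_bits(unsigned v)
{
    unsigned count = 0;
    while (v != 0) {
        v &= v - 1;
        count++;
    }
    return count;
}

/* Return the index of the first element with the largest number of
   set bits. Returns 0 for an empty list (an arbitrary convention). */
size_t most_bits_index(const unsigned *list, size_t n)
{
    size_t best_index = 0;
    unsigned best_count = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned c = count_bits(list[i]);
        if (c > best_count) {       /* strict: first maximum wins ties */
            best_count = c;
            best_index = i;
        }
    }
    return best_index;
}
```

Even in a routine this small, the list pointer, the length, the loop index, the current element, the running bit count, and the best count and index so far are all live at once, so register pressure appears almost immediately on an eight-register machine.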
The RISC code benefits from register-based parameter passing, which minimizes the memory traffic and superfluous instructions associated with stack-based argument passing. The PA-RISC subroutine consists of 23 instructions that total 92 bytes in size. The x86 subroutine uses 41 instructions that total 97 bytes in size. For a modern x86 processor whose instruction cache incorporates three bits of pre-decode and hint information per program byte, this subroutine would occupy 45% more on-chip SRAM bits than the PA-RISC program would in a PA-8500.
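The 45% figure can be checked with a few lines of arithmetic. The sketch below assumes that the PA-8500 instruction cache stores only the 8 bits of each program byte, which appears to be what the comparison implies but is not stated outright; the helper names `sram_bits` and `percent_more` are made up for illustration.

```c
/* Cache bits needed to hold code_bytes of program text when each byte
   carries extra_bits_per_byte of pre-decode or hint information. */
unsigned long sram_bits(unsigned long code_bytes, unsigned long extra_bits_per_byte)
{
    return code_bytes * (8UL + extra_bits_per_byte);
}

/* How much larger a is than b, as a percentage rounded to the nearest
   whole number. Assumes a >= b and b > 0. */
unsigned long percent_more(unsigned long a, unsigned long b)
{
    return ((a - b) * 100UL + b / 2UL) / b;
}
```

With the numbers above, `sram_bits(97, 3)` gives 97 × 11 = 1067 bits for the x86 routine and `sram_bits(92, 0)` gives 92 × 8 = 736 bits for the PA-RISC routine (23 instructions of 4 bytes each). The ratio 1067 / 736 is about 1.45, so `percent_more(1067, 736)` yields 45.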