Wolves in CISC Clothing


Not Quite EPIC

The Crusoe VLIW processor has 160 physical registers, divided into 64 general purpose registers (GPRs), 48 GPR “shadow” registers, 32 floating point registers (FPRs), and 16 FPR “shadow” registers. The GPRs are 32 bits wide, while the FPRs are 80 bits wide to support the extended precision FP data format characteristic of the x86’s “x87” FP execution model. The shadow registers, along with special alias detection logic, allow the CMS to generate efficient VLIW code that executes speculatively and/or out of order while duplicating the functionality of x86 code segments, without giving up the ability to mimic x86 precise exception handling. Instead of being constrained to keep the simulated x86 processor state consistent with x86 execution semantics at every x86 instruction boundary, CMS generates a group of N VLIW instructions that matches the semantics of a group of M x86 instructions in totality but not individually. Within those N instructions, new values can be calculated speculatively and stored in the working set GPRs and FPRs. The CMS marks the end of the group with a special “commit” flag in the VLIW code, which causes the contents of the working set GPRs and FPRs to be copied into their corresponding shadow registers. At such points the state of the VLIW computation is considered coherent with respect to the simulated x86 state.
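The working/shadow split can be illustrated with a small software model. This is only a sketch in Python, not Transmeta's implementation: the class name `ShadowedRegisterFile` and its methods are hypothetical, and it assumes every register has a shadow copy, which is simpler than the partial shadowing described above.

```python
class ShadowedRegisterFile:
    """Toy model of working registers backed by shadow copies.

    Assumption: one shadow per register (the real Crusoe shadows
    only a subset of its GPRs and FPRs).
    """

    def __init__(self, count):
        self.working = [0] * count  # speculative values live here
        self.shadow = [0] * count   # last committed, x86-coherent state

    def write(self, index, value):
        # Speculative update: only the working copy changes.
        self.working[index] = value

    def read(self, index):
        return self.working[index]

    def commit(self):
        # Commit point: the working state becomes the coherent state.
        self.shadow = list(self.working)

    def committed(self, index):
        return self.shadow[index]


regs = ShadowedRegisterFile(4)
regs.write(0, 10)
regs.write(1, regs.read(0) + 5)
before = regs.committed(1)  # shadow copy is untouched before the commit
regs.commit()
after = regs.committed(1)   # shadow copy now holds the committed value
```

Between commits the shadow copies never change, so they always describe a state that corresponds to a real x86 instruction boundary.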

If an exception occurs during the execution of VLIW code between commit points, special code within the CMS rolls the affected GPRs and FPRs back to the previous VLIW/x86 coherence point using the contents of their corresponding shadow registers. CMS then marches x86 execution forward from that point using instruction-by-instruction emulation until the original exception occurs again [11]. At this point the exception can be reported against the specific x86 instruction that caused it, and the precise exception handling model of the x86 ISA is accurately mimicked. To achieve the same rollback capability for memory operations, the Crusoe can collect and defer memory store operations until the CMS-generated VLIW code is ready to commit, at which point all of the deferred stores are released to memory. If an exception occurs before the commit, the deferred store operations are simply discarded. The Crusoe also incorporates a hardware feature in the MMU to detect self-modifying code. In such a case an interrupt signals the CMS to invalidate the section of translated VLIW code corresponding to the modified x86 code, and to emulate or translate the new x86 code bytes again if and when x86 program flow reaches the modified code.
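The rollback-and-replay sequence, together with the deferred stores, can be sketched in a few lines of Python. Everything here is an illustrative assumption rather than CMS internals: the `Machine` class, the `Fault` exception, the three toy "x86 instructions", and the `run_block` helper are all hypothetical names, and the "translated" group is modeled simply as running the same operations without intermediate commits.

```python
class Fault(Exception):
    """Stands in for an x86 exception raised during execution."""


class Machine:
    def __init__(self):
        self.working = {"eax": 0, "ebx": 0}  # speculative register state
        self.shadow = dict(self.working)     # last committed state
        self.memory = {}
        self.pending_stores = []             # stores deferred until commit

    def store(self, addr, value):
        self.pending_stores.append((addr, value))

    def commit(self):
        self.shadow = dict(self.working)
        for addr, value in self.pending_stores:
            self.memory[addr] = value        # release deferred stores
        self.pending_stores.clear()

    def rollback(self):
        self.working = dict(self.shadow)     # restore from shadow copies
        self.pending_stores.clear()          # discard uncommitted stores


def inc_eax(m):
    m.working["eax"] += 1

def store_eax(m):
    m.store(0x100, m.working["eax"])

def divide_by_ebx(m):
    if m.working["ebx"] == 0:
        raise Fault("divide error")
    m.working["eax"] //= m.working["ebx"]


def run_block(m, block):
    """Return the index of the faulting instruction, or None."""
    try:
        for op in block:          # fast path: whole group, one commit
            op(m)
        m.commit()
        return None
    except Fault:
        m.rollback()
    for index, op in enumerate(block):  # slow path: one at a time
        try:
            op(m)
        except Fault:
            m.rollback()
            return index
        m.commit()
    return None


m = Machine()
faulting_index = run_block(m, [inc_eax, store_eax, divide_by_ebx])
```

In this run the fast path faults on the divide, so the increment and the pending store are thrown away. The replay then commits the first two instructions individually and reports the fault against the third, leaving registers and memory exactly as a real x86 would at that instruction boundary.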

The first generation Crusoe core is a 4-issue-wide VLIW whose instruction set includes primitive instructions designed to accelerate the execution of x86, x87, and MMX operation semantics. Transmeta’s recently disclosed second generation Efficeon core is an 8-issue-wide VLIW with a new instruction set that, in addition to Crusoe’s capabilities, adds support for the newer SSE and SSE2 extensions to x86. The instruction issue capability and execution resources of the Crusoe and Efficeon cores are shown in Figure 4.

Figure 4 – Comparison of Crusoe and Efficeon

Efficeon’s new native instruction set is the result of careful analysis of the performance characteristics of the first generation Crusoe processor and CMS measured while running popular x86 PC applications and operating systems. One quite unusual feature added to Efficeon is an execute instruction that allows a native VLIW instruction to be constructed within the processor and executed on the fly. This capability was added to improve the average performance of the CMS interpreter [12].

