Poor Direction, Predictable Ending, but Great Special FX
In 1995, the late great Digital Equipment Corporation disclosed it had a very ambitious dream: to displace Intel’s x86 processors as the platform of choice for running Windows NT, Microsoft’s newest operating system and supposed successor to Unix. It had two significant weapons in its arsenal. The first was the Alpha processor family, the final and technically strongest player to emerge in the general-purpose RISC processor arena. The second was a combination emulator/binary translator called FX!32 that would, in theory, give users transparent access to the vast expanse of x86 software while using Alpha-based systems running the Alpha-coded version of Windows NT.
Both factors would be critical to DEC’s success. Although the fastest processors in the Alpha family incorporated into expensive server class hardware easily outdistanced Intel’s fastest x86 processors, things got murky in the low end where DEC wanted to play. To come down to the incredibly parsimonious pricing structure of x86 hardware while not destabilizing its bread and butter server line, DEC had to introduce a cut down version of its Alpha processor, the 21164PC, and an accompanying bare bones chipset. This scaled down hardware greatly reduced Alpha’s performance lead over the coming new generation of P6 core based x86 processors from Intel, while still leaving DEC’s cost structure higher because of its much lower manufacturing volumes and limited distribution channels.
The final stumbling block for DEC was a lack of native Alpha applications running under Win NT, and that is where FX!32 came in. It allowed x86 programs to be executed on this streamlined RISC architecture, which possessed no accommodation for any of the arcane eccentricities of the older CISC design. FX!32 is a software system composed of seven major components: 1) transparency agent, 2) runtime, 3) emulator, 4) translator, 5) database, 6) server, and 7) manager. The relationship between the FX!32 components and their shared resources and data components is shown in Figure 1.
Figure 1 – FX!32 Organization
When FX!32 is installed on an Alpha system running Win NT 4.0 the transparency agent is hooked to the OS API routine CreateProcess. Every time a new program is initiated, the transparency agent is invoked. It examines the code image to determine if the program contains x86 or Alpha code. If it is x86 code, the transparency agent invokes the FX!32 runtime component to execute it. The runtime in effect takes the place of the normal Windows NT loader which can only handle Alpha images. The runtime loads the x86 image into memory, sets up the run-time environment required by the emulator and then transfers control to the FX!32 emulator. Since the FX!32 runtime replaces the normal NT loader it is also responsible for relocating x86 images, setting up shared sections, and processing static thread local storage sections.
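The image check performed by the transparency agent can be sketched in a few lines. This is not DEC's code; it is a minimal illustration that assumes the check amounts to reading the Machine field of the image's PE/COFF header, where 0x014C identifies x86 and 0x0184 identifies Alpha. The function names, the loader labels, and the fake_image helper are hypothetical and exist only for the demonstration.

```python
import struct

# Machine field values defined by the PE/COFF format.
IMAGE_FILE_MACHINE_I386 = 0x014C
IMAGE_FILE_MACHINE_ALPHA = 0x0184


def image_machine(image: bytes) -> int:
    """Return the COFF Machine field of a PE image held in memory."""
    if image[:2] != b"MZ":
        raise ValueError("not an MS-DOS/PE executable")
    # The 32-bit value at offset 0x3C (e_lfanew) locates the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The 16-bit Machine field immediately follows the signature.
    (machine,) = struct.unpack_from("<H", image, pe_offset + 4)
    return machine


def choose_loader(image: bytes) -> str:
    """Hypothetical dispatch: decide which loader should handle the image."""
    machine = image_machine(image)
    if machine == IMAGE_FILE_MACHINE_I386:
        return "fx32-runtime"
    if machine == IMAGE_FILE_MACHINE_ALPHA:
        return "native-loader"
    return "unsupported"


def fake_image(machine: int) -> bytes:
    """Build a header-only stub for demonstration (not a runnable executable)."""
    dos_header = bytearray(0x40)
    dos_header[:2] = b"MZ"
    struct.pack_into("<I", dos_header, 0x3C, 0x40)
    return bytes(dos_header) + b"PE\0\0" + struct.pack("<H", machine)
```

Calling choose_loader(fake_image(IMAGE_FILE_MACHINE_I386)) selects the FX!32 path, while an Alpha image falls through to the normal loader. The real agent did this work inside a hook on CreateProcess, which the sketch does not attempt to model.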
The runtime also performs the function of binding an x86 image’s symbolic imports. Imports that refer to entries in Alpha code are treated specially by redirecting them through a special “jacket” layer implemented as an FX!32 specific dynamically linked library (DLL). The jacket routines in “jacket.dll” provide the ability for x86 programs executed with FX!32 to call native Alpha routines in the Win32 API. This is an important element in the performance of FX!32. Some x86 applications ran faster on the Alpha platform than on an x86 platform, even under pure emulation, because of the large fraction of time they spent in native Alpha DLLs.
For an x86 application to run under FX!32, every image it in turn loads must be either an x86 image or a jacketed Alpha image. This imposed the burden that FX!32 releases had to provide over 50 appropriately jacketed native DLLs to run a cross section of popular commercial x86 applications. Each jacket routine contains an illegal x86 instruction as a flag for the FX!32 x86 emulator to set up a transfer to native Alpha code. The basic function of a jacket routine is to move arguments from the x86 stack to the appropriate Alpha registers to conform to the native Alpha calling convention. To achieve full transparency, FX!32 needs to modify various internal data structures maintained by the Windows NT operating system. Unfortunately, these data structures were not part of the official Win32 interface. So while FX!32 did not require modifications to the OS, it had dependencies on a number of undocumented features of NT. That created a situation where a particular version of FX!32 was tied to a particular release of Windows NT.
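The argument shuffling done by a jacket routine can be illustrated with a small simulation. The sketch below is an assumption-laden simplification rather than the contents of jacket.dll: it assumes a stdcall-style x86 call in which the return address sits at the top of the emulated stack with 32-bit arguments above it, and it assumes the native side receives its first six integer arguments in registers named a0 through a5, with any remainder passed on the stack. The function name jacket_call and the use of a Python callable as the native routine are hypothetical.

```python
import struct

ALPHA_ARG_REGS = ["a0", "a1", "a2", "a3", "a4", "a5"]  # first six integer arguments


def jacket_call(memory: bytearray, esp: int, nargs: int, native_fn):
    """Marshal a stack-based x86 call into a register-based native call.

    memory    -- emulated x86 address space
    esp       -- emulated stack pointer; memory[esp] holds the return address
    nargs     -- number of 32-bit arguments the routine takes
    native_fn -- Python stand-in for the native routine: f(regs, stack_args)
    Returns (eax, new_esp, return_address).
    """
    (return_address,) = struct.unpack_from("<I", memory, esp)
    # Arguments sit just above the return address, lowest address first.
    args = [struct.unpack_from("<I", memory, esp + 4 + 4 * i)[0]
            for i in range(nargs)]
    regs = dict(zip(ALPHA_ARG_REGS, args))      # register-passed arguments
    stack_args = args[len(ALPHA_ARG_REGS):]     # overflow arguments, if any
    result = native_fn(regs, stack_args)
    eax = result & 0xFFFFFFFF                   # result returns in emulated EAX
    # stdcall: the callee pops the return address and its own arguments.
    new_esp = esp + 4 + 4 * nargs
    return eax, new_esp, return_address
```

A two-argument call whose native routine adds a0 and a1 returns the sum in the emulated EAX and leaves the emulated stack pointer 12 bytes higher. Real jackets also had to deal with pointers, structures, and callbacks, none of which the sketch covers.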
The FX!32 emulator is invoked the first time an x86 program is run. It is a classic fetch and evaluate interpreter that uses the first 16 bits of an x86 instruction as an index into a lookup table of 64K (65,536) entries. Each entry contains the address of the Alpha coded routine to call to interpret the given x86 instruction, as well as the instruction’s length. The emulator was carefully crafted so that the code needed to execute the most commonly encountered x86 instructions would fit in the 8 KB instruction cache of the 21164 and 21164PC. The emulator also performs “lazy evaluation” of condition codes: rather than computing them after every ALU type instruction, it saves information from those instructions and sets the condition codes only when they are needed as input to the current x86 instruction. Despite the considerable effort expended to maximize emulator efficiency, it still took on average 45 Alpha instructions to interpret one x86 instruction. FX!32’s designers were able to take minor liberties in the emulation of x86’s FP instructions. Because Windows NT used a 64 bit floating point model, x87’s 80 bit extended precision functionality was not replicated. This avoided a tremendous performance hit, as the Alpha, like every other major RISC architecture, doesn’t support operations on FP data formats larger than 64 bits.
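Both techniques described here, table dispatch on the first 16 bits of an instruction and lazy evaluation of condition codes, are easy to show in miniature. The following is a toy sketch, not the FX!32 emulator: it handles only six one-byte x86 opcodes (ADD EAX,imm32; SUB EAX,imm32; DEC EAX; JZ rel8; JMP rel8; HLT), tracks only the zero flag, and uses Python functions where the real table held addresses of Alpha routines. The class and function names are hypothetical. Note that a table indexed by 16 bits has 65,536 slots, so each one-byte opcode occupies 256 of them, one for every possible following byte.

```python
MASK32 = 0xFFFFFFFF


class Cpu:
    def __init__(self):
        self.eax = 0
        self.pc = 0
        self.halted = False
        self.last_result = 0  # remembered instead of computing flags eagerly
        self.flag_evals = 0   # how many times a flag was actually materialized

    def zero_flag(self):
        # Lazy evaluation: derive ZF from the saved result only on demand.
        self.flag_evals += 1
        return self.last_result == 0


def imm32(code, pc):
    return int.from_bytes(code[pc + 1:pc + 5], "little")


def rel8(code, pc):
    return int.from_bytes(code[pc + 1:pc + 2], "little", signed=True)


def op_add(cpu, code):
    cpu.eax = (cpu.eax + imm32(code, cpu.pc)) & MASK32
    cpu.last_result = cpu.eax


def op_sub(cpu, code):
    cpu.eax = (cpu.eax - imm32(code, cpu.pc)) & MASK32
    cpu.last_result = cpu.eax


def op_dec(cpu, code):
    cpu.eax = (cpu.eax - 1) & MASK32
    cpu.last_result = cpu.eax


def op_jz(cpu, code):
    if cpu.zero_flag():
        cpu.pc += rel8(code, cpu.pc)  # the run loop adds the length afterwards


def op_jmp(cpu, code):
    cpu.pc += rel8(code, cpu.pc)


def op_hlt(cpu, code):
    cpu.halted = True


# opcode byte -> (handler, instruction length in bytes)
HANDLERS = {0x05: (op_add, 5), 0x2D: (op_sub, 5), 0x48: (op_dec, 1),
            0x74: (op_jz, 2), 0xEB: (op_jmp, 2), 0xF4: (op_hlt, 1)}


def build_table(handlers):
    """Expand one-byte opcodes into a 65,536-entry table indexed by two bytes."""
    table = [None] * 65536
    for opcode, entry in handlers.items():
        for second in range(256):
            table[opcode | (second << 8)] = entry
    return table


def run(code, table):
    cpu = Cpu()
    code = bytes(code) + b"\x00"  # pad so the final opcode has a second byte
    while not cpu.halted:
        index = code[cpu.pc] | (code[cpu.pc + 1] << 8)  # first 16 bits
        entry = table[index]
        if entry is None:
            raise ValueError(f"unhandled opcode at offset {cpu.pc}")
        handler, length = entry
        handler(cpu, code)
        cpu.pc += length
    return cpu
```

Running the byte sequence 05 03 00 00 00 48 74 02 EB FB F4 (load 3, then decrement in a loop until zero) leaves EAX at zero. Four instructions in that run update the saved result, but the zero flag is only derived at the three JZ instructions that consume it. The saving grows with the number of ALU instructions between flag consumers, and it is larger in a real emulator that must otherwise maintain the full set of x86 flags.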
The FX!32 emulator also collects profile data during execution. This data is collected for the benefit of the FX!32 translator component and includes the target addresses of call instructions, source and target addresses for indirect jumps, and addresses of x86 instructions that make unaligned memory transfers. The first time an x86 program is executed, it runs entirely under the emulator. When the program exits, the profile data collected by the emulator is saved in a file managed by the FX!32 server component. In turn, the server invokes the translator in the background to create segments of native Alpha code that duplicate the functionality of sections of x86 code previously executed under emulation. The next time the x86 program is invoked, the emulator takes note of the translated Alpha code segments, loads them from disk as a DLL, and transfers control to them for direct (and much faster) execution. Note that this is an iterative process. Each time an x86 program is run, the user may invoke new elements of program functionality for the first time, which requires emulation. This in turn identifies new x86 code areas for the FX!32 translator to go to work on. Over time an x86 program run under FX!32 will tend to have an increasingly comprehensive base of native Alpha code built up for it and thus become faster and faster, up to 10 times faster than pure emulation. The FX!32 manager component provides a graphical user interface to the package and allows the user to set limits on disk utilization, translator run policy, and so on.
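The iterative emulate, profile, and translate cycle can be modeled with a very small simulation. This is a sketch under stated assumptions: code is divided into named regions, an emulated region costs a flat 10 units against 1 unit for a translated one (numbers chosen only to echo the "up to 10 times" figure, not measured), and translation of everything profiled completes before the next run. The TranslationCache class and the cost constants are hypothetical.

```python
EMULATED_COST = 10  # assumed relative cost of interpreting a region
NATIVE_COST = 1     # assumed relative cost of running translated code


class TranslationCache:
    """Toy model of the per-application store of translated code."""

    def __init__(self):
        self.translated = set()  # stands in for translated images on disk

    def run(self, trace):
        """Simulate one program run over the code regions it touches.

        Returns the total cost of the run in arbitrary units.
        """
        profile = set()
        cost = 0
        for region in trace:
            if region in self.translated:
                cost += NATIVE_COST
            else:
                cost += EMULATED_COST
                profile.add(region)  # the emulator records what it interpreted
        # After exit, the background translator processes the new profile.
        self.translated |= profile
        return cost
```

With this model, a first run over the regions init, open, and edit costs 30 units and an identical second run costs 3. A later run that touches a previously unused region, say print, pays the emulation cost for that region once and the native cost from then on.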