Technical Problems
Studying the history of Apple’s hardware choices and its approach to switching platforms helps explain why an x86 to ARM migration is exceptionally unlikely. From a technical perspective, the performance and compatibility barriers are huge. Most of these technical problems apply equally whether Apple designs its own ARM microprocessor or works with a partner.
The most obvious problem with such a switch is performance, pure and simple. While the MacBook Air has survived with mediocre performance, it is still using a fairly fast microprocessor. The MacBook Pro is intended for performance-hungry professional applications. The current MBP uses one of the fastest microprocessors around, a quad-core Sandy Bridge that runs up to 2.3GHz and is specifically meant to crunch through software like iMovie, Premiere or Photoshop. The graphics card is similarly intended for performance, with an AMD Radeon HD 6750M for high-end models. Performance was one of the key motivators for the PowerPC to x86 switch. The reality is that for Apple, performance matters.
High performance is simply something that ARM cannot deliver in the next couple of years. Intel’s Sandy Bridge and AMD’s Bulldozer microarchitectures set the bar for client systems today. Both are 64-bit, four wide, out-of-order cores, with multiple load/store units, 256-bit AVX vectors, with slightly different styles of multi-threading. For high-end client systems, 4 of these cores share 8-16MB L3 caches and a 128-bit wide DDR3 memory interface. They are manufactured on leading edge 32nm process technology and can reach 3-4GHz.
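To put the vector width and memory interface figures above in perspective, here is a rough back-of-the-envelope sketch. The DDR3-1333 transfer rate is an assumption chosen for illustration (the text only specifies a 128-bit DDR3 interface), and the function names are hypothetical.

```python
def peak_bandwidth_gbs(bus_width_bits, megatransfers_per_s):
    """Theoretical peak DRAM bandwidth in GB/s (decimal gigabytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * megatransfers_per_s * 1e6 / 1e9


def vector_lanes(vector_bits, element_bits):
    """Number of elements processed by one vector instruction."""
    return vector_bits // element_bits


if __name__ == "__main__":
    # Assumed memory speed: DDR3-1333 (1333 MT/s) on a 128-bit interface.
    print(f"peak bandwidth: {peak_bandwidth_gbs(128, 1333):.1f} GB/s")
    # A 256-bit AVX register holds 8 single-precision or 4 double-precision values.
    print("float32 lanes:", vector_lanes(256, 32))
    print("float64 lanes:", vector_lanes(256, 64))
```

These are theoretical peaks only; sustained figures depend on the memory controller, the cache hierarchy and the workload.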
ARM is a RISC instruction set and simpler than x86, but it is still fairly complex and poses several implementation challenges. There are a number of different instruction modes and extensions with slightly different lengths (e.g. classic ARM, Thumb, VFP, Neon), which make decoding non-trivial. Nearly all classic ARM instructions are predicated and some set flags (negative, zero, carry, overflow), which complicates register renaming and out-of-order execution. ARM also has an implicit barrel shifter on almost every ALU operation, which requires very expensive hardware. In fact, several of these features were removed from Thumb (e.g. implicit barrel shifter, predication).
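The interaction of predication, flag setting and the implicit shifter is easier to see in a toy model. The sketch below is an illustrative simplification, not a faithful ARM simulator: it models a single ADD with an optional condition, an optional flag update and a left-shifted second operand. All names (`Flags`, `CONDITIONS`, `add`) are hypothetical and only a subset of condition codes is included.

```python
from dataclasses import dataclass

MASK32 = 0xFFFFFFFF


@dataclass
class Flags:
    n: bool = False  # negative
    z: bool = False  # zero
    c: bool = False  # carry
    v: bool = False  # overflow


# Subset of ARM condition codes, each a predicate over the flags.
CONDITIONS = {
    "AL": lambda f: True,
    "EQ": lambda f: f.z,
    "NE": lambda f: not f.z,
    "MI": lambda f: f.n,
    "PL": lambda f: not f.n,
    "CS": lambda f: f.c,
    "CC": lambda f: not f.c,
}


def lsl(value, amount):
    """Logical shift left, truncated to 32 bits (the 'barrel shifter')."""
    return (value << amount) & MASK32


def add(regs, flags, cond, set_flags, rd, rn, rm, shift=0):
    """Toy model of ADD{cond}{S} rd, rn, rm, LSL #shift.

    Returns True if the instruction executed, False if predicated off.
    """
    if not CONDITIONS[cond](flags):
        return False  # rd must keep its OLD value: an extra dependency
    a = regs[rn]
    b = lsl(regs[rm], shift)  # the shift happens before the add
    full = a + b
    result = full & MASK32
    regs[rd] = result
    if set_flags:  # flags are an implicit extra output
        flags.n = bool(result >> 31)
        flags.z = result == 0
        flags.c = full > MASK32
        flags.v = bool(((a ^ result) & (b ^ result)) >> 31)
    return True


if __name__ == "__main__":
    regs = [0, 1, 3, 0]
    flags = Flags()
    add(regs, flags, "AL", True, 0, 1, 2, shift=2)  # r0 = r1 + (r2 << 2)
    add(regs, flags, "EQ", False, 3, 0, 0)          # only runs if Z is set
    print(regs, flags)
```

Even in this simplified form, one instruction may read the flags, read the previous value of its destination, shift an operand and write the flags. An out-of-order core has to track every one of those as a dependency.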
Designing high performance microprocessors is neither easy nor for the faint of heart. The leaders in this field – AMD, IBM and Intel – have design teams that have worked together for decades and learned much from their hard-earned experience. There are no ARM or Apple design teams with resources that are comparable to any of these three. Apple acquired PA Semi and Intrinsity, and Nvidia acquired Stexar and hired many ex-Transmeta folks, but that does not put them in the same league in terms of resources or experience. It is quite easy to design a microprocessor core that looks good on paper, but in reality falls short – AMD’s Barcelona, Intel’s Pentium 4 and IBM’s POWER6 are just a few examples. Moreover, a modern processor relies on more than the core pipeline and is nearly a complete system-on-a-chip (SOC). The GPU, cache hierarchy, memory controllers and system interfaces are equally critical to performance. Designing these components and integrating them all together is incredibly difficult. Failed projects like the original Itanium or Larrabee demonstrate the challenges, even for an incumbent like Intel.
There are no ARM microarchitectures that are comparable to Intel’s Sandy Bridge or AMD’s Bulldozer. Currently shipping ARM designs run at roughly 1GHz – where x86 was in 2000. ARM has designs on its roadmap that get much closer to x86; the Cortex-A15 is a 3-wide, out-of-order design that should run at up to 2.5GHz. Presumably, the next core will come closer still. However, ARM’s ecosystem has relatively little experience with high performance system architecture and with more complex caches, graphics and memory controllers. They will certainly learn as they go along, but this will be a gradual process rather than an overnight transformation. In short, it is quite possible that ARM and its partners will catch up with x86 over the coming decade, but not in the next 2-3 years.
The current generation of ARM microprocessors is more power and area efficient than x86, in part due to lower performance. There is no reason to believe that these efficiency advantages will scale to high performance designs. The other components of a complete high performance SOC (e.g. GPU, caches, memory controllers) are mostly unrelated to the instruction set. The performance, area and power efficiency for these parts of the SOC will be similar for ARM, MIPS, PowerPC or x86. This will substantially reduce, but not totally eliminate, ARM’s instruction set advantage over x86.
The last critical point about performance and efficiency is that migrating to ARM would require not just matching, but exceeding, x86 performance. Every Apple transition has gone smoothly thanks to excellent backwards compatibility through emulation or binary translation. Emulating x86 on ARM is eminently feasible, but there is a performance tax. An ARM microprocessor would need to run faster and more efficiently than current and future x86 designs to avoid losing performance and power efficiency for generic software. Moreover, some of the x86 extensions such as AVX, SSE4.2 and AES-NI are exceptionally difficult to emulate and would bring performance to a crawl.
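The size of that performance tax can be sketched with simple arithmetic. The model below is a deliberately crude illustration: it assumes translated x86 code runs at a fixed fraction of the host's native speed, and the efficiency values in the example are hypothetical placeholders, not measurements of any real emulator.

```python
def required_native_speedup(translation_efficiency):
    """How much faster than x86 the ARM chip must be natively just to
    break even on translated code.

    translation_efficiency: fraction of native speed retained when
    running translated x86 code (0 < efficiency <= 1).
    """
    if not 0 < translation_efficiency <= 1:
        raise ValueError("translation_efficiency must be in (0, 1]")
    return 1.0 / translation_efficiency


def relative_performance(native_ratio, translation_efficiency, emulated_fraction):
    """Performance relative to the x86 baseline for a mixed workload.

    native_ratio: ARM native performance divided by x86 native performance.
    emulated_fraction: share of the baseline x86 run time spent in code
    that would have to be translated on ARM.
    """
    native_time = (1 - emulated_fraction) / native_ratio
    emulated_time = emulated_fraction / (native_ratio * translation_efficiency)
    return 1.0 / (native_time + emulated_time)


if __name__ == "__main__":
    # Hypothetical efficiencies, for illustration only.
    for eff in (0.8, 0.5, 0.25):
        print(f"efficiency {eff:.2f}: ARM must be "
              f"{required_native_speedup(eff):.2f}x x86 to break even")
```

Under this model, a chip that merely matches x86 natively loses ground on every translated application, and the gap widens sharply for code that translates poorly.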
Last, Intel and Apple recently announced Thunderbolt and Light Peak for future generation electrical and optical I/O. Apple probably intends to eventually consolidate and replace multiple I/O interfaces (e.g. USB, Firewire, DisplayPort) with a single Thunderbolt port. Yet Intel owns some of the core intellectual property for Thunderbolt and Light Peak, and has little motivation to license the patents to ARM, Apple or other companies designing ARM cores for PCs. Moreover, it seems unlikely that Apple would have consented to such an arrangement if it were planning to abandon x86 in the near future.