Implications for the Future
This survey of modern server processors has potential implications for the microserver architectures that will be arriving in the next few years. To start, one common theme across the new server vendors is the adoption of newer and more efficient instruction sets. While ARMv7 has some cruft, the 64-bit v8 is quite clean and elegant, and Tilera likely has a reasonable ISA as well. This confers several advantages in terms of power efficiency (which is quite challenging to quantify) and area (which is a bit easier). The nature of these advantages was endlessly argued in the RISC vs. CISC debate of the 1980s. Now that the dust has settled, a reasonable estimate is that a more sensible ISA is worth an advantage of around 5-15%, all else being equal.
However, this advantage only applies to the CPU cores. The caches and system are almost entirely instruction set agnostic, and the RISC advantage has hardly been decisive for IBM or Oracle. The implication is clear: ARM-based microservers will only have a slender edge for general-purpose workloads, where CPU cores seem to be about a third of the area for an optimal design. Generously assuming a 15% edge for ARMv8 or Tilera, this suggests a theoretical advantage of perhaps 5% at the chip level. Realistically, this pales in comparison to the impact of high volumes, experienced design teams, and manufacturing. Worse still, Intel is typically a process node ahead of the rest of the industry, which translates into a 20-30% advantage.
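The chip-level estimate above is simple arithmetic, sketched here in Python. The function name and the assumption that a core-only benefit scales linearly with the cores' share of die area are illustrative simplifications, not a model taken from any vendor.

```python
def chip_level_advantage(core_advantage, core_area_fraction):
    """Scale a core-only advantage by the share of the die the cores occupy.

    Assumes (as a simplification) that caches and system logic gain nothing
    from the ISA, so the benefit dilutes linearly with core area share.
    """
    return core_advantage * core_area_fraction


# Figures from the text: a generous 15% ISA edge, cores at about 1/3 of the die.
isa_edge = chip_level_advantage(0.15, 1 / 3)

# Process node lead cited in the text, which applies to the whole chip.
node_edge_low, node_edge_high = 0.20, 0.30

print(f"Chip-level ISA advantage: {isa_edge:.1%}")
print(f"Process node advantage:   {node_edge_low:.0%} to {node_edge_high:.0%}")
```

Under these assumptions the ISA edge is several times smaller than the process node lead, which is the comparison the paragraph draws.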
Specialization is Key
History suggests that anything less than a 4× advantage simply isn’t big enough for customers to endure disruptive changes and deal with risky new vendors, although some estimates indicate that at least a 10× advantage is necessary. The obvious conclusion is that microserver vendors cannot target the general server market, because the resulting advantage would not be large enough to motivate customers to switch. Put more simply, an ARM-based version of Niagara is unlikely to fare significantly better than the original. Instead, architects must seek an advantage by tailoring server processors to specific workloads and forgoing customers and applications outside that space. At a high level, there are two ways to specialize a server processor.
At the macro level, one potential path is to adjust the relationship between cores, cache and system. For instance, Sandy Bridge-EP has 2.5MB LLC, 6.4GB/s memory, 10GB/s coherency and 10GB/s I/O bandwidth for each of the eight cores. Architects could decrease the cache to around 1MB/core and eliminate the coherent links to free up space. Theoretically, this will yield a more efficient design for workloads that don’t benefit from caching, while sacrificing efficiency on cacheable applications.
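To make the per-core ratios concrete, the short Python sketch below scales them to the full eight-core chip and shows how much cache the hypothetical 1MB/core design would give back. The dictionary keys and variable names are illustrative, and the memory figure is taken as 6.4 GB/s per core, an assumption chosen to be consistent with the units of the other figures.

```python
CORES = 8

# Per-core resources for Sandy Bridge-EP as quoted in the text.
# The memory bandwidth unit (GB/s) is an assumption; see the lead-in.
per_core = {
    "llc_mb": 2.5,
    "memory_gbps": 6.4,
    "coherency_gbps": 10.0,
    "io_gbps": 10.0,
}

# Chip-level totals implied by the per-core ratios.
chip_total = {name: value * CORES for name, value in per_core.items()}

# Hypothetical specialized design from the text: 1MB of LLC per core
# and no coherent links.
specialized_llc_mb = 1.0 * CORES
llc_freed_mb = chip_total["llc_mb"] - specialized_llc_mb

for name, value in chip_total.items():
    print(f"{name}: {value:g}")
print(f"LLC freed by the 1MB/core design: {llc_freed_mb:g} MB")
```

The sketch only counts capacity and bandwidth; how much die area the freed cache and removed coherent links actually return depends on the physical design, which the text does not quantify.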
At the micro level, the CPU cores can be optimized for specific workloads as well. The most famous example was the original Niagara, which used a very simple scalar core rather than the complex, multi-issue, out-of-order designs that AMD, Intel, and IBM favor. Again, the theory is that for workloads where complex cores are mostly stalled, a simple core is more area and power efficient for a given level of performance. Dedicated hardware accelerators are another excellent option for optimization, since the power and area benefits can be quite significant. These accelerators can be integrated into the CPU core, shared between cores, or shared across the entire die, as appropriate. The most common accelerators today are vector instructions, cryptography, random number generators, compression/decompression, and packet handling, but there are many other possible options depending on the specific application.
In practice, each microserver vendor will have to pursue a combination of macro and micro optimizations focused on a particular workload, as the gains from either one alone are insufficient. Consider the rather extreme measure of eliminating the LLC entirely, and re-allocating the area to CPU cores and system. At best that increases the area available for cores and I/O by 50%. While nice, that is insufficient to spur adoption of substantially different server processors and not much larger than the gain from a single process node shrink. Similarly, Sun’s Niagara was differentiated primarily by the highly simplified CPU microarchitecture. However, it is unclear that Sun had success carving out a niche beyond the existing Solaris customer base.
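The 50% figure follows from a simple area budget, sketched below in Python. The one-third LLC share is an assumption inferred from that figure and from the earlier estimate that cores occupy about a third of the die; the function name is illustrative.

```python
def area_gain_from_removing(removed_fraction):
    """Relative growth in area for everything else when one block is removed.

    If a block occupies `removed_fraction` of the die and its area is handed
    to the remaining blocks, they grow from (1 - f) to 1, a gain of f / (1 - f).
    """
    return removed_fraction / (1.0 - removed_fraction)


# Assumed budget: the LLC takes roughly a third of the die.
llc_fraction = 1 / 3
gain = area_gain_from_removing(llc_fraction)

print(f"Extra area for cores and system: {gain:.0%}")
```

With a smaller cache share the gain shrinks quickly, so the one-third case is close to the best case the paragraph describes.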
In contrast, a modern GPU can be thought of as a combination of several shifts that emphasize highly parallel computational workloads. First, the LLC is almost entirely eliminated and the bulk of the area is allocated to GPU cores. Second, the system only contains a high bandwidth GDDR5 memory interface (with tuning for scatter/gather) and a single PCI-E link. Third, the GPU cores are highly optimized with very simple scalar pipelines but large vector units. Collectively, this amounts to a significant performance advantage (around 5× in raw FLOP/s and 2-3× in FLOP/s per watt or per mm²), enough to mitigate a process technology disadvantage and to make HPC customers look carefully.
Optimizing for a specific workload at all levels of the server processor is necessary, but not sufficient for success. Fundamentally, the narrower the market niche, the greater the potential for disruptive performance. However, that comes at a very real trade-off in terms of addressable market. For example, a microserver processor with dedicated video transcoding hardware might offer significant benefits for YouTube and other video media, but the area would be entirely wasted for nearly any other workload. Such a deeply specialized product is unlikely to garner many customers. In order to offer a competitive product, modern process technology is critical, and the costs of developing a server processor and platform are quite high. Consequently, volume is king, as John Mashey discussed in an earlier article on the economics of the VAX.
The total server market is around 20M processors per year, almost all of which are produced by Intel. The challenge for alternative architectures is twofold. First, new entrants must choose an architecture that yields differentiated performance in a market segment that is large enough in revenue and volume. Second, the products must actually be adopted by a sustainable customer base, and the vendor must continue to deliver differentiated products on the roadmap. If the market (or customer adoption) is too small, the microserver vendor will be unable to sustain the investment needed to stay ahead of the competition. This is particularly challenging, as there are very few customers (perhaps around 20) that operate on a scale where microservers make sense.
There are over half a dozen companies pursuing alternative server architectures, not to mention Intel’s incumbent efforts, which benefit from more advanced manufacturing. Realistically, there might be room for 1-2 new entrants to succeed; the rest will never reach critical mass and will eventually be acquired or fade away. The technical choices made by these new entrants will largely determine where each offering falls on the volume vs. disruptive performance curve and, in turn, the likelihood of long-term success.