By: Paul A. Clayton (paaronclayton.delete@this.gmail.com), August 13, 2014 4:18 pm
Room: Moderated Discussions
David Kanter (dkanter.delete@this.realworldtech.com) on August 13, 2014 12:18 pm wrote:
[snip]
> If people want to compare ARM and x86 performance, then it seems like the highest bin is appropriate.
That depends on the desired nature of the comparison. It is not entirely clear that everyone in this thread agrees on what is being compared.
If one wants to compare only architectural factors (practically impossible to do, but an interesting comparison from a certain abstract viewpoint), then binning should not enter into it. Ideally, such a comparison would also correct for process differences, design team size/experience/skill, and even Intel's integrated design and manufacturing advantage.
(From a semi-practical viewpoint, I would exclude the learning curve for validation and special x86 tricks. I.e., I would give x86 the advantage of assuming the organization has tools and experience in handling x86's quirks. The special x86 tricks may be broadly known by now, especially since they can have some application to cleaner ISAs, but validation tools probably take significantly longer to develop and are less easily transferred.)
> I agree that ARM can be commercially relevant in servers at lower performance points,
> but the OP was making noise about how ARM vs. x86 impacts design complexity/perf.
Even if one wanted to estimate whether the difference in Intel's resources more than compensates for any extra ISA-related complexity, it is not obvious that the highest bins are appropriate for comparison, at least in isolation. If price is no object (the weak implication of saying that top bins should be compared), liquid nitrogen cooling could be used to substantially boost performance. If one compares a $200 ARM processor with a $300 cooling system to a $500 x86 with the default cooling system, it is not obvious that the comparison would be fair even though the acquisition costs are identical.
Since Intel's volume allows it to have more SKUs, it may be able to grab a 5% performance advantage from binning that does not reflect the performance most customers will actually see. Yes, there is a halo effect and the attraction of being able to buy that extra 5% if it is critical. However, neither of these effects says anything about design complexity and its impact on performance, and they say relatively little about the benefit of Intel's resources.
Many of those interested in performance at (almost) any cost are either more likely to be stuck with a specific ISA or at least a specific vendor's software (which might not completely exclude ARM migration but works against it), or more likely to be willing to try exotic solutions (in which case extreme overclocking may be a reasonable alternative to a highest-bin Xeon). The former are less interesting customers since they do not have much choice. The latter make comparison difficult because the diversity of system-level design choices makes a reasonable evaluation less practical.
It also seems that many of Intel's advantages depend on Intel (roughly) retaining its relative advantage from better process technology, and that is very difficult to predict. The increasing advantage of high volume at smaller feature sizes works in Intel's favor. On the other hand, a plausible slowing of shrinks (or "effective" shrinks) could reduce the disadvantage of foundry users, since being two years behind Intel would then confer a smaller benefit from Intel's process lead. (That assumes Intel's time-measured lead does not grow; it is quite plausible that it could.) Add in disruptive technologies, and predicting even five years out seems challenging.
For a processor company whose product depends on system performance, commodity memory represents the "serial portion" in an application of Amdahl's Law. Intel can develop clever prefetching, writeback scheduling, read-for-ownership avoidance, clever address allocation (potentially using an invisible translation layer to allow finer-grained use of different methods), and various memory controller optimizations; but unless Intel pushes for better memory (and the Rambus experience may still weigh heavily on the minds of Intel executives), commodity memory seems likely to constrain everyone roughly equally, and so to constrain the leader's performance relatively more.
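As a rough sketch of that Amdahl's Law framing (the 30% memory-stall fraction below is purely an illustrative assumption, not a measured figure for any real workload or chip):

# Amdahl's Law: overall speedup = 1 / ((1 - f) + f / s)
# where f is the fraction of time that core/uncore improvements can touch
# and (1 - f) is time spent waiting on commodity memory, which everyone
# buys from the same suppliers and no one vendor improves unilaterally.
def overall_speedup(core_speedup, memory_bound_fraction):
    improvable = 1.0 - memory_bound_fraction
    return 1.0 / (memory_bound_fraction + improvable / core_speedup)

# Assume 30% of time stalled on commodity DRAM (illustrative only):
for s in (1.5, 2.0, 4.0, 1e9):
    print(s, round(overall_speedup(s, 0.3), 2))
# Even an "infinitely" better core tops out at 1 / 0.3 ~= 3.3x here,
# so the shared memory system caps how much a design or process lead
# can actually show up in delivered system performance.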
When the difficulty of improvement grows superlinearly, those at the high end of the curve are at a relative disadvantage. Using increasing (or at least higher) volume and profit, themselves enabled by being higher on the curve, to climb the curve faster despite the extra difficulty does work, but it does not seem like a sustainable strategy.