By: Megol (golem960.delete@this.gmail.com), August 8, 2014 11:23 am
Room: Moderated Discussions
juanrga (nospam.delete@this.juanrga.com) on August 8, 2014 10:49 am wrote:
> anon (anon.delete@this.anon.com) on August 6, 2014 7:54 pm wrote:
> > juanrga (nospam.delete@this.juanrga.com) on August 6, 2014 11:55 am wrote:
> > >
> > > I liked the part when they appeal to "independent research" and then mention the same crap paper
> > > that every Intel fanboy mentions. I still recall the first time that I did read the paper by
> > > Blem, Menon, and Sankaralingam. After my initial perplexity on how almost any decision they took
> > > (from hardware to compiler version) seemed orientated to favor x86 over ARM, I did search further
> > > info about the senior author and found that one of the coworkers of his research team is an Intel
> > > lab guy, that several students in his group are awarded Intel grants... LOL
> >
> > Yes, the answer is not a "strong" one at all.
> >
> > That paper was discussed here when it came out, and numerous issues were pointed out with
> > it. Not to mention that it absolutely does *not* show that ISA does not make a difference.
> > The most it really attempts to show is that when looking at devices ranging from A8 to i7,
> > microarchitecture perf/power/cost target is the first order effect. Which it is. What it does
> > not show is whether the ISA made, say, 10% difference when holding all else equal.
>
> They compared older hardware. Migrating from SB-i7 to HW-i7 introduces little benefits
> in performance (except when using new AVX2 extensions to x86) but in ARM each gen is
> not a mere 5-10% faster than former gen but much more. Their choice favored x86.
5-10% is a huge difference given that it results from slight polishing. I have to say this point is nonsense.
> They used only Intel designs. Using only AMD or a mixture of AMD and Intel
> would change both performance and efficiency. Their choice favored x86.
But would that be relevant? If Intel processors are the most efficient x86 ones available, isn't using them the sensible choice?
> They computed the power consumption incorrectly. Their methodology choice favored x86.
>
> They used an old compiler that didn't optimize the code for ARM as well as it could. Their choice favored x86.
Most code in the wild isn't optimized much.
> And so on.
>
> > Then we have this
> >
> > "Intel's then-mobile chief Mike Bell clearly stated that the concept of an 'x86 tax' simply isn't true."
> >
> > Which is obviously something a marketing/executive person would say, but it's also completely false. The
> > *concept* of an x86 tax is absolutely true. And not just the concept, but even in implementation, we can
> > take a really simple example which is the instruction decoding complexity, and point to that.[*]
> >
> > There have also been engineers in the past acknowledge some inefficiencies and estimate
> > "x86 tax" for then-PC class designs. Whether those are still valid with x64 and ever more
> > complex CPUs is up for debate, but certainly the *concept* of an x86 tax is there.
> >
> > It's also not really disputed that at the very small scale, x86
> > designs can't compete with simple ARM based microarchitectures.
>
> Take a modern A57 core. According to AMD the A57 Opteron is faster than jaguar based Opteron but
> consumes less power. The ARM core performance is ~40% faster, and consumes roughly one half.
>
> Jaguar is considered a good x86 design and even competitive against
> Intel last designs. Thus we are seeing the x86 tax in action.
>
> > I have also heard from many people (it's possible this is just an uninformed 'echo chamber effect',
> > but I think there is some merit to the idea) that x86 cores take significantly more design skill
> > than an equivalent ARM core. Whether this is due to compatibility, or decoders, or necessity of
> > more capable memory pipeline and caches, I don't know, but it seems to also be an x86 tax.
>
> E.g. a x86 decoder is more difficult to implement than an ARM64 decoder, because the former has to match
> instructions of variable length.
True. But the ways to handle this are well known nowadays. One is massively parallel length decoding; another is to use predecode data and tag each byte. There have been arguments that the latter technique can scale up to 8 instructions decoded per clock, with most of the remaining complexity being things a RISC also needs (dependency tracking and so on).
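To make the two approaches concrete, here is a minimal sketch on a made-up variable-length encoding. This is not real x86: the rule that the low two bits of the first byte give the length, and all the function names, are assumptions chosen only for illustration. It shows per-byte start tags produced once by a predecode pass, speculative length decoding at every byte offset, and grouping the resulting start offsets into per-clock decode groups.

```python
def toy_length(opcode: int) -> int:
    """Hypothetical encoding: low two bits of the first byte give a length of 1..4."""
    return (opcode & 0x3) + 1


def predecode(code: bytes) -> list[bool]:
    """Predecode pass: tag each byte with whether it starts an instruction.

    Done once (e.g. on cache fill), so the decoders can later find
    instruction boundaries without re-deriving lengths.
    """
    starts = [False] * len(code)
    pos = 0
    while pos < len(code):
        starts[pos] = True
        pos += toy_length(code[pos])
    return starts


def speculative_lengths(window: bytes) -> list[int]:
    """Parallel length decode: compute a length at EVERY byte offset,
    as if each byte were the first byte of an instruction."""
    return [toy_length(b) for b in window]


def pick_starts(lengths: list[int], first: int = 0) -> list[int]:
    """Select the valid chain of instruction starts from the speculative lengths."""
    chain = []
    pos = first
    while pos < len(lengths):
        chain.append(pos)
        pos += lengths[pos]
    return chain


def group_per_clock(starts: list[int], width: int) -> list[list[int]]:
    """Split start offsets into groups of at most `width` instructions per clock."""
    return [starts[i:i + width] for i in range(0, len(starts), width)]
```

In hardware the speculative lengths would be computed side by side and the chain selection is the serial part; the start tags from predecode are what remove that serial step from the decode stage.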
> Also the x86 ISA is full of legacy instructions, which have to be implemented
> in hardware and then verified/tested which increases development costs and time of development.
Wrong. Legacy instructions need some hardware, true. But most of their functionality is implemented in microcode rather than in complex dedicated hardware.
Now, there are some quirks in the x86 ISA that do waste power, like the handling of shift by zero, calculating the auxiliary flag (nibble carry), etc. But those are far from the most power-consuming parts of an OoO processor core.
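For anyone unfamiliar with those two quirks, a rough model of the architectural behaviour follows. It is a simplified sketch, not a full flags model: the function names and the reduced flag set (CF, AF, ZF only) are my own assumptions for illustration. The auxiliary flag is the carry out of bit 3 of an addition, and a shift whose masked count is zero must leave all flags untouched, which means the shift effectively takes the previous flags as an input.

```python
def add8(a: int, b: int) -> tuple[int, dict[str, bool]]:
    """8-bit ADD with a reduced flag set (CF, AF, ZF)."""
    total = a + b
    result = total & 0xFF
    flags = {
        "CF": total > 0xFF,
        # AF is the carry out of bit 3 into bit 4 (the "nibble carry").
        "AF": ((a ^ b ^ result) & 0x10) != 0,
        "ZF": result == 0,
    }
    return result, flags


def shl32(value: int, count: int, flags_in: dict[str, bool]) -> tuple[int, dict[str, bool]]:
    """32-bit SHL: the count is masked to 5 bits, and a masked count of
    zero leaves the incoming flags completely unchanged."""
    value &= 0xFFFFFFFF
    count &= 0x1F
    if count == 0:
        # Quirk: the result flags depend on the OLD flags in this case.
        return value, dict(flags_in)
    result = (value << count) & 0xFFFFFFFF
    flags = dict(flags_in)  # AF is architecturally undefined here; simply carried over in this sketch
    flags["CF"] = ((value >> (32 - count)) & 1) == 1  # last bit shifted out
    flags["ZF"] = result == 0
    return result, flags
```

Because the count may only be known at execution time, a shift by a register count has to treat the old flags as a possible source, which is the kind of extra dependency meant above.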
> According to Feldman an entirely custom server chip using the ARM architecture takes about 18 months
> and about $30 million. By contrast, it takes three or four-year time frame and $300--400 million in
> development costs required to build an x86-based server chip based on a new micro-architecture.
Now that's 100% true. x86 is a complex beast to implement, and many of its complexities aren't really documented. But those undocumented behaviours are relied upon, knowingly or otherwise, and have to be supported.