By: juanrga (nospam.delete@this.juanrga.com), August 9, 2014 6:38 am
Room: Moderated Discussions
David Kanter (dkanter.delete@this.realworldtech.com) on August 8, 2014 11:36 pm wrote:
> > > Which is obviously something a marketing/executive person would say, but it's also completely false. The
> > > *concept* of an x86 tax is absolutely true. And not just the concept, but even in implementation, we can
> > > take a really simple example which is the instruction decoding complexity, and point to that.[*]
> > >
> > > There have also been engineers in the past acknowledge some inefficiencies and estimate
> > > "x86 tax" for then-PC class designs. Whether those are still valid with x64 and ever more
> > > complex CPUs is up for debate, but certainly the *concept* of an x86 tax is there.
> > >
> > > It's also not really disputed that at the very small scale, x86
> > > designs can't compete with simple ARM based microarchitectures.
> >
> > Take a modern A57 core. According to AMD, the A57 Opteron is faster than the Jaguar-based Opteron but
> > consumes less power. The ARM core's performance is ~40% higher, and it consumes roughly half the power.
> >
> > Jaguar is considered a good x86 design, even competitive against
> > Intel's latest designs. Thus we are seeing the x86 tax in action.
>
> No you aren't. Jaguar is a good core design, but the uncore was inappropriate for
> servers. You are comparing a design with a server-specific uncore vs. one with a client-optimized
> uncore. It's no surprise that the Jaguar-based design is behind.
This is a non-issue. The same will happen when you compare desktops, laptops, or tablets using A57 or Jaguar: the ARM core will be faster and more efficient than the x86 core, even though the latter is a very good design in the x86 space and the former is only the first standard 64-bit ARM core. When custom ARM cores are compared to Jaguar, things look even worse for x86. See AnandTech's review of Cyclone, for instance.
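To put the quoted AMD figures in perspective, here is a quick back-of-the-envelope sketch of the implied performance-per-watt gap. It assumes the "~40% faster" and "roughly one half" power figures refer to the same workload and can be combined directly, and it normalizes the Jaguar part to 1.0 for both; it says nothing about how much of the gap comes from the core versus the uncore.

```python
# Relative figures as quoted from AMD in the post, normalized to Jaguar = 1.0.
# Assumption: both figures apply to the same workload and are directly comparable.
jaguar_perf, jaguar_power = 1.0, 1.0
a57_perf, a57_power = 1.4, 0.5   # ~40% faster, roughly half the power


def perf_per_watt(perf: float, power: float) -> float:
    """Performance per unit of power (higher is better)."""
    return perf / power


ratio = perf_per_watt(a57_perf, a57_power) / perf_per_watt(jaguar_perf, jaguar_power)
print(f"A57 perf/W advantage: {ratio:.1f}x")
```

Taken at face value, the two figures compound to roughly a 2.8x perf/W advantage at the chip level.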
> > > I have also heard from many people (it's possible this is just an uninformed 'echo chamber effect',
> > > but I think there is some merit to the idea) that x86 cores take significantly more design skill
> > > than an equivalent ARM core. Whether this is due to compatibility, or decoders, or necessity of
> > > more capable memory pipeline and caches, I don't know, but it seems to also be an x86 tax.
> >
> > E.g. an x86 decoder is more difficult to implement than an ARM64 decoder, because the former has to match
> > instructions of variable length. Also, the x86 ISA is full
> > of legacy instructions, which have to be implemented
> > in hardware and then verified/tested, which increases development cost and time.
>
> ARM is full of legacy crap as well. Not to mention the fact that an ARMv8 requires
> 3-4 different decoders. I know a few people who have had the pleasure of designing
> custom ARM cores, and according to them 'ARMv8 decode is just as terrible as x86'.
By ARM64 I am referring exclusively to AArch64. ARMv8 also includes AArch32, with its A32 and T32 instruction sets, for legacy support.
The designs I am commenting on are pure AArch64 implementations; legacy 32-bit mode is not needed for HPC, for instance.
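The decode argument above can be illustrated with a toy sketch of instruction-boundary finding. With a fixed 4-byte encoding (as in pure AArch64), the start of every instruction is known up front, so a wide decoder can work on all of them in parallel. With a variable-length encoding, each start depends on the lengths of all previous instructions. The length rule below (`toy_insn_length`) is a hypothetical toy encoding, NOT real x86, whose lengths of 1-15 bytes depend on prefixes, opcode, ModRM/SIB, displacement and immediate fields; real x86 front ends spend extra hardware to work around this serial dependency.

```python
from typing import List


def fixed_length_boundaries(code: bytes, width: int = 4) -> List[int]:
    # Fixed-width encoding (AArch64-style): instruction k starts at k * width,
    # so every boundary is known without looking at any instruction bytes.
    return list(range(0, len(code) - width + 1, width))


def toy_insn_length(code: bytes, pos: int) -> int:
    # HYPOTHETICAL toy encoding, NOT real x86: the low 2 bits of the first
    # byte give an instruction length of 1-4 bytes.
    return (code[pos] & 0b11) + 1


def variable_length_boundaries(code: bytes) -> List[int]:
    # Variable-width encoding: the start of each instruction is only known
    # after the length of the previous one has been determined, so a naive
    # decoder has to walk the stream serially.
    starts, pos = [], 0
    while pos < len(code):
        starts.append(pos)
        pos += toy_insn_length(code, pos)
    return starts


stream = bytes([0x01, 0xAA, 0x03, 0xBB, 0xCC, 0xDD, 0x00, 0x02, 0xEE, 0xFF])
print(fixed_length_boundaries(stream))     # boundaries computed independently
print(variable_length_boundaries(stream))  # boundaries found one after another
```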
> There really isn't a significant x86 tax. Perhaps 5% for a reasonable
> core (obviously things are much worse for scalar cores).
I have information from Intel that says otherwise, and I know that he is painting a rosy picture. According to him, legacy support alone accounts for one-third of the energy of integer execution. This does not include fetch-decode energy, which adds up to about 2/3. Moreover, his numbers perpetuate the myth that the x86 tax is only in the decode: part of the energy associated with execution also carries a penalty due to the ISA.
And this is for an OoO core; things look worse for simpler cores.
> > According to Feldman, an entirely custom server chip using the ARM architecture takes about 18 months
> > and about $30 million. By contrast, building an x86-based server chip on a new microarchitecture takes
> > three to four years and $300-400 million in development costs.
>
> Those numbers are suspect and also probably not comparing the right things. Much
> of the cost of a server design is in the cache, coherent interconnects, memory controller,
> power management, etc. which is necessary for any design, ARM or x86.
Apparently you have not heard of AMD's AMBIDEXTROUS strategy: only the core changes; the rest of the chip is the same, down to the pin level.
The numbers above are credible. They are the reason why so many companies are doing competitive server/HPC designs, and the reason the K12 core comes first and the Zen core later.
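For scale, the Feldman figures quoted earlier in the thread can be turned into simple ratios. This sketch takes the quoted numbers at face value (not independently verified) and interprets "three or four-year time frame" as 36-48 months; it does not settle whether the two projects cover the same scope of work, which is exactly the point in dispute.

```python
# Figures as quoted from Feldman in the thread, taken at face value.
arm_months, arm_cost_musd = 18, 30
x86_months = (36, 48)         # "three or four-year time frame", in months
x86_cost_musd = (300, 400)    # "$300-400 million", in millions of USD

# Ratios of x86 to ARM development time and cost (low end, high end).
time_ratio = tuple(m / arm_months for m in x86_months)
cost_ratio = tuple(c / arm_cost_musd for c in x86_cost_musd)

print(f"x86 takes {time_ratio[0]:.1f}-{time_ratio[1]:.1f}x as long")
print(f"x86 costs {cost_ratio[0]:.1f}-{cost_ratio[1]:.1f}x as much")
```

On these figures, a new x86 server core would take about 2-2.7x as long and cost about 10-13x as much as the custom ARM chip.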