By: anon (anon.delete@this.anon.com), August 9, 2014 12:12 am
Room: Moderated Discussions
David Kanter (dkanter.delete@this.realworldtech.com) on August 8, 2014 11:36 pm wrote:
> > > Which is obviously something a marketing/executive person would say, but it's also completely false. The
> > > *concept* of an x86 tax is absolutely true. And not just the concept, but even in implementation, we can
> > > take a really simple example which is the instruction decoding complexity, and point to that.[*]
> > >
> > > Engineers have also acknowledged some inefficiencies in the past and estimated an
> > > "x86 tax" for then-PC class designs. Whether those are still valid with x64 and ever more
> > > complex CPUs is up for debate, but certainly the *concept* of an x86 tax is there.
> > >
> > > It's also not really disputed that at the very small scale, x86
> > > designs can't compete with simple ARM based microarchitectures.
> >
> > Take a modern A57 core. According to AMD, the A57-based Opteron is faster than the Jaguar-based
> > Opteron but consumes less power. The ARM core is ~40% faster and consumes roughly half the power.
> >
> > Jaguar is considered a good x86 design and even competitive against
> > Intel's latest designs. Thus we are seeing the x86 tax in action.
>
> No you aren't. Jaguar is a good core design, but the uncore was inappropriate for
> servers. You are comparing a design with a server-specific uncore vs. one with a client-optimized
> uncore. It's no surprise that the Jaguar-based design is behind.
>
> > > I have also heard from many people (it's possible this is just an uninformed 'echo chamber effect',
> > > but I think there is some merit to the idea) that x86 cores take significantly more design skill
> > > than an equivalent ARM core. Whether this is due to compatibility, or decoders, or necessity of
> > > more capable memory pipeline and caches, I don't know, but it seems to also be an x86 tax.
> >
> > E.g. an x86 decoder is more difficult to implement than an ARM64 decoder, because the former has to match
> > instructions of variable length. Also, the x86 ISA is full of legacy instructions, which have to be
> > implemented in hardware and then verified/tested, which increases development cost and time.
>
> ARM is full of legacy crap as well. Not to mention the fact that an ARMv8 requires
> 3-4 different decoders. I know a few people who have had the pleasure of designing
> custom ARM cores, and according to them 'ARMv8 decode is just as terrible as x86'.
ARMv8 does not require the legacy modes, unless ARM was lying or being misleading. It may have to be negotiated in licensing, but ARM certainly indicated that v8-only cores would be an option.
In that case, there is exactly zero possibility that ARMv8 decode is "just as terrible". Also, 32-bit ARM cores have gone to 3-wide decode and Apple's is (apparently) 6-wide, while the Intel Atom was only 2-wide even with SMT, and Silvermont is only 2-wide. Given that, I find it hard to believe that even earlier ARMs were nearly as problematic as x86 for decoding.
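To make the decode-width point concrete, here is a rough sketch of why fixed-length instructions are easier to decode several at a time. It is a toy model only: the 4-byte fixed width matches AArch64, but `toy_length` and its "low two bits give the length" rule are made up for illustration and are not how x86 actually encodes length.

```python
# Toy model only: the variable-length encoding below is hypothetical, not real x86.

FIXED_WIDTH = 4  # bytes per instruction in the toy fixed-length ISA


def fixed_boundaries(code):
    """Every start offset is known up front, so N decoders can start in parallel."""
    return list(range(0, len(code), FIXED_WIDTH))


def toy_length(first_byte):
    """Hypothetical rule: the low 2 bits of the first byte give a length of 1..4."""
    return (first_byte & 0b11) + 1


def variable_boundaries(code):
    """Each start depends on the previous instruction's length: a serial chain."""
    starts = []
    offset = 0
    while offset < len(code):
        starts.append(offset)
        offset += toy_length(code[offset])
    return starts
```

In the fixed case the start offsets fall out of simple arithmetic; in the variable case the start of instruction n is unknown until instruction n-1 has at least been length-decoded, which is the part a wide decoder has to spend extra hardware to speed up or speculate around.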
>
> There really isn't a significant x86 tax. Perhaps 5% for a reasonable
> core (obviously things are much worse for scalar cores).
>
> > According to Feldman, an entirely custom server chip using the ARM architecture takes about 18 months
> > and about $30 million. By contrast, building an x86-based server chip on a new microarchitecture
> > takes three to four years and $300-400 million in development costs.
>
> Those numbers are suspect and also probably not comparing the right things. Much
> of the cost of a server design is in the cache, coherent interconnects, memory controller,
> power management, etc., which are necessary for any design, ARM or x86.
>
> David