By: dmcq (dmcq.delete@this.fano.co.uk), August 9, 2014 3:00 am
Room: Moderated Discussions
anon (anon.delete@this.anon.com) on August 9, 2014 12:12 am wrote:
> David Kanter (dkanter.delete@this.realworldtech.com) on August 8, 2014 11:36 pm wrote:
> > > > Which is obviously something a marketing/executive person would say, but it's also completely false. The
> > > > *concept* of an x86 tax is absolutely true. And not just the concept, but even in implementation, we can
> > > > take a really simple example which is the instruction decoding complexity, and point to that.[*]
> > > >
> > > > There have also been engineers in the past who acknowledged some inefficiencies and estimated
> > > > an "x86 tax" for then-PC-class designs. Whether those estimates are still valid with x64 and ever more
> > > > complex CPUs is up for debate, but certainly the *concept* of an x86 tax is there.
> > > >
> > > > It's also not really disputed that at the very small scale, x86
> > > > designs can't compete with simple ARM based microarchitectures.
> > >
> > > Take a modern A57 core. According to AMD, the A57-based Opteron is faster than the Jaguar-based Opteron
> > > while consuming less power: the ARM core's performance is ~40% higher, and it draws roughly half the power.
> > >
> > > Jaguar is considered a good x86 design, and even competitive against
> > > Intel's latest designs. Thus we are seeing the x86 tax in action.
> >
> > No you aren't. Jaguar is a good core design, but the uncore was inappropriate for
> > servers. You are comparing a design with a server-specific uncore vs. one with a client-optimized
> > uncore. It's no surprise that the Jaguar-based design is behind.
> >
> > > > I have also heard from many people (it's possible this is just an uninformed 'echo chamber effect',
> > > > but I think there is some merit to the idea) that x86 cores take significantly more design skill
> > > > than an equivalent ARM core. Whether this is due to compatibility, or decoders, or the necessity of
> > > > a more capable memory pipeline and caches, I don't know, but it seems to also be an x86 tax.
> > >
> > > E.g. an x86 decoder is more difficult to implement than an ARM64 decoder, because the former has to
> > > find the boundaries of variable-length instructions. Also, the x86 ISA is full of legacy instructions,
> > > which have to be implemented in hardware and then verified/tested, which increases development
> > > costs and time.
> >
> > ARM is full of legacy crap as well. Not to mention the fact that an ARMv8 core requires
> > 3-4 different decoders. I know a few people who have had the pleasure of designing
> > custom ARM cores, and according to them 'ARMv8 decode is just as terrible as x86'.
>
> ARMv8 does not require legacy modes, unless ARM was lying or being misleading. Maybe it has to be
> negotiated in licensing, but they certainly indicated that v8-only cores would be an option.
>
> In that case, there is exactly zero possibility that ARMv8 is "just as terrible". Also, given that
> 32-bit ARM cores have gone to 3-wide decode, and (apparently) Apple's is 6-wide,
> while even with SMT the original Intel Atom was only 2-wide, and Silvermont is only 2-wide, I find
> it hard to believe that even the earlier ARMs were nearly as problematic as x86 for decoding.
>
The 32-bit ARM ISA, complete with Thumb, is messy, though I think the x86 ISA is far worse. Removing the 32-bit mode is definitely allowed, and I believe some of the cores being designed have already done so. Smartphones will have to support both modes for a while for compatibility reasons; I'll be interested to see whether Apple has started demoting 32-bit mode to a second-class citizen in the A8.
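
To make the decode asymmetry discussed above concrete, here is a toy sketch of what finding instruction boundaries involves on each side. The function names are mine, and the x86 length decoder covers only a tiny invented subset (one prefix and three opcode ranges) of the real encoding, which has ~15 prefixes, escape maps, ModRM, SIB, displacements and more. The point it illustrates is real, though: on ARM64 the start of instruction n is known before looking at a single byte, while on x86 each instruction's start depends on the lengths of everything before it.

    #include <stddef.h>
    #include <stdint.h>

    /* ARM64: every instruction is 4 bytes, so all decoders can start
       in parallel from cycle one. */
    static size_t a64_insn_start(size_t n)
    {
        return 4 * n;
    }

    /* x86: toy length decoder for a tiny subset. */
    static size_t x86_insn_len(const uint8_t *p)
    {
        size_t len = 0;
        int op16 = 0;
        if (*p == 0x66) { p++; len++; op16 = 1; } /* operand-size prefix */
        uint8_t op = *p;
        len++;                                    /* opcode byte */
        if (op >= 0x50 && op <= 0x57)
            return len;                           /* push reg: done */
        if (op >= 0xB8 && op <= 0xBF)             /* mov reg, imm */
            return len + (op16 ? 2 : 4);          /* the prefix changes the
                                                     immediate size - the sort
                                                     of thing that hurts */
        return len;             /* treat any other opcode as 1 byte */
    }

    /* Finding where instruction n starts means sizing every earlier
       instruction in sequence - a serial dependency that parallel
       decoders must break somehow (predecode bits, brute force, ...). */
    static size_t x86_insn_start(const uint8_t *code, size_t n)
    {
        size_t off = 0;
        for (size_t i = 0; i < n; i++)
            off += x86_insn_len(code + off);
        return off;
    }

    int main(void)
    {
        /* 66 B8 34 12 = mov ax,0x1234 ; 53 = push ebx ; C3 = ret.
           x86 starts computed serially: 0, 4, 5.
           On ARM64 they would simply be 0, 4, 8 (a64_insn_start). */
        static const uint8_t code[] = { 0x66, 0xB8, 0x34, 0x12, 0x53, 0xC3 };
        (void)a64_insn_start(0);
        return (int)x86_insn_start(code, 2);   /* returns 5 */
    }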
> >
> > There really isn't a significant x86 tax. Perhaps 5% for a reasonable
> > core (obviously things are much worse for scalar cores).
I guess you're talking about speed. Well, there seems to have been a significant tax of perhaps 15% in ARM's 32-bit mode compared to the 64-bit mode. And the x86 overheads include all sorts of things besides decode: the multiple generations of floating-point units, the need to track memory locations in register renaming because of the heavy use of the stack, having to cope with code being stored into the executable code area, and the funny memory-coherence requirements laid down before people understood what was actually needed, plus the various accretions on top of that.
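
To put one of those in concrete terms: x86 keeps instruction fetch coherent with data stores, so a JIT can write bytes and jump straight to them, and the hardware has to snoop stores against in-flight code to make that work. On ARM the D-side and I-side are not coherent, and software does the work explicitly instead. A minimal sketch, assuming POSIX mmap and the GCC/Clang __builtin___clear_cache builtin (real JITs also juggle W^X page permissions, which this ignores):

    #include <stddef.h>
    #include <string.h>
    #include <sys/mman.h>

    typedef int (*fn_t)(void);

    /* Copy freshly generated machine code into a buffer and return it
       as a callable function. */
    static fn_t install_code(const unsigned char *code, size_t n)
    {
        void *buf = mmap(NULL, n, PROT_READ | PROT_WRITE | PROT_EXEC,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED)
            return NULL;
        memcpy(buf, code, n);
    #if !defined(__x86_64__) && !defined(__i386__)
        /* ARM: the instruction cache is not coherent with stores, so
           the new bytes must be pushed out of the D-side and stale
           I-side lines invalidated, or the core may execute garbage. */
        __builtin___clear_cache((char *)buf, (char *)buf + n);
    #endif
        /* x86: no maintenance needed - the hardware snoops stores
           against fetched code, which is exactly the kind of legacy
           obligation being tallied above. */
        return (fn_t)buf;
    }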
> >
> > > According to Feldman, an entirely custom server chip using the ARM architecture takes about 18 months
> > > and about $30 million. By contrast, it takes a three- to four-year time frame and $300-400 million in
> > > development costs to build an x86-based server chip based on a new microarchitecture.
> >
> > Those numbers are suspect and also probably not comparing the right things. Much
> > of the cost of a server design is in the cache, coherent interconnects, memory controller,
> > power management, etc. which is necessary for any design, ARM or x86.
> >
> > David
They both most definitely take a lot more time and money than that!