By: juanrga (nospam.delete@this.juanrga.com), August 10, 2014 6:24 am
Room: Moderated Discussions
Jouni Osmala (josmala.delete@this.cc.hut.fi) on August 10, 2014 4:27 am wrote:
> > >
> > > I would ask you to do the same thing: x86 processors are among the highest performing in space they
> > > have been targeting for the past several decades. From somewhere around Atom space to all the way down
> > > to ~10,000 gate microcontrollers, x86 is anywhere from uncompetitive to completely impossible.
> >
> > Yes that's true. But for something that is 2-issue or wider and
> > targeted at high performance, x86 simply isn't a major barrier.
> >
> > I know a number of architects who have worked on high-performance x86 and ARM cores, and they all say
> > the same thing. The difference between the two ISAs is not that significant. x86 has more complexity,
> > but it's in the noise compared to the difficulty of designing a good memory subsystem, etc.
> >
> > > > It the ISA has overheads
> > >
> > > Oh, so it does have overheads now?
> >
> > x86 overheads are noticeable and depend on the target market. As you pointed out, it's not very attractive
> > for microcontrollers at all...but for a high performance core, the difference is perhaps 5-10% of the core
> > area. And in most SoCs, cores are around 25-40% of the total area. 10% of 40% is 4% - and that's best case.
>
> X86 dominates its space by having superior force behind it and backwards compatible requirements,
> but it doesn't cause x86 tax being nearly zero when trying to go for maximum performance.
>
> There are few things that limit performance one is memory subsystem in which being
> x86 is only minor irritation with few extra spills, which is easily overcome by more
> resources. So x86 cost only shows up in the number of L1 cache accesses.
>
> Another is branch miss prediction. In here x86 tax shows up by requiring many more
> stages in the pipeline, which increases number of squashed instructions and lost power
> for given width and increases performance penalty per branch miss prediction. In branch
> heavy integer code simple decode may get ahead if given similar resources.
>
> The most often forgotten cost is how good JIT target is x86 vs the competing architecture. By
> being simpler JIT target means it reduces cost of running a jitted language in architecture.
>
> If you go for maximum performance IF all things would be equal I'd estimate total performance difference
> between x86 server and 64bit only arm server would be in order of 20-30% eventually.
> Unfortunately for arm competition not all things are equal Intel has more resources to throw at every problem
> they have, and have superior manufacturing, so in the end Intel may get ahead and continue to dominate server
> space. I do expect arm competitors to eventually get the implementation tricks Intel has been using to get
> ahead in performance, and at some point there are no new tricks available for Intel to get ahead.
> On the other hand when switching from silicon to something else Intel may totally dominate
> everything by being first in that transition if it means gaining large perf/power advantage
> and then no-one can compete Intel anymore because of their new 15 billion dollar Fab is only
> thing that can produce the new high performance stuff, and high end phone stuff.
20-30% sounds like the right efficiency figure at that level. In fact, 90W ARM SoCs are providing around 80-90% of the performance of 140W Haswell Xeons.
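
As a rough sanity check on those figures, here is the implied performance-per-watt ratio. This is only a sketch: it assumes the quoted wattages can stand in for actual power draw (rated power is not measured consumption), and the function and constant names are made up for illustration.

```python
def perf_per_watt_ratio(rel_perf, watts_a, watts_b):
    """Perf/W of chip A relative to chip B.

    rel_perf is perf_A / perf_B; watts_a and watts_b are the power
    figures used as a proxy for real consumption (an assumption).
    """
    return rel_perf * watts_b / watts_a


ARM_WATTS = 90.0    # figure quoted in the post
XEON_WATTS = 140.0  # figure quoted in the post

# 80-90% of the Xeon's performance, as stated in the post
low = perf_per_watt_ratio(0.80, ARM_WATTS, XEON_WATTS)
high = perf_per_watt_ratio(0.90, ARM_WATTS, XEON_WATTS)

print(f"ARM perf/W relative to Xeon: {low:.2f}x to {high:.2f}x")
```

Under that assumption the ARM parts come out at roughly 1.24x to 1.40x the Xeon's perf/W, which is in the same ballpark as the 20-30% estimate.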