By: Kevin G (kevin.delete@this.cubitdesigns.com), August 10, 2014 9:26 pm
Room: Moderated Discussions
Jouni Osmala (josmala.delete@this.cc.hut.fi) on August 10, 2014 4:27 am wrote:
> > >
> > > I would ask you to do the same thing: x86 processors are among the highest performing in the spaces they
> > > have been targeting for the past several decades. From somewhere around the Atom space all the way down
> > > to ~10,000-gate microcontrollers, x86 is anywhere from uncompetitive to completely impossible.
> >
> > Yes that's true. But for something that is 2-issue or wider and
> > targeted at high performance, x86 simply isn't a major barrier.
> >
> > I know a number of architects who have worked on high-performance x86 and ARM cores, and they all say
> > the same thing. The difference between the two ISAs is not that significant. x86 has more complexity,
> > but it's in the noise compared to the difficulty of designing a good memory subsystem, etc.
> >
> > > > If the ISA has overheads
> > >
> > > Oh, so it does have overheads now?
> >
> > x86 overheads are noticeable and depend on the target market. As you pointed out, it's not very attractive
> > for microcontrollers at all...but for a high performance core, the difference is perhaps 5-10% of the core
> > area. And in most SoCs, cores are around 25-40% of the total area. 10% of 40% is 4% - and that's best case.
>
> x86 dominates its space because of the superior force behind it and its backwards-compatibility requirements,
> but that doesn't mean the x86 tax is anywhere near zero when going for maximum performance.
>
> There are a few things that limit performance. One is the memory subsystem, where being
> x86 is only a minor irritation with a few extra spills, easily overcome with more
> resources; the x86 cost shows up only in the number of L1 cache accesses.
>
> Another is branch misprediction. Here the x86 tax shows up as extra pipeline
> stages, which increases the number of squashed instructions and the power lost
> at a given width, and raises the performance penalty per mispredicted branch. In branch-heavy
> integer code, a simpler decoder may pull ahead given similar resources.
>
> The most often forgotten cost is how good a JIT target x86 is versus the competing architecture.
> Being a simpler JIT target reduces the cost of running a JIT-compiled language on that architecture.
>
> Going for maximum performance, if all else were equal, I'd estimate the eventual total performance difference
> between an x86 server and a 64-bit-only ARM server to be on the order of 20-30%.
> Unfortunately for the ARM competition, all things are not equal: Intel has more resources to throw at every
> problem and superior manufacturing, so in the end Intel may stay ahead and continue to dominate the server
> space. I do expect ARM competitors to eventually pick up the implementation tricks Intel has been using to get
> ahead in performance, and at some point there will be no new tricks left for Intel to stay ahead.
> On the other hand, when switching from silicon to something else, Intel may totally dominate
> everything by being first in that transition, if it means gaining a large perf/power advantage;
> then no one could compete with Intel anymore, because their new $15 billion fab would be the only
> thing able to produce the new high-performance parts, and the high-end phone parts.
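Jouni's point about pipeline depth and misprediction cost can be put into rough numbers. This is a back-of-envelope sketch, not a model of any real core; the stage counts, width, branch fraction, and miss rate below are all illustrative assumptions:

```python
# Rough model of misprediction cost: a deeper front end means more
# in-flight work is squashed per mispredicted branch. The penalty is
# approximated as (stages to branch resolution) x (machine width).
def wasted_slots_per_kiloinstruction(pipeline_depth, width, mispredict_rate,
                                     branch_fraction=0.2):
    """Issue slots squashed per 1000 instructions (all inputs assumed)."""
    branches = 1000 * branch_fraction
    misses = branches * mispredict_rate
    return misses * pipeline_depth * width

# Hypothetical comparison: 14-stage vs 18-stage resolution depth,
# both 4-wide, 5% of branches mispredicted.
shallow = wasted_slots_per_kiloinstruction(14, 4, 0.05)  # 560 slots
deep = wasted_slots_per_kiloinstruction(18, 4, 0.05)     # 720 slots
print(shallow, deep)
```

Under these made-up numbers the deeper pipeline squashes roughly 29% more work per thousand instructions, which is the kind of width-and-depth tax the paragraph above describes.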
The thing is that Intel doesn't design for maximum performance. Every avenue Intel has to increase performance has to be balanced against the additional power it consumes. Right now Intel is operating under a 2:1 rule: for every 2% of performance a design change gains, it is only permitted to add 1% of power (source). There is a similar restriction on die area, but with a different ratio.
The implication here is that Intel likely has several tricks up its sleeve if there were a compelling reason to increase performance, power consumption be damned. If Intel felt threatened by another architecture in terms of performance, it could loosen its self-imposed design rules to remain competitive.
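The 2:1 rule above amounts to a simple acceptance test for a proposed design change. A minimal sketch, assuming the rule is applied per-change as a gain-to-cost ratio (the 1:1 area ratio here is a placeholder, since the actual area ratio isn't stated):

```python
# Sketch of the 2:1 design rule: a change is accepted only if its
# performance gain is at least `ratio` times its cost in that budget.
def passes_rule(perf_gain_pct, cost_pct, ratio=2.0):
    """True if the change earns at least `ratio` % performance per % cost."""
    return perf_gain_pct >= ratio * cost_pct

# 2% performance for 1% power is right at the limit; 3% for 2% is
# rejected despite being a net win in absolute terms.
print(passes_rule(2.0, 1.0))  # True
print(passes_rule(3.0, 2.0))  # False
```

This is why a rule like this leaves "tricks up the sleeve": changes that would improve performance outright, but at a poor perf/power ratio, simply never ship.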