By: David Kanter (dkanter.delete@this.realworldtech.com), August 10, 2014 9:15 pm
Room: Moderated Discussions
anon (anon.delete@this.anon.com) on August 10, 2014 5:25 pm wrote:
> David Kanter (dkanter.delete@this.realworldtech.com) on August 9, 2014 9:55 am wrote:
> > > > > Well, obviously they're well documented inside Intel and AMD nowadays, and that's all that really matters.
> > > > > That's not the cost of implementation. The cost is in the jet engines required to make the pig fly.
> > > >
> > > > Yeah... The problem with that argument is that there are no jet engines. There are no special
> > > > things that only Intel can do, and does only to make the x86 ISA competitive.
> > >
> > > Yes there is. Big decoders, high performance microcode modes, complex decoded instruction caches,
> > > big and capable store forwarding, memory disambiguation, stack tracking, memory speculation, etc.
> > >
> > > Many of these things are good regardless of ISA, but other ISAs have not had to implement them. POWER6
> > > had no store forwarding, POWER7 (I believe) did not have memory disambiguation, and none of them had uop caches.
> >
> > No store forwarding is just retarded, and the POWER6 was not a good design.
>
> No, if the cost/benefit analysis was done, and store forwarding
> reduced the perf/watt, then removing it was not retarded.
>
> The point is that an x86 design would likely never have had
> that option, because store forwarding is more highly used.
>
> >
> > > > Look at the reality: x86 processors are among the highest performing, and cost less than
> > > > alternatives of comparable performance.
> > >
> > > I would ask you to do the same thing: x86 processors are among the highest performing in the space they
> > > have been targeting for the past several decades. From somewhere around Atom space all the way down
> > > to ~10,000-gate microcontrollers, x86 is anywhere from uncompetitive to completely impossible.
> >
> > Yes that's true. But for something that is 2-issue or wider and
> > targeted at high performance, x86 simply isn't a major barrier.
>
> In mobile space it has not done well either.
>
> >
> > I know a number of architects who have worked on high-performance x86 and ARM cores, and they all say
> > the same thing. The difference between the two ISAs is not that significant. x86 has more complexity,
> > but it's in the noise compared to the difficulty of designing a good memory subsystem, etc.
> >
> > > > If the ISA has overheads
> > >
> > > Oh, so it does have overheads now?
> >
> > x86 overheads are noticeable and depend on the target market. As you pointed out, it's not very attractive
> > for microcontrollers at all... but for a high-performance core, the difference is perhaps 5-10% of the core
> > area. And in most SoCs, cores are around 25-40% of the total area. 10% of 40% is 4%, and that's the best case.
>
> I would call 5-10% a very significant penalty.
My point is that the cores are only a portion of the overall server CPU. The ISA doesn't affect the L3 cache, the I/O, the coherency logic, the memory controller, DFT, power management, etc.
Saving 5-10% on cores that make up only 25-35% of the overall chip (roughly 1-3.5% of total area) isn't very impressive or important, especially considering that a more advanced process node affects the entire chip and will save a huge amount of area and power.
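A rough back-of-the-envelope sketch of that arithmetic, in Python. It assumes the ISA overhead scales only the core area and leaves the uncore (caches, I/O, memory controller, etc.) untouched; the function name `chip_level_saving` is purely illustrative. The ranges plugged in are the ones quoted in the thread (5-10% of core area, cores at 25-40% of the die).

```python
def chip_level_saving(core_overhead: float, core_share: float) -> float:
    """Fraction of total die area saved by removing an ISA-related core overhead.

    Both arguments are fractions (0.10 == 10%). Assumption: the overhead only
    affects core area, and nothing else on the die changes size as a result.
    """
    return core_overhead * core_share


# Ranges from the thread: overhead of 5-10% of a core, cores at 25-40% of the SoC.
for overhead in (0.05, 0.10):
    for share in (0.25, 0.40):
        saving = chip_level_saving(overhead, share)
        print(f"{overhead:.0%} of core x {share:.0%} core share -> {saving:.2%} of chip")
```

Even the most favorable corner (10% overhead, cores at 40% of the die) comes out at about 4% of total area, which is the "best case" figure above.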
David