By: anon (anon.delete@this.anon.com), August 11, 2014 3:56 am
Room: Moderated Discussions
Michael S (already5chosen.delete@this.yahoo.com) on August 10, 2014 11:30 pm wrote:
> anon (anon.delete@this.anon.com) on August 10, 2014 5:27 pm wrote:
> > Michael S (already5chosen.delete@this.yahoo.com) on August 10, 2014 3:11 am wrote:
> > > anon (anon.delete@this.anon.com) on August 9, 2014 12:29 am wrote:
> > > >
> > > > The big Intel cores use significant complexity to tackle the problem and they're stuck
> > > > at 4. POWER has reached 8 without problems (with almost certainly better throughput/watt
> > > > on its target workloads).
> > >
> > > "almost certainly" is way too strong a statement. It's possible, yes. But so far we have zero evidence.
> >
> > We have non-zero evidence. Not complete, but there is evidence.
> >
>
> If I am not mistaken, all we have now are very impressive 4x6-core Power8 SAP SD 2-tier scores that still
> lose in absolute numbers to 4x15-core Intel and approximately matches die-for-die 16x16-core Fujitsu.
> We don't know which system between the three consumes less power under load, not even approximately.
Well, we have some power specifications from IBM and Intel too, although granted, if we look at system power, or even just account for the external memory controllers on POWER8, we don't know for sure.
>
> > >
> > > > Not that this is attributable to decoder alone or x86 tax
> > > > at all necessarily, but just to head off any claim of it being a furnace.
> > > >
> > > > I don't know what you mean by "tracking dependencies++", but there is
> > > > no indication that POWER8 uses a uop cache, so you're simply wrong.
> > > >
> > >
> > > Tracking dependencies within a group of instructions that
> > > are renamed in parallel. Conventional wisdom says that
> > > it has complexity of O(width^2). Maybe there was an algorithmic breakthrough in this area, I don't know...
> >
> > That has nothing to do with decoding stage, however.
> >
>
> The context was practical limits of the width of in-order front end of OoO cores.
No, it was, very specifically, the decoding cost. My comment was that the decoding cost was higher, and the response was something along the lines of "not really, because all CPUs have to track dependencies anyway", which is just stupid.
The decoding cost of x86 is higher than that of most other ISAs, particularly for parallel decoding.
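
To put the O(width^2) figure quoted above in concrete terms, here is a rough sketch of the intra-group dependency check done at rename. It is only an illustration, not a model of any real core: the function names, the two-sources-per-instruction default, and the register names are all assumptions made up for the example. The point it shows is that the comparison count depends on group width and operand count, not on how the instructions were encoded, so it is a separate cost from decoding.

```python
def intra_group_comparisons(width, sources_per_insn=2):
    """Count source-vs-earlier-destination comparisons for one rename group.

    Assumption: every instruction has `sources_per_insn` source registers
    and one destination register.
    """
    # Instruction i (0-based) must compare each of its sources against the
    # destinations of the i instructions ahead of it in the same group.
    return sum(i * sources_per_insn for i in range(width))


def find_intra_group_deps(group):
    """Find dependencies inside one rename group.

    `group` is a list of (dest, sources) tuples in program order.
    Returns (consumer_index, producer_index, register) tuples.
    """
    deps = []
    for i, (_, sources) in enumerate(group):
        for src in sources:
            # Walk backwards so the most recent earlier writer wins.
            for j in range(i - 1, -1, -1):
                if group[j][0] == src:
                    deps.append((i, j, src))
                    break
    return deps


if __name__ == "__main__":
    for width in (4, 8):
        print(width, intra_group_comparisons(width))

    # Hypothetical 3-instruction group: (dest, (src1, src2))
    group = [("r1", ("r2", "r3")), ("r4", ("r1", "r2")), ("r1", ("r4", "r1"))]
    print(find_intra_group_deps(group))
```

Under these assumptions, going from width 4 to width 8 takes the count from 12 to 56 comparisons, which is the quadratic growth being referred to.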