By: anon (anon.delete@this.anon.com), August 9, 2014 12:29 am
Room: Moderated Discussions
Megol (golem960.delete@this.gmail.com) on August 8, 2014 11:23 am wrote:
> juanrga (nospam.delete@this.juanrga.com) on August 8, 2014 10:49 am wrote:
> > anon (anon.delete@this.anon.com) on August 6, 2014 7:54 pm wrote:
> > > I have also heard from many people (it's possible this is just an uninformed 'echo chamber effect',
> > > but I think there is some merit to the idea) that x86 cores take significantly more design skill
> > > than an equivalent ARM core. Whether this is due to compatibility, or decoders, or necessity of
> > > more capable memory pipeline and caches, I don't know, but it seems to also be an x86 tax.
> >
> > E.g. an x86 decoder is more difficult to implement than an ARM64 decoder, because the former has to match
> > instructions of variable length.
>
> True. But the way to handle this is well known nowadays,
That's a complete non-point, and it does not mean that no disadvantage exists. You could just as well say that Intel "handled this well" with the Pentium or 386, for some values of "well".
Atoms are 2-wide; even the SMT Atom is only 2-wide! Meanwhile ARM went to 3-wide rather easily.
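To illustrate the fundamental problem with a toy C sketch (the opcode map is made up; real x86 length depends on prefixes, ModRM, SIB, displacement and immediate bytes): with variable-length instructions, the start of instruction N depends on the lengths of instructions 0..N-1, while a fixed-width ISA lets every decode slot compute its offset independently.

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Made-up opcode->length map, purely for illustration. */
static size_t insn_length(const uint8_t *p)
{
    switch (p[0]) {
    case 0x90: return 1;  /* pretend: NOP        */
    case 0xB8: return 5;  /* pretend: MOV r, imm */
    default:   return 2;  /* pretend: the rest   */
    }
}

/* Variable length: locating instruction n is a serial chain
   through all the earlier lengths. */
static size_t nth_offset_varlen(const uint8_t *buf, int n)
{
    size_t off = 0;
    for (int i = 0; i < n; i++)
        off += insn_length(buf + off);
    return off;
}

/* Fixed 4-byte length (A64-style): every slot knows its offset
   up front, so W decoders can all start in parallel. */
static size_t nth_offset_fixed(int n)
{
    return (size_t)n * 4;
}

int main(void)
{
    const uint8_t buf[] = { 0x90, 0xB8, 1, 2, 3, 4, 0x90, 0x33, 0 };
    printf("insn 3 starts at byte %zu (variable) vs %zu (fixed)\n",
           nth_offset_varlen(buf, 3), nth_offset_fixed(3));
    return 0;
}

Parallel length decoders attack that serial chain by speculatively computing a length at every byte position and muxing the results, which is exactly the logic a fixed-width ISA never pays for.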
> one way is massively parallel
> length decoding; another is to use predecode data and tag each byte. There have been
> arguments that the latter technique can scale up to 8 instructions decoded/clock with
> most of the complexity being things a RISC also needs (tracking dependencies++).
The big Intel cores spend significant complexity tackling the problem and are still stuck at 4 wide. POWER8 has reached 8 wide without problems (and almost certainly with better throughput/watt on its target workloads). Not that this is necessarily attributable to the decoder alone, or to an x86 tax at all, but that should head off any claim of POWER8 being a furnace.
I don't know what you mean by "tracking dependencies++", but there is no indication that POWER8 uses a uop cache to get there, so you're simply wrong.
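For reference, the predecode scheme looks roughly like this (a minimal sketch with my own made-up structures, reusing the stub length function from the sketch above): the serial length walk is paid once at cache-fill time, and fetch just reads boundary bits.

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

#define LINE_BYTES 16

/* Same made-up opcode->length stub as in the previous sketch. */
static size_t insn_length(const uint8_t *p)
{
    return (p[0] == 0x90) ? 1 : (p[0] == 0xB8) ? 5 : 2;
}

typedef struct {
    uint8_t  bytes[LINE_BYTES];
    uint16_t starts;  /* bit i set => an instruction begins at byte i */
} icache_line;

/* Done once when the line is filled into the I-cache, off the
   critical fetch path: tag each byte that starts an instruction. */
static void predecode(icache_line *l)
{
    l->starts = 0;
    for (size_t off = 0; off < LINE_BYTES; off += insn_length(l->bytes + off))
        l->starts |= (uint16_t)(1u << off);
}

/* At fetch time, feeding W decoders means picking the first W set
   bits -- find-first-set logic, no length arithmetic in the way.
   (Modelled as a loop here; hardware does it combinationally.) */
static int pick_starts(uint16_t starts, size_t out[], int w)
{
    int n = 0;
    for (size_t i = 0; i < LINE_BYTES && n < w; i++)
        if (starts & (1u << i))
            out[n++] = i;
    return n;
}

int main(void)
{
    icache_line l = { .bytes = { 0x90, 0xB8, 1, 2, 3, 4, 0x90, 0x33 } };
    size_t s[4];
    predecode(&l);
    for (int i = 0, n = pick_starts(l.starts, s, 4); i < n; i++)
        printf("instruction start at byte %zu\n", s[i]);
    return 0;
}

The catch, of course, is that the predecode bits cost storage in the I-cache and have to be regenerated on every fill, so it is not free either.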
>
> > Also the x86 ISA is full of legacy instructions, which have to be implemented
> > in hardware and then verified/tested, which increases development costs and time.
>
> Wrong. Legacy instructions need some hardware, true. But most of the functionality
> is implemented in microcode instead of adding complex hardware.
> Now there are some quirks in the x86 ISA that do waste power, like the handling
> of shifts by zero, calculating the auxiliary flag (nibble carry), etc. But those
> are far from the most power-consuming parts of an OoO processor core.
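For anyone unfamiliar with that second quirk: the auxiliary flag is just the carry out of bit 3 into bit 4, kept around for the BCD-adjust instructions. A minimal sketch of the computation for an 8-bit add:

#include <stdint.h>

/* a ^ b ^ sum recovers the carry-in at every bit position, so bit 4
   of that XOR is the nibble carry. Only DAA/DAS/AAA/AAS ever consume
   AF, yet the flags logic computes it on every add/sub. */
static int aux_carry_add(uint8_t a, uint8_t b)
{
    uint8_t sum = (uint8_t)(a + b);
    return ((a ^ b ^ sum) >> 4) & 1;
}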
>
> > According to Feldman, an entirely custom server chip using the ARM architecture takes about 18 months
> > and about $30 million. By contrast, it takes a three- or four-year time frame and $300-400 million in
> > development costs to build an x86-based server chip based on a new micro-architecture.
>
> Now that's 100% true. x86 is a complex beast to implement and much of the complexity isn't really documented.
> But those undocumented things are used, knowingly or otherwise, and have to be supported.
>
Well, obviously they're well documented inside Intel and AMD nowadays, and that's all that really matters. That's not the cost of implementation. The cost is in the jet engines required to make the pig fly.