By: anon (anon.delete@this.anon.com), August 11, 2014 5:13 am
Room: Moderated Discussions
Gabriele Svelto (gabriele.svelto.delete@this.gmail.com) on August 10, 2014 12:37 am wrote:
> anon (anon.delete@this.anon.com) on August 9, 2014 12:29 am wrote:
> > The big Intel cores use significant complexity to tackle the problem and they're stuck
> > at 4. POWER has reached 8 without problems (with almost certainly better throughput/watt
> > on its target workloads).
>
> Saying that 8-wide decode/issue has been achieved "without problems" is an understatement
> to say the least. While POWER8 is a genuinely 8-way machine even in single-thread mode it
> has some significant limitations and quirks due to the group formation mechanism (each group
> is made of up to 8 instructions and handled as a single entity for dispatch/completion):
>
> - An 8-way group can never have more than 6 non-branch instructions and 2 branch instructions
> - The second branch always ends a group
> - Some branches can be predicated (and thus considered non-branch)
> but they can fit in only one specific slot within the group
> - Instructions cracked in 2 µops can fit only in the first half
> of the group, instructions cracked in 3 µops end the group
> - FP instructions are not allowed to appear after a branch within a group
> - Certain instructions must appear either at the beginning or end of
> the group (or both, in which case they're dispatched in isolation)
>
> Those are some pretty significant limitations, especially the basic group formation (6 non-branch
> + 2 branch). 2+ threaded mode is somewhat relaxed as groups are small (4 instructions each)
> but that goes to show that an 8-way RISC machine is by no means an easy achievement.
I'm talking specifically about instruction decoding. Those restrictions in group formation and dispatch are due to limitations in other parts of the pipeline, and are in no way analogous to instruction type restrictions in Intel's x86 decoders.
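
For anyone who wants to play with the basic grouping rules quoted above, here is a rough Python sketch of a toy group former. It is only an illustration of a subset of the listed rules (at most 6 non-branch and 2 branch instructions per group, the second branch ends the group, no FP instruction after a branch within a group). The names `Insn` and `form_groups` are made up for this example, the cracked-instruction and slot-placement rules are left out, and none of this claims to reflect how the actual POWER8 hardware implements group formation.

```python
from dataclasses import dataclass

# Limits taken from the quoted list; everything else here is a toy assumption.
MAX_NON_BRANCH = 6
MAX_BRANCH = 2


@dataclass(frozen=True)
class Insn:
    name: str
    kind: str  # hypothetical classification: "int", "fp", or "branch"


def form_groups(insns):
    """Split an instruction stream into dispatch groups (toy model)."""
    groups = []
    current = []
    branches = 0
    non_branches = 0

    for insn in insns:
        is_branch = insn.kind == "branch"

        # Close the current group first if this instruction cannot join it.
        cannot_join = (
            (not is_branch and non_branches >= MAX_NON_BRANCH)
            or (insn.kind == "fp" and branches > 0)  # no FP after a branch
        )
        if cannot_join and current:
            groups.append(current)
            current, branches, non_branches = [], 0, 0

        current.append(insn)
        if is_branch:
            branches += 1
            # The second branch always ends the group.
            if branches == MAX_BRANCH:
                groups.append(current)
                current, branches, non_branches = [], 0, 0
        else:
            non_branches += 1

    if current:
        groups.append(current)
    return groups
```

Feeding it eight plain integer instructions gives two groups (6 + 2), while six integer instructions followed by two branches fit in a single full group of 8. That split comes entirely from the grouping rules, not from anything about decoding the instructions.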