By: juanrga (nospam.delete@this.juanrga.com), August 10, 2014 6:07 am
Room: Moderated Discussions
Gabriele Svelto (gabriele.svelto.delete@this.gmail.com) on August 10, 2014 12:37 am wrote:
> anon (anon.delete@this.anon.com) on August 9, 2014 12:29 am wrote:
> > The big Intel cores use significant complexity to tackle the problem and they're stuck
> > at 4. POWER has reached 8 without problems (with almost certainly better throughput/watt
> > on its target workloads).
>
> Saying that 8-wide decode/issue has been achieved "without problems" is an understatement
> to say the least. While POWER8 is a genuinely 8-way machine even in single-thread mode it
> has some significant limitations and quirks due to the group formation mechanism (each group
> is made of up to 8 instructions and handled as a single entity for dispatch/completion):
>
> - An 8-way group can never have more than 6 non-branch instructions and 2 branch instructions
> - The second branch always ends a group
> - Some branches can be predicated (and thus considered non-branch)
> but they can fit in only one specific slot within the group
> - Instructions cracked in 2 µops can fit only in the first half
> of the group, instructions cracked in 3 µops end the group
> - FP instructions are not allowed to appear after a branch within a group
> - Certain instructions must appear either at the beginning or end of
> the group (or both, in which case they're dispatched in isolation)
>
> Those are some pretty significant limitations, especially the basic group formation (6 non-branch
> + 2 branch). 2+ threaded mode is somewhat relaxed as groups are small (4 instructions each)
> but that goes to show that an 8-way RISC machine is by no means an easy achievement.
I don't think this has anything to do with RISC machines. It looks like a balanced design for the limitations of current target software: ILP tops out at about 6-7 instructions in general software, and branches in server code occur about every 8 instructions or so.
It would make no sense for IBM to go for a full 10-wide machine if software couldn't use the extra performance.
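
To make the basic group formation rule concrete, here is a rough Python sketch of only the two simplest constraints quoted above: at most 6 non-branch plus 2 branch instructions per group, with the second branch always ending the group. The function name form_groups and the 'n'/'b' encoding are made up for illustration. It ignores cracked instructions, the FP-after-branch rule, predicated branches and the first/last-slot restrictions, so it is a toy model and not a description of the real POWER8 dispatch logic.

```python
def form_groups(instrs, max_non_branch=6, max_branch=2):
    """Split a stream of 'n' (non-branch) / 'b' (branch) markers into groups.

    Toy model (assumption): only the 6 non-branch + 2 branch limit and the
    "second branch ends the group" rule are modelled.
    """
    groups, current = [], []
    non_branch = branch = 0
    for kind in instrs:
        is_branch = kind == "b"
        # A 7th non-branch instruction does not fit: close the group first.
        if not is_branch and non_branch == max_non_branch:
            groups.append(current)
            current, non_branch, branch = [], 0, 0
        current.append(kind)
        if is_branch:
            branch += 1
            # The second branch always ends the group.
            if branch == max_branch:
                groups.append(current)
                current, non_branch, branch = [], 0, 0
        else:
            non_branch += 1
    if current:
        groups.append(current)
    return groups


# Straight-line code never fills all 8 slots in this model.
print([len(g) for g in form_groups("nnnnnnnn")])
# A group is only full when the mix is exactly 6 non-branch + 2 branch.
print([len(g) for g in form_groups("nnnnnnbb")])
```

In this model a stream with no branches dispatches at most 6 instructions per group, and all 8 slots are used only when two branches show up among six other instructions, which is roughly the branch density mentioned above for server code.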