By: Gabriele Svelto (gabriele.svelto.delete@this.gmail.com), August 10, 2014 12:37 am
Room: Moderated Discussions
anon (anon.delete@this.anon.com) on August 9, 2014 12:29 am wrote:
> The big Intel cores use significant complexity to tackle the problem and they're stuck
> at 4. POWER has reached 8 without problems (with almost certainly better throughput/watt
> on its target workloads).
Saying that 8-wide decode/issue has been achieved "without problems" is an overstatement, to say the least. While POWER8 is a genuinely 8-way machine even in single-thread mode, it has some significant limitations and quirks due to the group formation mechanism (each group is made of up to 8 instructions and handled as a single entity for dispatch/completion):
- An 8-way group can never have more than 6 non-branch instructions and 2 branch instructions
- The second branch always ends a group
- Some branches can be predicated (and thus considered non-branch) but they can fit in only one specific slot within the group
- Instructions cracked into 2 µops can fit only in the first half of the group; instructions cracked into 3 µops end the group
- FP instructions are not allowed to appear after a branch within a group
- Certain instructions must appear either at the beginning or end of the group (or both, in which case they're dispatched in isolation)
Those are some pretty significant limitations, especially the basic group formation (6 non-branch + 2 branch). 2+ threaded mode is somewhat more relaxed as groups are smaller (4 instructions each), but that goes to show that an 8-way RISC machine is by no means an easy achievement.
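
To make the effect of these rules a bit more concrete, here is a rough Python sketch of a greedy group former for single-thread mode. It is only an illustration of the constraints listed above, not a model of the real hardware: the `Insn` type, `fits` and `form_groups` are hypothetical names, "first half of the group" is assumed to mean slots 0-3, a cracked instruction is assumed to occupy a single slot, and the slot-specific predicated branches and first/last-only instructions are left out entirely.

```python
from dataclasses import dataclass

MAX_SLOTS = 8        # up to 8 instructions per group in single-thread mode
MAX_NON_BRANCH = 6   # at most 6 non-branch instructions per group
MAX_BRANCHES = 2     # at most 2 branches; the second one ends the group


@dataclass
class Insn:
    """Hypothetical instruction descriptor used only for this sketch."""
    name: str
    is_branch: bool = False
    is_fp: bool = False
    uops: int = 1    # number of µops the instruction is cracked into


def fits(insn, group):
    """Return True if insn may be appended to the partially built group."""
    branches = sum(i.is_branch for i in group)
    non_branches = len(group) - branches
    if len(group) >= MAX_SLOTS:
        return False
    if insn.is_branch:
        return branches < MAX_BRANCHES
    if non_branches >= MAX_NON_BRANCH:
        return False
    if insn.is_fp and branches > 0:
        return False  # FP instructions may not follow a branch in a group
    if insn.uops == 2 and len(group) >= MAX_SLOTS // 2:
        return False  # assumption: "first half" means slots 0-3
    return True


def form_groups(insns):
    """Greedily pack an instruction stream into dispatch groups."""
    groups, group = [], []
    for insn in insns:
        if group and not fits(insn, group):
            groups.append(group)
            group = []
        group.append(insn)
        branches = sum(i.is_branch for i in group)
        # The second branch and any 3-µop instruction close the group.
        if branches == MAX_BRANCHES or insn.uops >= 3:
            groups.append(group)
            group = []
    if group:
        groups.append(group)
    return groups


if __name__ == "__main__":
    # Ten plain integer instructions: no group can take more than six.
    stream = [Insn(f"add{i}") for i in range(10)]
    print([len(g) for g in form_groups(stream)])  # prints [6, 4]
```

Even with straight-line integer code and no branches at all, this simplified model never fills more than 6 of the 8 slots, which is the basic point about the 6 + 2 group formation.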