By: Michael S (already5chosen.delete@this.yahoo.com), August 7, 2008 8:27 am
Room: Moderated Discussions
Potatoswatter (potswa_m@c.com) on 8/7/08 wrote:
---------------------------
>Michael S (already5chosen@yahoo.com) on 8/7/08 wrote:
>---------------------------
>>Disclaimer:
>>The discussion below is purely theoretical and not related to Power7. After very
>>brief observation of the info provided in this thread I tend to agree with Potatoswatter:
>>Power7 architecture merges VPR and FPRs but retains traditional separation between GPRs and the rest.
>>
>>
>>RagingDragon (a@b.c) on 8/6/08 wrote:
>>---------------------------
>>>Anil Maliyekkel (a@a.edu) on 8/6/08 wrote:
>>>---------------------------
>>>>
>>>>They alias with the entire 128 bit VSX register.
>>>
>>>Does that mean the CPU would have one large register file, shared by integer, floating
>>>point, and vector execution units? If so that would be an extremely unconventional
>>>design - has anyone ever made a CPU like that?
>>>
>>
>>At uArch layer that's exactly what Intel is doing in PM and Merom. Possibly, P6 too but I'm not 100% sure.
>>
>>At software-visible architecture layer you can look, for example, at Freescale
>>e500 and possibly other E-Book compliant PPC cores. I am sure that there were multiple historical predecessors.
>>
>>>What would be the benefit of a shared register file? I guess it would be sacrificing
>>>ILP to increase TLP - i.e. allow more cores (fewer transistors, less power, per
>>>core) at the expense of reducing the number of instructions executed per cycle by each
>>>core due to increased register contention.
>>
>>It depends. I don't see that Merom sacrifices any ILP relative to, for example,
>>K8/K10 that feature split register files.
>>Of course, Merom shares physical registers rather than architected, but the # of
>>architected registers in Power7 is similar to # of physical registers in Merom (IIRC, 72).
>>
>>>It would also facilitate sharing logic transistors
>>>between integer and integer vector, and floating point and floating point vector
>>>too - another way to increase TLP at the expense of ILP.
>>
>>IMHO, for wide in-order cores the biggest disadvantage is increased fan-out.
>>For less-wide cores, both 'in' and 'out of' order, there is a problem of crowded
>>silicon area around register file, that, in high-frequency design, would lead to
>>pushing FPU further away from registers and possibly to addition of stage in FPU pipeline.
>>
>>For lean cores, neither wide nor hi-freq, I see no disadvantage at all.
>
>Wow, I had no idea about that. I suppose it makes sense, and clears up some confusion
>I had about Intel's block diagrams.
>
>So what you're saying is, Merom and successors store a pair of 64-bit regs in a
>physical 128-bit reg, or else vice versa.
No, not at all.
They store a 64-bit GP register in the lower part of a 128-bit physical register (or maybe the physical registers are 144 bits wide, I'm not sure). The upper part is unused.
Similarly, PM stores 32-bit GP registers and 64-bit half-SSE registers in the lower part of 80-bit physical registers.
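
To make the layout concrete, here is a toy model of what I mean. It is only a sketch: the 128-bit width is my guess from above (it could just as well be 144), and the names `write_gpr`, `write_sse`, `read_gpr` and `phys_file` are made up for illustration, not anything from Intel documentation.

```python
# Illustrative model only. The entry width and layout are assumptions
# taken from the post, not documented hardware internals.
PHYS_BITS = 128                      # assumed physical register width
MASK64 = (1 << 64) - 1
MASK128 = (1 << PHYS_BITS) - 1

def write_gpr(value):
    """Place a 64-bit GP value in the low half; the upper half stays unused (zero here)."""
    return value & MASK64

def write_sse(value):
    """Place a full 128-bit SSE value across the whole entry."""
    return value & MASK128

def read_gpr(entry):
    """A scalar integer read looks only at the low 64 bits of the entry."""
    return entry & MASK64

# One shared pool of physical entries holds both kinds of values.
phys_file = [0] * 8
phys_file[0] = write_gpr(0x1122334455667788)       # scalar integer result
phys_file[1] = write_sse((0xAAAA << 64) | 0xBBBB)  # vector result
```

The point is that no 128=>64 muxing of register pairs is involved: a scalar value simply occupies one full-width entry and wastes the top half.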
>Constantly muxing 128=>64 sounds like
>a pain, so it seems more practical to synthesize vector ops from 2x64bit reg pairs.
>But, doubling the number of functional units in the datapath is physically tougher
>than twice as wide and equal length...
>
>Also, reservation and writeback seem to need a lot more entries. Is there any evidence
>of resource contention between scalar and SSE?
>
>It's wacky that Intel would make this kind of change and still call the result "P6." Not that I doubt it, but wacky.



