By: anon (anon.delete@this.anon.com), April 25, 2017 12:49 am
Room: Moderated Discussions
Brett (ggtgp.delete@this.yahoo.com) on April 24, 2017 9:55 pm wrote:
> anon (anon.delete@this.anon.com) on April 23, 2017 10:12 pm wrote:
> > anon (spam.delete.delete@this.this.spam.com) on April 23, 2017 4:44 pm wrote:
> > > Travis (travis.downs.delete@this.gmail.com) on April 23, 2017 3:42 pm wrote:
> > > > anon (spam.delete.delete@this.this.spam.com) on April 23, 2017 12:47 pm wrote:
> > > >
> > > > > POWER8 was built for SMT. The issue queue is split in two halves, both can issue one
> > > > > group of 3 instructions + 1 branch per cycle. Both have their own register files.
> > > > >
> > > > > In ST mode the content of the PRFs is identical so you can effectively
> > > > > issue 6+2, but that doesn't really change the rename width.
> > > >
> > > > Isn't that just semantics, or implementation details though? In ST mode, it can rename 6 non-control ops,
> > > > so it is "up to" 6-wide from a software point of view with some caveats relating to the grouping (just like
> > > > many of the other archs have caveats related to the instruction mix and how it interacts with renaming).
> > >
> > > My point was that it's not meant to be 6 (or 8) wide. It's 3 (4) wide for 1-4 threads.
> >
> > Factually false. It's 8 wide for 1 thread. Pseudo 4 wide halves for 2-8 threads.
> > Pseudo because some instructions use or block both halves of the pipeline.
>
> Benchmarks do not support this assertion, on single thread
> tasks the four wide Intel chips dominate against Power8.
Benchmarks are neither necessary nor sufficient to support the assertion, so that's no problem.
The assertion is well supported by public documentation. Power8 is 8 wide. You could add all sorts of qualifications and refinements, such as reduced effective width due to grouping. You could even call it 4 wide for 2+ threads. But 4 wide for 1 thread it is not.
You do have some kind of vague grasp of the fact that width is not the same as IPC, don't you?
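To make the width vs. IPC distinction concrete, here is a toy sketch. It is not a model of POWER8 or of any real core: the function name `simulate_ipc`, the 1-cycle latency, and the single-dependency format are all assumptions made up for illustration. It only shows that width is an upper bound on IPC, and that a serial dependency chain gives the same IPC on a 4 wide and an 8 wide machine.

```python
def simulate_ipc(deps, width):
    """Toy issue model (hypothetical, for illustration only).

    deps[i] is the index of an earlier instruction that instruction i
    depends on, or None. Every instruction has 1-cycle latency, and at
    most `width` ready instructions issue per cycle.
    """
    done_cycle = {}                      # instruction index -> cycle it issued
    remaining = list(range(len(deps)))
    cycle = 0
    while remaining:
        cycle += 1
        issued = []
        for i in remaining:
            if len(issued) == width:
                break
            d = deps[i]
            # Ready if there is no dependency, or the producer issued in an earlier cycle.
            if d is None or done_cycle.get(d, cycle) < cycle:
                issued.append(i)
        for i in issued:
            done_cycle[i] = cycle
        remaining = [i for i in remaining if i not in done_cycle]
    return len(deps) / cycle


# 16 instructions, each depending on the previous one: a pure serial chain.
chain = [None] + list(range(15))
# 16 fully independent instructions.
independent = [None] * 16

for width in (4, 8):
    print(width, simulate_ipc(chain, width), simulate_ipc(independent, width))
```

On the serial chain both widths come out at the same IPC, while on the independent stream the wider machine finishes in half the cycles. A benchmark result therefore tells you about achieved IPC on that workload, not about the width of the machine.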
>
> > > Having more execution resources available in ST mode is nice,
> > > but not important for anything except marketing/licensing.
> >
> > Also wrong. Single thread performance is something IBM has
> > made no secret of working to improve. Even on parallel
> > workloads it often remains the gating factor for scalability and for minimum SLA response times.
> >
> > >
> > > If you think using duplicated PRFs is a viable way to implement 6/8 wide then I've got news for you.
> > >
> > > Split PRFs are viable, but straight up duplicating and forwarding everything is utterly insane.