By: David Kanter (dkanter.delete@this.realworldtech.com), October 5, 2015 11:24 am
Room: Moderated Discussions
juanrga (nospam.delete@this.juanrga.com) on October 5, 2015 10:46 am wrote:
> Wouter Tinus (wouter.tinus.delete@this.gmail.com) on October 4, 2015 4:18 pm wrote:
> > juanrga (nospam.delete@this.juanrga.com) on October 4, 2015 6:30 am wrote:
> > > Are you kidding?
> >
> > No I'm not. Either your reasoning is inconsistent or you're missing some of the facts.
> > You started by claiming: "Skylake is 8-wide (unfused uops) like Haswell," yet the former
> > can in fact retire twice as many µops per cycle as the latter [1]. So regardless of whether
> > you want to count unfused or fused µops, Skylake is wider than Haswell.
> >
> > If you want to multiply both numbers by two to get unfused µops [2], then you should
> > rightly call Skylake a 16-wide core (2 threads x 4 µops x 2). If you don't want to
> > do that, then perhaps you should reconsider your definition of wideness :)
> >
> > [1] http://www.anandtech.com/show/9582/intel-skylake-mobile-desktop-launch-architecture-analysis/5
> > [2] I'm not sure if that is fair or accurate, but not judging here
> >
>
> You said that Skylake is
>
> > > > - 5 wide decode
> > > > - 6 wide allocation/decoder queue
> > > > - 6 wide ROB
> > > > - 8 wide issue
> > > > - 8 wide retire (4/thread)
>
> Using retire as metric, Skylake is then 8-wide. Haswell/Broadwell can
> also retire up to 8 uops per cycle. Thus both are 8-wide as well.
>
> There is no way that Skylake can issue and retire 16 ops per cycle, and Anandtech don't say the contrary.
Generally, I tend to refer to microarchitectures as X-wide, where X is determined by the narrowest stage. Poulson cannot sustain 12 instructions per cycle, and is generally a 6-wide machine that just happens to have 12 execution units.
With Skylake, it's a little more complex: I can see an argument for 6-wide (assuming hits in the uop cache) or 5-wide (decoding from the I$). Both of those are sustainable, although in the I$ case, I'm not sure there is really enough fetch bandwidth.
But it's clear Skylake cannot sustain 8-wide retire. Moreover, if only a single thread is active, it becomes limited to 4-wide!
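To make the "narrowest stage" rule concrete, here is a rough sketch. The stage widths are simply the ones quoted earlier in this thread, not measured figures, and the function name, the dictionary layout, and the simplification that a uop cache hit just bypasses the decoders are my own assumptions for illustration.

```python
# Per-cycle stage widths as quoted earlier in this thread (not verified here).
SKYLAKE_STAGES = {
    "decode": 5,
    "allocation": 6,
    "rob": 6,
    "issue": 8,
    "retire": 8,  # total across both threads
}
RETIRE_PER_THREAD = 4  # "8 wide retire (4/thread)" from the quoted list


def sustained_width(stages, active_threads=2, uop_cache_hit=True):
    """Return the narrowest stage width, i.e. the 'X' in 'X-wide'.

    Assumption: a uop cache hit bypasses the decoders entirely, so the
    decode stage is dropped from the comparison in that case.
    """
    widths = dict(stages)
    if uop_cache_hit:
        widths.pop("decode", None)
    # Retire is limited per thread, so fewer active threads narrows it.
    widths["retire"] = min(widths["retire"], RETIRE_PER_THREAD * active_threads)
    return min(widths.values())


if __name__ == "__main__":
    print(sustained_width(SKYLAKE_STAGES, active_threads=2, uop_cache_hit=True))
    print(sustained_width(SKYLAKE_STAGES, active_threads=2, uop_cache_hit=False))
    print(sustained_width(SKYLAKE_STAGES, active_threads=1, uop_cache_hit=True))
```

Under those assumptions the three calls give 6, 5 and 4, which lines up with the 6-wide, 5-wide and single-thread 4-wide cases above. It ignores fetch bandwidth, which is exactly the part I'm unsure about.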
David