By: Stubabe (Stubabe.delete@this.nospam.com), October 7, 2015 12:17 am
Room: Moderated Discussions
juanrga (nospam.delete@this.juanrga.com) on October 6, 2015 11:17 am wrote:
> Wouter Tinus (wouter.tinus.delete@this.gmail.com) on October 5, 2015 2:25 pm wrote:
> > juanrga (nospam.delete@this.juanrga.com) on October 5, 2015 10:46 am wrote:
> > > You said that Skylake is
> > >
> > > > > > - 5 wide decode
> > > > > > - 6 wide allocation/decoder queue
> > > > > > - 6 wide ROB
> > > > > > - 8 wide issue
> > > > > > - 8 wide retire (4/thread)
> > >
> > > Using retire as metric, Skylake is then 8-wide. Haswell/Broadwell can
> > > also retire up to 8 uops per cycle. Thus both are 8-wide as well.
> >
> > Are you kidding now? You multiply Haswells number by 2x because of your "unfused"
> > reasoning, but then compare it to Skylakes number *without* doing the same multiplication,
> > and then claim with a straight face that "thus" they are the same width?
> >
> > > There is no way that Skylake can issue and retire 16 ops per cycle, and Anandtech don't say the contrary.
> >
> > We weren't talking about issue at all. And for the record I don't think we should call
> > Skylake 16 wide either, but it *does* follow from *your* logic of multiplying Haswells
> > 4-wide retire rate by 2x, and the fact that Skylake doubled Haswells peak rate.
>
> I am not multiplying anything. Haswell/Broadwell can issue, execute, and retire
> up to 8 uops per cycle. I don't know how many uops can do Skylake per cycle, but
> you said above it can issue and retire 8. Therefore both are same wide.
>
> You are the one that said we have to multiply your number by 2x
> to obtain 16 for Skylake. And said you Skylake is not 16-wide.
Using fuops to denote micro-fused uops (e.g. load-exec type uops):
Haswell and Broadwell can only allocate 4 fuops/cycle and retire 4 fuops/cycle.
Skylake can allocate 6 fuops/cycle and retire 8 fuops/cycle (4/cycle per thread).
Both can issue 8 unfused operations per cycle to execution units.
Since load-exec type x86 instructions will realistically make up far less than 50% of instructions, it is clear that Skylake's back end is significantly wider than Haswell's: 50% wider at allocation (6 vs. 4 fuops/cycle) and, in particular, 100% wider at retirement (8 vs. 4 fuops/cycle).
So if YOU are claiming Haswell is an 8-wide design by counting fused uops as 2 (and I think you are the only one here who is), then Skylake is 12-wide allocate with 16-wide retire. However, these values are clearly unrealistic, not least because Haswell cannot sustain 8-issue under any possible non-microcoded x86 code sequence. So a more reasonable description is that Haswell is 4-wide and Skylake is 6-wide.
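
To spell out the arithmetic, here is a rough Python sketch. The per-cycle limits are the ones quoted in this post; the helper names and the 25% fused fraction are purely my own illustrative assumptions, not measurements.

```python
# Per-cycle limits in fused-domain uops (fuops), as quoted in this post.
WIDTHS = {
    "Haswell": {"allocate": 4, "retire": 4},
    "Skylake": {"allocate": 6, "retire": 8},
}


def unfused_equivalent(fuops, fused_fraction):
    """Unfused uops carried by `fuops` fused-domain slots.

    `fused_fraction` is the share of slots holding a micro-fused pair
    (each such slot counts as 2 unfused uops). Hypothetical helper.
    """
    return fuops * (1 + fused_fraction)


def percent_wider(new, old):
    """How much wider `new` is than `old`, in percent."""
    return 100 * (new - old) / old


hsw, skl = WIDTHS["Haswell"], WIDTHS["Skylake"]

# Like-for-like comparison in the fused domain.
print(percent_wider(skl["allocate"], hsw["allocate"]))  # 50.0
print(percent_wider(skl["retire"], hsw["retire"]))      # 100.0

# "Every uop is fused" counting: Haswell becomes 8, but then Skylake
# has to become 12 allocate / 16 retire by the same rule.
print(unfused_equivalent(hsw["allocate"], 1.0))  # 8.0
print(unfused_equivalent(skl["allocate"], 1.0))  # 12.0
print(unfused_equivalent(skl["retire"], 1.0))    # 16.0

# With an assumed (made-up) 25% fused fraction the numbers are far lower.
print(unfused_equivalent(hsw["allocate"], 0.25))  # 5.0
print(unfused_equivalent(skl["allocate"], 0.25))  # 7.5
```

Whichever counting rule you pick, applying it to both chips keeps the ratio between them the same; only mixing the two rules makes them look equal.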