By: Stubabe (Stubabe.delete@this.nospam.com), October 7, 2015 11:57 am
Room: Moderated Discussions
juanrga (nospam.delete@this.juanrga.com) on October 7, 2015 11:56 am wrote:
> Stubabe (Stubabe.delete@this.nospam.com) on October 7, 2015 1:17 am wrote:
>
> > So if YOU are claiming Haswell is an 8-wide design by counting fused uops as 2 (and I think you are the
> > only one here that is) then Skylake is 12-wide allocate with 16-wide retire. However, these values are
> > clearly unrealistic, not least because Haswell cannot sustain 8-issue under any possible non-microcoded
> > x86 code sequence. So a more reasonable description is Haswell is 4-wide and Skylake is 6-wide.
> >
>
> Since this thread is spinning towards another useless terminology/convention
> fight, I will just add this relevant quote and move on:
>
>
>
> A processor such as Core i*4/i*5 Haswell/Broadwell, for example, can decode up to 5 x86 instructions
> per cycle, producing a maximum of up to 4 fused μops per cycle, which are then stored in an L0 μop
> cache, from which up to 4 fused μops per cycle are fetched, then register-renamed and placed into a
> reorder buffer, from which up to 8 un-fused individual μops are issued per cycle to the functional units,
> where they proceed down the various pipelines until they complete, whereupon up to 4 fused μops per
> cycle can be committed and retired. So what does that make the width of Haswell/Broadwell? It's really
> an 8-issue processor at heart, since up to 8 un-fused μops can be fetched, issued and completed per
> cycle if they're paired/fused in just the right way (and an un-fused μop is the most direct equivalent
> of a simple RISC instruction), but even experts disagree on exactly what to call the width of such a
> design, since 4-issue would also be valid, in terms of fused μops, which is what the processor mostly
> "thinks in terms of" for tracking purposes, and 5-issue is also valid if thinking in terms of original
> x86 instructions. Of course, this width-labelling conundrum is largely academic, since no processor
> is likely to actually sustain such high levels of ILP when running real-world code anyway.
>
This isn't an issue of different terminology but of your inconsistent terminology.
If you want to use the definition YOU used to describe Haswell as 8-wide, then by YOUR standard Skylake can retire 16 uops per clock, which is unrealistic, and that's why I suggested it's a silly terminology to use. But you simply cannot claim by ANY definition that the retirement bandwidth of Haswell = Skylake.
There is no possible code sequence that can produce the right mixture of fused uops to sustain 8-issue on Haswell. Every one of the 4 fused uops per clock would have to split into two, and since there are no fused exec-store ops and two units are dedicated to the store path (Store Address and Store Data), that would require 4 load units. So by YOUR definition Haswell can only retire 8 uops/clock PEAK, not sustained, while Skylake can retire 16 uops/clock PEAK, not sustained.
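
To put rough numbers on that, here is a minimal Python sketch, not a simulator. The fused-domain widths are the ones used in this thread (Haswell 4 allocate / 4 retire, Skylake 6 allocate / 8 retire). The port model is my own simplifying assumption: 4 ALU ports, 2 load ports, and one store per clock that occupies the Store Address and Store Data units. The names `FUSED_WIDTH`, `double_counted` and `max_sustained_unfused` are made up for illustration.

```python
from itertools import product

# Fused-domain widths as quoted in this thread (not independently verified).
FUSED_WIDTH = {
    "Haswell": {"allocate": 4, "retire": 4},
    "Skylake": {"allocate": 6, "retire": 8},
}

# Assumed, simplified Haswell execution model (an illustration only).
FUSED_PER_CYCLE = 4    # fused uops allocated per clock
ALU_PORTS = 4          # execution ports usable by plain ALU uops
LOAD_PORTS = 2         # load units
STORES_PER_CYCLE = 1   # one Store Address + one Store Data pair per clock


def double_counted(fused_width):
    """Width under the 'count every fused uop as 2' convention (a peak figure)."""
    return 2 * fused_width


def max_sustained_unfused():
    """Brute-force the best per-clock mix of fused uops under the assumed model.

    load_ops: micro-fused load+ALU uops (2 unfused uops each)
    stores:   micro-fused store-address+store-data uops (2 unfused uops each)
    plain:    ordinary single ALU uops (1 unfused uop each)
    """
    best = 0
    for load_ops, stores, plain in product(range(FUSED_PER_CYCLE + 1), repeat=3):
        if load_ops + stores + plain > FUSED_PER_CYCLE:
            continue  # exceeds the fused-domain allocation width
        if load_ops > LOAD_PORTS or stores > STORES_PER_CYCLE:
            continue  # not enough load units or store path bandwidth
        if load_ops + plain > ALU_PORTS:
            continue  # not enough ALU ports for the execute halves
        best = max(best, 2 * load_ops + 2 * stores + plain)
    return best


if __name__ == "__main__":
    for chip, width in FUSED_WIDTH.items():
        print(chip, {stage: double_counted(w) for stage, w in width.items()})
    print("Haswell sustained unfused uops/clock (assumed model):",
          max_sustained_unfused())
```

Under these assumptions the double-counted peaks come out as 8/8 for Haswell and 12/16 for Skylake, while the best sustainable Haswell mix (2 load-ops, 1 store, 1 plain ALU uop) reaches only 7 unfused uops per clock. Real port bindings are more subtle than this model, so treat it as arithmetic for the argument rather than a measurement.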