By: juanrga (nospam.delete@this.juanrga.com), October 7, 2015 10:56 am
Room: Moderated Discussions
Stubabe (Stubabe.delete@this.nospam.com) on October 7, 2015 1:17 am wrote:
> So if YOU are claiming Haswell is a 8-wide design by counting fused uops as 2 (and I think you are the
> only one here that is) then Skylake is 12wide allocate with 16 wide retire. However, these values are
> clearly unrealistic not least because Haswell cannot sustain 8-issue under any possible non-microcoded
> x86 code sequence. So a more reasonable description is Haswell is 4 wide and Skylake is 6 wide.
>
Since this thread is spinning towards another useless terminology/convention fight, I will just add this relevant quote and move on:
A processor such as Core i*4/i*5 Haswell/Broadwell, for example, can decode up to 5 x86 instructions per cycle, producing a maximum of up to 4 fused μops per cycle, which are then stored in an L0 μop cache, from which up to 4 fused μops per cycle are fetched, then register-renamed and placed into a reorder buffer, from which up to 8 un-fused individual μops are issued per cycle to the functional units, where they proceed down the various pipelines until they complete, whereupon up to 4 fused μops per cycle can be committed and retired. So what does that make the width of Haswell/Broadwell? It's really an 8-issue processor at heart, since up to 8 un-fused μops can be fetched, issued and completed per cycle if they're paired/fused in just the right way (and an un-fused μop is the most direct equivalent of a simple RISC instruction), but even experts disagree on exactly what to call the width of such a design, since 4-issue would also be valid, in terms of fused μops, which is what the processor mostly "thinks in terms of" for tracking purposes, and 5-issue is also valid if thinking in terms of original x86 instructions. Of course, this width-labelling conundrum is largely academic, since no processor is likely to actually sustain such high levels of ILP when running real-world code anyway.
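To make the three labels concrete, here is a rough sketch that just tabulates the per-cycle limits given in the quote and reports the "width" under each counting unit. The numbers are taken from the quote, not measured; the names `STAGE_LIMITS`, `width_by_convention` and `unfused_upper_bound` are made up for illustration, and the two-μops-per-fused-μop factor is an assumption that follows the "counting fused uops as 2" convention mentioned above.

```python
# Per-cycle limits as stated in the quote (illustrative, not measured).
# Each entry: stage name -> (unit being counted, maximum per cycle).
STAGE_LIMITS = {
    "decode": ("x86 instructions", 5),
    "uop_cache_fetch": ("fused uops", 4),
    "rename": ("fused uops", 4),
    "issue": ("unfused uops", 8),
    "retire": ("fused uops", 4),
}


def width_by_convention(limits):
    """Return the largest per-cycle figure seen for each counting unit."""
    widths = {}
    for unit, per_cycle in limits.values():
        widths[unit] = max(widths.get(unit, 0), per_cycle)
    return widths


def unfused_upper_bound(fused_per_cycle, uops_per_fused=2):
    """Best case: every fused uop splits into uops_per_fused unfused uops.

    uops_per_fused=2 is an assumption matching the 'fused counts as 2'
    convention; real code rarely pairs this well.
    """
    return fused_per_cycle * uops_per_fused


WIDTHS = width_by_convention(STAGE_LIMITS)

if __name__ == "__main__":
    for unit, width in WIDTHS.items():
        print(f"{width}-wide when counting {unit}")
    print("best-case unfused uops from 4 fused:", unfused_upper_bound(4))
```

The same machine comes out as 5, 4 or 8 wide depending only on which unit you count, which is the whole point of the quote.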