By: juanrga (nospam.delete@this.juanrga.com), October 7, 2015 4:33 pm
Room: Moderated Discussions
Wilco (Wilco.Dijkstra.delete@this.ntlworld.com) on October 7, 2015 3:07 pm wrote:
> juanrga (nospam.delete@this.juanrga.com) on October 7, 2015 11:56 am wrote:
> > Stubabe (Stubabe.delete@this.nospam.com) on October 7, 2015 1:17 am wrote:
> >
> > > So if YOU are claiming Haswell is a 8-wide design by counting fused uops as 2 (and I think you are the
> > > only one here that is) then Skylake is 12wide allocate with 16 wide retire. However, these values are
> > > clearly unrealistic not least because Haswell cannot sustain 8-issue under any possible non-microcoded
> > > x86 code sequence. So a more reasonable description is Haswell is 4 wide and Skylake is 6 wide.
> > >
> >
> > Since this thread is spinning towards another useless terminology/convention
> > fight. I will just add this relevant quote and move on:
> >
> >
> > A processor such as Core i*4/i*5 Haswell/Broadwell, for example, can decode up to 5 x86 instructions
> > per cycle, producing a maximum of up to 4 fused μops per cycle, which are then stored in an L0 μop
> > cache, from which up to 4 fused μops per cycle are fetched, then register-renamed and placed into a
> > reorder buffer, from which up to 8 un-fused individual μops are issued per cycle to the functional units,
> > where they proceed down the various pipelines until they complete, whereupon up to 4 fused μops per
> > cycle can be committed and retired. So what does that make the width of Haswell/Broadwell? It's really
> > an 8-issue processor at heart, since up to 8 un-fused μops can be fetched, issued and completed per
> > cycle if they're paired/fused in just the right way (and an un-fused μop is the most direct equivalent
> > of a simple RISC instruction), but even experts disagree on exactly what to call the width of such a
> > design, since 4-issue would also be valid, in terms of fused μops, which is what the processor mostly
> > "thinks in terms of" for tracking purposes, and 5-issue is also valid if thinking in terms of original
> > x86 instructions. Of course, this width-labelling conundrum is largely academic, since no processor
> > is likely to actually sustain such high levels of ILP when running real-world code anyway.
>
> Width is in terms of instructions, not in terms of micro ops. Cortex-A57 has 8-wide uop issue/execute
> as well.
Once again, the width of a core depends on the convention used to define it. Many people use micro-ops to define the width, and that is why we claim that Cyclone is 6-wide [1], Denver is 7-wide [2], Haswell/Broadwell is 8-wide [3] and Power-8 is 10-wide (if memory serves).
> The claim that an unfused x86 micro-op is similar to a RISC instruction is funny
> - STP is a single micro-op in A57 while Haswell/Broadwell require 4 micro-ops to do the same... Comparing micro ops between different implementations is not useful.
What he really means is that uops are RISC-like, unlike the original x86 instructions, which are CISC. He is not saying that all RISC or RISC-like instructions are equivalent. E.g. you might need 100 MIPS instructions to do the same computational work as 70 RISC-V instructions. (Here 100 and 70 are used for illustrative purposes only.)
None of the definitions in use is perfect. Using decoder width also generates problems, like the already mentioned case of overprovisioned designs, or the fact that decoding 3 ARM instructions is, in general, not the same as decoding 3 x86 instructions.
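To make the convention point concrete, here is a small sketch that just tabulates the per-cycle peaks given in the quote above for Haswell/Broadwell. The dictionary keys and the function names are my own labels for illustration, not anyone's official terminology, and the numbers are peak figures from that quote, not sustained throughput.

```python
# Peak per-cycle figures for Haswell/Broadwell as stated in the quoted passage.
# The key names are illustrative labels, not official terminology.
HASWELL_PEAKS = {
    "x86 instructions decoded": 5,
    "fused uops renamed": 4,
    "unfused uops issued": 8,
    "fused uops retired": 4,
}


def width(peaks, convention):
    """Return the 'width' of a core under one counting convention."""
    return peaks[convention]


def all_widths(peaks):
    """Return every distinct width the same core can be labelled with."""
    return sorted(set(peaks.values()))


if __name__ == "__main__":
    for convention, n in HASWELL_PEAKS.items():
        print(f"{convention}: {n}-wide")
```

The same core comes out as 4-wide, 5-wide or 8-wide depending only on which key you pick, which is the whole disagreement in this thread.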
[1] http://www.anandtech.com/show/7910/apples-cyclone-microarchitecture-detailed
[2] See slide #5 of the Hot Chips presentation on Denver, which says "7-wide".
[3] See the quote above from Carey Patterson.