By: Michael S (already5chosen.delete@this.yahoo.com), November 13, 2014 6:49 am
Room: Moderated Discussions
Ricardo B (ricardo.b.delete@this.xxxxx.xx) on November 13, 2014 6:19 am wrote:
> anon (anon.delete@this.anon.com) on November 12, 2014 8:37 pm wrote:
>
> > > Intel's previous processors, P-Pro/P-II/P-III, had a moderately wide and somewhat fragile
> > > decoding logic: 3 decoders, of which only two could only decode simple instructions.
> >
> > Actually I think 3 could decode simple. Only 1 could decode complex, and microcode went to another path.
>
> You are correct, I worded it poorly.
>
> >
> > But IPC of Pentium4 was 2/3 IPC of PentiumIII, so a 2-wide decoder should have been sufficient.
>
> Err...
> P6 could rename 3 µOP per cycle and issue 5.
> Pentium4 could rename 3 µOP per cycle and issue 4.
>
It has 4 issue ports, yes. But at peak it can issue 6 uOps per cycle, because two out of 4 ports are double-pumped.
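To spell out the arithmetic, here is a minimal sketch. The port labels and the choice of which two ports are double-pumped are my assumptions for illustration. The only figures taken from the statement above are 4 ports, two of them double-pumped.

```python
# Peak issue width as the sum of per-port issue rates (uOps per core clock).
# Assumed labels: two double-pumped ports plus two single-pumped ports.
P4_PORT_RATES = {"port0": 2, "port1": 2, "port2": 1, "port3": 1}


def peak_issue_per_cycle(port_rates):
    """Return the peak number of uOps issued per core clock."""
    return sum(port_rates.values())


print(peak_issue_per_cycle(P4_PORT_RATES))  # 2 + 2 + 1 + 1 = 6
```

This is a peak figure only; it says nothing about what the rest of the pipeline can sustain.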
> It was narrower.
In fact, except for the FPU, it was not narrower.
That, IMHO, was one of its biggest problems: too wide for a speed racer, which led to a power wall at a much lower frequency than would have been possible with a narrower design.
> but not that much. With a 2-wide decoder, it would
> still be possible to have decode limited critical loops in Pentium4.
>
> >
> > > And a lot of critical loops would become decode limited if not properly scheduled.
> > >
> > > Netburst's trace cache was meant as a way to bypass these issues.
> > > In theory, it would provide robust high bandwidth instruction
> > > fetch without the need for wide generic x86 decoding logic.
> > > In practice, it didn't work out so well.
> > >
> > > Only after Netburst Intel began improving x86 decoding, with the introduction of µOP fusion in the Banias.
> >
> > I'm not sure what you mean. x86 decoding has been improved
> > in every generation of Intel microarchitectures before P4.
>
> If by that you mean that the P6 had better decoding than the P5, then yes.
>
> But what I meant is that all the early processors in the P6 µarch family, from the
> Pentium Pro to the Pentium III, had the same decoding restrictions (4-1-1 rule).
> Only with Banias, the P6 derivatives got better decoding.
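For readers unfamiliar with the 4-1-1 rule mentioned above, a toy model shows why instruction ordering mattered. This is a simplified sketch, not a description of the real front end. It assumes in-order decode, with the first decoder accepting instructions of up to 4 µOPs and the other two accepting only single-µOP instructions. Fetch alignment, instruction length limits and the microcode path are not modeled. The function name and the `template` parameter are made up for this example.

```python
def decode_cycles(uop_counts, template=(4, 1, 1)):
    """Count cycles needed to decode instructions in program order.

    uop_counts[i] is the number of uOPs instruction i decodes to (>= 1).
    In each cycle, decoder k accepts the next instruction only if it needs
    at most template[k] uOPs; on a mismatch the remaining decoders idle
    and the instruction waits for the first decoder in the next cycle.
    """
    if any(n < 1 or n > template[0] for n in uop_counts):
        raise ValueError("only instructions of 1..4 uOPs are modeled")
    cycles = 0
    i = 0
    while i < len(uop_counts):
        for limit in template:
            if i < len(uop_counts) and uop_counts[i] <= limit:
                i += 1
            else:
                break
        cycles += 1
    return cycles


# Same six instructions, two orderings.
well_scheduled = [2, 1, 1, 2, 1, 1]    # each 2-uOP instruction lands on decoder 0
poorly_scheduled = [1, 1, 2, 1, 1, 2]  # 2-uOP instructions arrive at decoder 2

print(decode_cycles(well_scheduled))    # 2
print(decode_cycles(poorly_scheduled))  # 3
print(decode_cycles([2, 2, 2, 2]))      # 4, one instruction per cycle
```

Under these assumptions, a run of 2-µOP instructions decodes at one instruction per cycle, which is the kind of decode-limited loop the thread is talking about.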