By: anon (anon.delete@this.anon.com), February 4, 2013 5:45 pm
Room: Moderated Discussions
Patrick Chase (patrickjchase.delete@this.gmail.com) on February 4, 2013 1:08 pm wrote:
> Patrick Chase (patrickjchase.delete@this.gmail.com) on February 4, 2013 12:48 pm wrote:
> > Paul A. Clayton (paaronclayton.delete@this.gmail.com) on February 4, 2013 12:20 pm wrote:
> > > Patrick Chase (patrickjchase.delete@this.gmail.com) on February 4, 2013 10:05 am wrote:
> > > [snip]
> > > > You can mitigate such decode power penalties by using a first-level Icache that contains
> > > > uops instead of instructions. It only works if you're using physical register files (as opposed
> > > > to reservation stations), though, because otherwise the uop size is unmanageable.
> > >
> > > Why would µop size be any different based on the form of renaming?
> >
> > In a classic reservation station based OoO design the uop (the thing that
> > is sent from the issue unit to the RS after renaming) contains either:
> >
> > 1. The literal value of each input operand if it is available at the time of issue
> >
> > 2. The identity of the reservation station that will produce the input operand, if
> > it is unavailable at the time of issue. The reservation station uses this information
> > to "capture" the operand when it is subsequently broadcast to the common result bus
> > (and that's why the result bus is a power-intensive part of a Tomasulo machine).
> >
> > That makes for a rather large uop. In contrast, in the PRF style the uop contains the
> > ID of the physical register. I don't know the uop sizes for P4 or "the bridges", but
> > I'd bet they're considerably smaller than the original P6's reported 118 bits.
The uop decode cache would not store any such input values or pointers, surely.
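To put rough numbers on that: a minimal sketch of the bit counts, where every field width is an assumption picked for illustration and not a figure for P6, P4 or any other real design. It compares a pre-rename uop as a decode cache might hold it (architectural register names only) with a classic Tomasulo RS entry (captured value or producer tag per source) and a PRF-style renamed uop (physical register IDs).

```python
# All field widths are illustrative assumptions, not data for any real CPU.
OPCODE_BITS = 12     # assumed internal opcode/control field
IMM_BITS = 32        # assumed immediate/displacement field
ARCH_REG_BITS = 4    # assumed 16 architectural registers
PHYS_REG_BITS = 7    # assumed 128 physical registers
RS_TAG_BITS = 5      # assumed 32 reservation station slots
OPERAND_BITS = 64    # assumed operand width
NUM_SRCS = 2         # assumed two source operands and one destination


def cached_uop_bits():
    """Pre-rename uop: sources and destination are architectural names."""
    return OPCODE_BITS + IMM_BITS + (NUM_SRCS + 1) * ARCH_REG_BITS


def rs_entry_bits():
    """Classic RS entry: each source slot holds either the captured value
    or the producer's tag, so it is sized for the larger, plus a ready bit."""
    per_source = max(OPERAND_BITS, RS_TAG_BITS) + 1
    return OPCODE_BITS + IMM_BITS + NUM_SRCS * per_source


def prf_uop_bits():
    """PRF-style renamed uop: sources and destination are physical register IDs."""
    return OPCODE_BITS + IMM_BITS + (NUM_SRCS + 1) * PHYS_REG_BITS


if __name__ == "__main__":
    print("cached (pre-rename) uop:", cached_uop_bits(), "bits")
    print("classic RS entry:       ", rs_entry_bits(), "bits")
    print("PRF-style renamed uop:  ", prf_uop_bits(), "bits")
```

Under these assumed widths the cached uop is the same size whichever renaming scheme sits behind it, because values and tags are only attached after rename.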
>
> Except of course that the P6 wasn't a "classic" RS design. The uops/RS entries point
> to either the architectural register file or to ROB entries rather than holding literal/virtual
> values as in, say, the 360/91 FP unit. Results on the common bus are tagged with ROB
> entry number instead of RS slot ID. Should have looked it up first...
>
> Given that, I don't know why Intel doesn't do pseudo-uop caches in their
> RS-based designs. Does anybody out there who can talk happen to know?
Possibly just coincidence.
The Core2/NHM/WM architecture has a form of uop cache (the loop buffer).