By: Patrick Chase (patrickjchase.delete@this.gmail.com), February 4, 2013 1:08 pm
Room: Moderated Discussions
Patrick Chase (patrickjchase.delete@this.gmail.com) on February 4, 2013 12:48 pm wrote:
> Paul A. Clayton (paaronclayton.delete@this.gmail.com) on February 4, 2013 12:20 pm wrote:
> > Patrick Chase (patrickjchase.delete@this.gmail.com) on February 4, 2013 10:05 am wrote:
> > [snip]
> > > You can mitigate such decode power penalties by using a first-level Icache that contains
> > > uops instead of instructions. It only works if you're using physical register files (as opposed
> > > to reservation stations), though, because otherwise the uop size is unmanageable.
> >
> > Why would µop size be any different based on the form of renaming?
>
> In a classic reservation station based OoO design the uop (the thing that
> is sent from the issue unit to the RS after renaming) contains either:
>
> 1. The literal value of each input operand if it is available at the time of issue
>
> 2. The identity of the reservation station that will produce the input operand, if
> it is unavailable at the time of issue. The reservation station used this information
> to "capture" the operand when it is subsequently broadcast to the common result bus
> (and that's why the result bus is a power-intensive part of a Tomasulo machine).
>
> That makes for a rather large uop. In contrast in the PRF style the uop contains the
> ID of the physical register. I don't know the uop sizes for P4 or "the bridges", but
> I'd bet they're considerably smaller than the original P6's reported 118 bits.
Except, of course, that the P6 wasn't a "classic" RS design. The uops/RS entries point to either the architectural register file or to ROB entries rather than holding literal/virtual values as in, say, the 360/91 FP unit. Results on the common bus are tagged with the ROB entry number instead of the RS slot ID. Should have looked it up first...
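To put rough numbers on the difference, here is a minimal sketch of the per-source-operand field width under the three schemes discussed above. The function names and every size used (32-bit values, 20 RS slots, 8 architectural registers, 40 ROB entries, 128 physical registers) are assumptions picked for illustration, not figures taken from any Intel or IBM documentation.

```python
def index_bits(n):
    """Bits needed to name one of n entries (n >= 1)."""
    return (n - 1).bit_length()


def classic_rs_operand_bits(value_bits, num_rs):
    # Classic Tomasulo: the field holds either the literal value or the
    # tag of the producing reservation station, plus one ready flag.
    return max(value_bits, index_bits(num_rs)) + 1


def pointer_operand_bits(num_arch_regs, num_rob_entries):
    # Pointer style: the field names either an architectural register or
    # a ROB entry, plus one bit saying which of the two it is.
    return max(index_bits(num_arch_regs), index_bits(num_rob_entries)) + 1


def prf_operand_bits(num_phys_regs):
    # PRF style: the field is just a physical register ID.
    return index_bits(num_phys_regs)


if __name__ == "__main__":
    # All sizes below are illustrative assumptions.
    print("classic RS :", classic_rs_operand_bits(32, 20), "bits per source")
    print("ARF/ROB ptr:", pointer_operand_bits(8, 40), "bits per source")
    print("PRF        :", prf_operand_bits(128), "bits per source")
```

With sizes anywhere in this range, only the classic scheme pays for a full operand value in each source field; the pointer and PRF styles both need just a handful of bits.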
Given that, I don't know why Intel doesn't do pseudo-uop caches in their RS-based designs. Does anybody out there who can talk happen to know?