By: Patrick Chase (patrickjchase.delete@this.gmail.com), February 4, 2013 12:02 pm
Room: Moderated Discussions
Patrick Chase (patrickjchase.delete@this.gmail.com) on February 4, 2013 12:48 pm wrote:
> Paul A. Clayton (paaronclayton.delete@this.gmail.com) on February 4, 2013 12:20 pm wrote:
> > Patrick Chase (patrickjchase.delete@this.gmail.com) on February 4, 2013 10:05 am wrote:
> > [snip]
> > > You can mitigate such decode power penalties by using a first-level Icache that contains
> > > uops instead of instructions. It only works if you're using physical register files (as opposed
> > > to reservation stations), though, because otherwise the uop size is unmanageable.
> >
> > Why would µop size be any different based on the form of renaming?
>
> In a classic reservation-station-based OoO design, the uop (the thing that
> is sent from the issue unit to the RS after renaming) contains, for each input operand, either:
>
> 1. The literal value of the operand, if it is available at the time of issue, or
>
> 2. The identity of the reservation station that will produce the operand, if
> it is unavailable at the time of issue. The reservation station uses this information
> to "capture" the operand when it is subsequently broadcast on the common result bus
> (and that's why the result bus is a power-intensive part of a Tomasulo machine).
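To make the capture mechanism concrete, here is a minimal Python sketch of Tomasulo-style operand capture. It is an illustration only, not a model of P6 or any other real design; `Operand`, `RSEntry`, and `broadcast` are hypothetical names. Each source operand holds either a value or a producer tag, and a broadcast compares its tag against every waiting operand in every station.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Operand:
    # Hypothetical operand slot: either a literal value known at issue,
    # or the tag (ID) of the reservation station that will produce it.
    value: Optional[int] = None
    tag: Optional[int] = None

    @property
    def ready(self) -> bool:
        return self.value is not None


@dataclass
class RSEntry:
    op: str
    srcs: List[Operand] = field(default_factory=list)

    def ready(self) -> bool:
        # The entry can dispatch once every source has captured a value.
        return all(s.ready for s in self.srcs)


def broadcast(stations: List[RSEntry], tag: int, value: int) -> int:
    """Drive (tag, value) onto the common result bus.

    Every waiting operand in every station compares its tag with the
    broadcast tag. That all-to-all comparison (plus driving the full
    value across the machine) is where the power goes.
    Returns how many operands captured the value.
    """
    captured = 0
    for entry in stations:
        for src in entry.srcs:
            if not src.ready and src.tag == tag:
                src.value, src.tag = value, None
                captured += 1
    return captured


# Usage: "add" waits on station 3 for one input; its other input was known at issue.
add = RSEntry("add", [Operand(tag=3), Operand(value=7)])
sub = RSEntry("sub", [Operand(tag=3), Operand(tag=5)])
broadcast([add, sub], tag=3, value=10)  # both waiting copies of tag 3 capture 10
```

Note that the operand slot has to be wide enough for a full data value, which is exactly why the uop grows in this style.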
>
> That makes for a rather large uop. In contrast, in the PRF style the uop contains only the
> IDs of the physical registers. I don't know the uop sizes for P4 or "the bridges", but
> I'd bet they're considerably smaller than the original P6's reported 118 bits.
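A back-of-the-envelope sketch of why the two styles differ in size, in Python. All the widths below (12-bit opcode, two sources, 32-bit data, 40 possible producer tags, 128 physical registers) are hypothetical round numbers chosen for illustration; they do not reproduce P6's reported 118 bits, which include fields this sketch ignores (immediates, flags, control bits).

```python
def index_bits(n: int) -> int:
    # Bits needed to name one of n items (e.g. 128 registers -> 7 bits).
    return (n - 1).bit_length()


def rs_uop_bits(opcode_bits: int, num_srcs: int, data_bits: int, num_tags: int) -> int:
    # RS style: each source slot must hold a full value OR a producer tag,
    # plus one bit saying which; the destination is named by a tag.
    tag_bits = index_bits(num_tags)
    src_bits = max(data_bits, tag_bits) + 1
    return opcode_bits + tag_bits + num_srcs * src_bits


def prf_uop_bits(opcode_bits: int, num_srcs: int, num_phys_regs: int) -> int:
    # PRF style: sources and destination are just physical register IDs.
    reg_bits = index_bits(num_phys_regs)
    return opcode_bits + reg_bits + num_srcs * reg_bits


print(rs_uop_bits(12, 2, 32, 40))  # hypothetical RS-style uop
print(prf_uop_bits(12, 2, 128))    # hypothetical PRF-style uop
```

With these made-up parameters the RS-style uop comes to 84 bits against 33 for the PRF style; the gap comes almost entirely from carrying data-width source slots.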
>
> Even in the PRF case there has to be some "fixup" of the uop coming out of the I-cache (for example,
> the physical register IDs must change based on the state of the IRAT), so it's not possible to cache
> "pure" uops. It would presumably be possible to define an intermediate "predecoded" format that
> could be used in an RS-based design. For reasons I don't know, though, Intel never did so. They've
> very consistently used uop caches in their PRF designs but not in their RS designs.
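The "fixup" can be sketched as ordinary register renaming applied to a cached uop that still names architectural registers. The Python below is a minimal illustration under assumed simplifications: `CachedUop`, `RenamedUop`, and `Renamer` are hypothetical names, the RAT is a plain dict, and freeing physical registers at retirement is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class CachedUop:
    # What a hypothetical uop-cache entry holds: architectural names only.
    op: str
    dst: str
    srcs: Tuple[str, ...]


@dataclass(frozen=True)
class RenamedUop:
    op: str
    pdst: int
    psrcs: Tuple[int, ...]


class Renamer:
    def __init__(self, arch_regs: List[str], num_phys: int):
        # Start with architectural register i mapped to physical register i.
        self.rat: Dict[str, int] = {r: i for i, r in enumerate(arch_regs)}
        self.free: List[int] = list(range(len(arch_regs), num_phys))

    def rename(self, u: CachedUop) -> RenamedUop:
        # Read source mappings before remapping the destination,
        # so "add eax, eax, ebx" reads the previous copy of eax.
        psrcs = tuple(self.rat[s] for s in u.srcs)
        if not self.free:
            raise RuntimeError("no free physical registers; rename would stall")
        pdst = self.free.pop(0)
        self.rat[u.dst] = pdst
        return RenamedUop(u.op, pdst, psrcs)


# The same cached uop gets different physical IDs each time it is delivered.
r = Renamer(["eax", "ebx"], num_phys=4)
u = CachedUop("add", "eax", ("eax", "ebx"))
first = r.rename(u)   # reads p0, p1; writes p2
second = r.rename(u)  # reads p2, p1; writes p3
```

This is why the cache can only hold the pre-rename form: the physical IDs depend on whatever the RAT looks like at the moment the uop is delivered.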
>
> -- Patrick
>