By: Patrick Chase (no.delete@this.spam-please.com), November 21, 2012 11:52 am
Room: Moderated Discussions
Stubabe (nospam.delete@this.nospam.com) on November 14, 2012 1:43 pm wrote:
> Surely it's just handled by the Register Alias Table?
>
> i.e.
> if PR100 (physical register) holds R9, PR90 holds R15 and PR101 is the next free reg
>
> Without MOV elimination
> -----------------------
> MOV R10, R9
> R10 would allocate PR101 and R9 renames to PR100
> ADD R10, R10, R15
> R10 would allocate PR102, R10 renames to PR101 and R15 to PR90
>
> so the sequence becomes:
> uMOV PR101, PR100
> uADD PR102, PR101, PR90
>
> With MOV elimination
> --------------------
> MOV R10, R9
> R10 would now rename to PR100 as well as R9
> ADD R10, R10, R15
> R10 would allocate PR101, R10 renames to PR100 and R15 to PR90
>
> so the sequence becomes:
> uADD PR101, PR100, PR90
>
> This way there is no need to fuse adjacent instructions, R10 can
> have multiple dependences and the instruction can get NOPed
This seems very likely, given that Intel did precisely the same thing in the P6.
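To make the quoted scheme concrete, here is a minimal Python sketch of a rename stage with and without MOV elimination. It is a toy model, not a description of any real core: the `Renamer` class, the `rename_all` helper and the bump-pointer allocator are all hypothetical, and it ignores the bookkeeping real hardware would need before it could free a physical register shared by two architectural names.

```python
class Renamer:
    """Toy register alias table (RAT): architectural name -> physical register."""

    def __init__(self, rat, next_free, eliminate_mov):
        self.rat = dict(rat)
        self.next_free = next_free        # hypothetical bump allocator
        self.eliminate_mov = eliminate_mov

    def _alloc(self):
        pr = self.next_free
        self.next_free += 1
        return pr

    def rename(self, op, dst, *srcs):
        # Sources are looked up before the destination mapping is updated.
        phys_srcs = [self.rat[s] for s in srcs]
        if op == "MOV" and self.eliminate_mov:
            # Alias dst to the source's physical register; emit no uop.
            self.rat[dst] = phys_srcs[0]
            return None
        pd = self._alloc()
        self.rat[dst] = pd
        return ("u" + op, pd, *phys_srcs)


def rename_all(program, eliminate_mov):
    # Initial state from the quoted example: PR100 holds R9, PR90 holds R15,
    # and PR101 is the next free register.
    r = Renamer({"R9": 100, "R15": 90}, next_free=101,
                eliminate_mov=eliminate_mov)
    uops = [r.rename(*insn) for insn in program]
    return [u for u in uops if u is not None]


PROGRAM = [("MOV", "R10", "R9"), ("ADD", "R10", "R10", "R15")]

without_elim = rename_all(PROGRAM, eliminate_mov=False)
with_elim = rename_all(PROGRAM, eliminate_mov=True)
```

Under these assumptions `without_elim` holds the uMOV/uADD pair from the quote and `with_elim` holds only the uADD reading PR100 and PR90.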
Recall that on the Pentium, FXCH was made "nearly free" by allowing an FXCH and a dependent FP op to issue simultaneously down the U and V pipes. When Intel went to design the P6 they didn't want to regress on P5-optimized FP codes. That meant that they couldn't make the FP uop dependent on a separate FXCH uop, because doing so would have reduced the FP issue rate by ~50%. What they did instead was to handle the FXCH by swapping entries in the RAT, as you describe.
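The RAT swap can be sketched the same way. This is only an illustration of the idea, not a model of the actual P6 structures: the `fxch` and `rename_fadd` helpers, the flat `ST0`..`ST2` keys and the physical register numbers are all made up, and the sketch does not model the ROB or the x87 top-of-stack pointer.

```python
def fxch(rat, i):
    """Handle FXCH ST(i) by swapping two RAT entries; no execution uop is produced."""
    rat["ST0"], rat[f"ST{i}"] = rat[f"ST{i}"], rat["ST0"]


def rename_fadd(rat, i, free_list):
    """Rename FADD ST(0), ST(i), i.e. ST0 <- ST0 + STi."""
    src_a, src_b = rat["ST0"], rat[f"ST{i}"]
    dst = free_list.pop(0)
    rat["ST0"] = dst
    return ("uFADD", dst, src_a, src_b)


# Hypothetical starting state: three stack slots mapped to PR40..PR42.
rat = {"ST0": 40, "ST1": 41, "ST2": 42}
free_list = [50, 51]

fxch(rat, 2)                          # ST0 now maps to PR42, ST2 to PR40
uop = rename_fadd(rat, 1, free_list)  # reads the swapped ST0 directly
```

The FADD picks up the swapped mapping at rename time, so it has no dependence on a separate FXCH uop.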
I had some email discussions with Bob Colwell (chief P6 architect) at the time, and he pointed out that while the optimization is conceptually simple, it has some "interesting" impacts on things like exception handling. Recall that the entire point of having an ROB is to enable precise exceptions, so "hiding" operations from the ROB is done at one's own peril.
-- Patrick