By: Jouni Osmala (josmala.delete@this.cc.hut.fi), February 5, 2013 7:59 am
Room: Moderated Discussions
Etienne (etienne_lorrain.delete@this.yahoo.fr) on February 5, 2013 6:57 am wrote:
> Jouni Osmala (josmala.delete@this.cc.hut.fi) on February 4, 2013 11:15 pm wrote:
> > Almost decade ago I defined architecture with instead of mask it counted the skips and
> > nullified. What I figured out during the exercise that register renaming becomes a
> > problematic with this kind of instructions.
> > I think the conditional move instruction brings most of the benefits with far
> > less problems.
>
> I fail to understand the problem, care to provide an example?
When you rename registers, the registers an instruction uses after the renaming stage are not the same as the ones it named before it. Renaming replaces the registers the programmer sees with physical registers (or something else in some experimental designs) before the instructions enter the OoO core. Now, nullification only becomes useful at the execution stage and in the reservation stations, far after register renaming. So you cannot just nullify the instruction; you need to replace it with a move from the old value to the new one, and the reservation station or execution unit still needs the OLD value as an additional dependency, plus a dependency on the branch instruction. The dependency check in OoO hardware is already quite a large piece, and its size is suddenly doubled. And that is not counting the hardware that has to sit close by to replace specific entries in the buffer with different instructions when a branch resolves.
Now what if there are multiple branches? Remember that there should be 60 instructions between the front of the pipeline and the point where the first one executes, and sometimes there are even more old instructions in flight, when one instruction waits on main memory and everything that depends on it waits for it, and so on.
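To make the extra dependencies concrete, here is a rough sketch of a toy rename stage. It is only an illustration under my own assumptions, not a model of any real core: the `Renamer` class, the register names `r1`..`r3` and the predicate register `p0` are all made up for the example. It shows that an instruction that might be nullified later has to wait on the old physical register of its destination and on the predicate, in addition to its normal sources.

```python
class Renamer:
    """Toy rename stage: maps architectural register names to physical numbers.

    Hypothetical illustration only; real rename hardware is far more involved.
    """

    def __init__(self, arch_regs):
        # Initially each architectural register owns one physical register.
        self.table = {name: phys for phys, name in enumerate(arch_regs)}
        self.next_free = len(arch_regs)

    def rename(self, dest, srcs, predicate=None):
        """Return (new physical dest, list of physical registers to wait on)."""
        deps = [self.table[s] for s in srcs]
        if predicate is not None:
            # A nullifiable instruction may have to copy the OLD value of its
            # destination into the new physical register, so it depends on it.
            deps.append(self.table[dest])
            # It also depends on whatever decides the nullification.
            deps.append(self.table[predicate])
        new_dest = self.next_free
        self.next_free += 1
        self.table[dest] = new_dest
        return new_dest, deps


renamer = Renamer(["r1", "r2", "r3", "p0"])
plain = renamer.rename("r1", ["r2", "r3"])                        # r1 = r2 op r3
nullifiable = renamer.rename("r1", ["r2", "r3"], predicate="p0")  # same, may be nullified
```

In this sketch the plain instruction waits on two physical registers and the nullifiable one on four, which is the doubling of the dependency check in this toy setting.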
if(a) {
    blaahh..
    if(b){
        bluuu
    }
}
noob.
Then all the instructions in bluuu depend on BOTH the a and b branches, on the old values from blaahh, and on the values from before the branch. And the instructions in noob have dependencies on all of those.
The conditional move makes those dependencies separate instructions that are explicit and don't require much extra hardware. Branching handles this by guessing and rolling back if any of the guesses were wrong. And that rollback hardware is something that needs to be in the pipeline anyway.
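As a rough sketch of what "separate, explicit instructions" means for the nested example above, the two functions below compute the same result, once with branches and once with conditional-move style selects. The arithmetic standing in for blaahh, bluuu and noob and the `select` helper are my own placeholders, not anything from a real instruction set.

```python
def select(cond, if_true, if_false):
    """Stand-in for a conditional move: picks one of two already computed values."""
    return if_true if cond else if_false


def nested_branch(a, b, x):
    """The nested-if shape, with placeholder arithmetic for each block."""
    if a:
        x = x + 1          # blaahh
        if b:
            x = x * 2      # bluuu
    return x - 3           # noob


def nested_cmov(a, b, x):
    """Same computation with every dependency spelled out as its own step."""
    t1 = x + 1                 # blaahh, computed unconditionally
    t2 = t1 * 2                # bluuu, computed unconditionally
    inner = select(b, t2, t1)  # depends on b and on both candidate values
    x = select(a, inner, x)    # depends on a, on inner, and on the old x
    return x - 3               # noob only depends on the final x
```

Each select names its condition and both candidate values as ordinary operands, so the normal dependency tracking covers them and noob only has to wait on the last select.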
> Why is it incompatible with register renaming? Rename what I call "NOP" with
> a "conditional move" where you know the condition is false and that would
> release the entry in the physical register file.
> The problem of the condition bits of the flag register is that they may have
> an effect a long time after the "test/add" instruction, and that may delay the
> retirement of other instructions - but in practice most condition bits are used
> just few instructions after they are calculated (if they are ever used).
> With the system I describe (I am not a processor designer), you do not get rid
> of the flags register because you still have to store the interrupt enable bit,
> and the NOPify mask in case of exception/interrupt, but you decide which instruction
> to execute and which to NOPify very early, you get rid of status flag dependencies.
> Obviously then the "add with carry" becomes a bit more complex because that will
> involve a possibly NOPified increment, but isn't that simpler than managing bits
> which may or may not have an effect later on?
>
> Etienne.