By: Felid (Felid.delete@this.mailinator.com), November 15, 2012 2:40 pm
Room: Moderated Discussions
Paul A. Clayton (paaronclayton.delete@this.gmail.com) on November 15, 2012 7:15 am wrote:
> Felid (Felid.delete@this.mailinator.com) on November 15, 2012 12:49 am wrote:
> [snip]
> If the fused operations are adjacent, there can be no additional uses of the mov's destination (given typical
> destructive [source and destination the same] x86 instructions). This, of course, means that preserving a
> register value by moving it to another location that is used much later would not allow this optimization,
> but that practice has been suboptimal for a while since one generally wants to exploit result forwarding.
>
> Even with move elimination in the renamer (which allows more cases to be handled), doing limited
> move elimination in the decoder can be beneficial (especially if one has a µop cache).
This requires double effort: more macrofusion rules and logic for the (pre)decoders, plus more renaming logic for the allocator. That's exactly how it is done in BD (mov+op fusion was added in 45 nm K10, and 0-clock moves in BD). But apparently not in IB.
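
To make the two mechanisms concrete, here is a rough Python sketch of both: a decoder-side pass that fuses an adjacent mov + destructive op into one three-operand internal op, and a renamer-side pass that eliminates the remaining movs by aliasing the rename table entry. All names (Insn, Uop, fuse_mov_op, rename_with_move_elimination) and the three-operand internal form are made up for illustration. It does not model the actual K10, BD, or IB hardware, and it ignores flags, memory operands, and the physical-register reference counting a real renamer would need.

```python
from typing import NamedTuple

class Insn(NamedTuple):
    op: str   # "mov" or a destructive two-operand op such as "add"
    dst: str  # destination (also first source for destructive ops)
    src: str  # second operand

class Uop(NamedTuple):
    op: str
    dst: str
    srcs: tuple  # architectural source registers

def fuse_mov_op(insns):
    """Decoder-side sketch: 'mov d, s' + adjacent 'op d, x' -> 'op d = s, x'."""
    uops, i = [], 0
    while i < len(insns):
        cur = insns[i]
        nxt = insns[i + 1] if i + 1 < len(insns) else None
        if (cur.op == "mov" and nxt is not None
                and nxt.op != "mov" and nxt.dst == cur.dst):
            # The op overwrites the mov's destination, so no later
            # instruction can observe the copied value: safe to fuse.
            second = cur.src if nxt.src == cur.dst else nxt.src
            uops.append(Uop(nxt.op, nxt.dst, (cur.src, second)))
            i += 2
        else:
            srcs = (cur.src,) if cur.op == "mov" else (cur.dst, cur.src)
            uops.append(Uop(cur.op, cur.dst, srcs))
            i += 1
    return uops

def rename_with_move_elimination(uops, rat):
    """Renamer-side sketch: a mov only aliases dst to src's physical register."""
    rat = dict(rat)
    next_preg = max(rat.values(), default=-1) + 1
    executed = []
    for u in uops:
        if u.op == "mov":
            rat[u.dst] = rat[u.srcs[0]]  # 0-clock move: nothing is issued
            continue
        phys_srcs = tuple(rat[s] for s in u.srcs)
        rat[u.dst] = next_preg
        executed.append((u.op, next_preg, phys_srcs))
        next_preg += 1
    return executed, rat

program = [
    Insn("mov", "rax", "rbx"),  # fusable: next op overwrites rax
    Insn("add", "rax", "rcx"),
    Insn("mov", "rdx", "rax"),  # not fusable: next op writes rsi, not rdx
    Insn("sub", "rsi", "rdx"),
]
fused = fuse_mov_op(program)
initial_rat = {"rax": 0, "rbx": 1, "rcx": 2, "rdx": 3, "rsi": 4}
executed, final_rat = rename_with_move_elimination(fused, initial_rat)
```

In this example the first mov disappears in the decoder and the second one only in the renamer, which is the "double effort" case: the decoder catches the adjacent pattern, and the renamer handles the one that the adjacency rule cannot.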



