TIL: simple vs complex addressing is resolved at rename time (probably)

By: foobar (foobar.delete@this.foobar.foobar), August 4, 2018 7:00 am
Room: Moderated Discussions
anon (spam.delete.delete@this.this.spam.com) on August 4, 2018 5:05 am wrote:
> foobar (foobar.delete@this.foobar.foobar) on August 4, 2018 1:40 am wrote:
> > Travis (travis.downs.delete@this.gmail.com) on August 3, 2018 1:34 pm wrote:
> > > What I learned yesterday is that the distinction between simple and complex addressing apparently
> > > happens dynamically after decode. In particular, something like [rdx + rsi*4] looks like
> > > complex addressing, but if rsi is zero at runtime and was set to zero by a zeroing idiom,
> > > the latency is as if you were using simple addressing (4 cycles for GP loads).
> >
> > This made me wonder: would the same be possible for conditional branches? That is, if you would use
> > the zero idiom on a register and perform a macro-fused compare-and-branch operation not involving
> > other registers on it, the executed uop would actually be an unconditional jump, or even a nop if
> > the conditional branch would not be taken? I guess this would be visible mostly through the branch
> > predictor since predicted taken branches have the same performance as unconditional jumps...
>
> It would be difficult to implement since the zeroing idiom will only be recognized at the
> rename stage and at best at the decoders (if the instructions were adjacent) whereas the
> fetch stage and therefore the branch predictor are running ahead of even the decoders.
>
> The zero idiom would disappear anyway and a branch has to execute to preserve the program. In theory you could
> save a uop in the unconditional not-taken case but unless that always happens (why is that even in the program?)
> some check still needs to be performed and you'd rather not move that into the rename or decode stage.
> If you've already got macro-fusion there is no real improvement in the always taken case. Fused uop
> that compares with the zero register or unfused uop for an unconditional jump shouldn't matter.
>
> So theoretically possible, but practically useless because you still need the branch predictor anyway.

I admit this would be a pretty esoteric case. Nonetheless, there's a theoretical scenario: a function littered with conditional blocks that are executed or skipped according to a flag value. If the CPU implemented this hypothetical construct, one could avoid both use and "pollution" of the branch predictor in many cases by zeroing the flag with the zero idiom before calling this code. Branches on this flag would never mispredict in the zeroed case, and in the remaining cases the predictor would probably end up strongly predicting the non-zero condition. (Of course, this wouldn't be very practical, since there is no "one idiom" for traditional Intel registers, which would make this behaviour quite asymmetric.)
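To make the scenario concrete, here is a minimal C sketch of the kind of function I mean. Everything in it (the `accumulate` name, the `verify` flag, the work done in each block) is invented for illustration. Whether a compiler really emits one test-and-branch per block, and whether the caller's `verify = 0` becomes a zeroing idiom that is still visible at rename, depends on the compiler and calling sequence; no existing CPU is known to resolve such branches at rename.

```c
#include <stdint.h>

/* Hypothetical illustration only: names and workload are made up. */
typedef struct {
    uint64_t sum;     /* the "real" work */
    uint64_t checks;  /* extra work done only when the flag is set */
} result_t;

result_t accumulate(uint32_t a, uint32_t b, int verify)
{
    result_t r = {0, 0};

    r.sum += a;
    if (verify) {              /* first branch on the flag */
        r.checks += (a & 1u);
    }

    r.sum += b;
    if (verify) {              /* second branch on the same flag */
        r.checks += (b & 1u);
    }

    if (verify) {              /* third branch on the same flag */
        r.checks += 1;
    }
    return r;
}
```

Calling `accumulate(3, 4, 0)` skips all three blocks. In the hypothetical design, a caller that zeroed the flag register with the zero idiom would have all three branches resolved as not taken without consulting the predictor; on real hardware each one is predicted like any other conditional branch.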

Yeah, it's a wild and probably stupid idea, likely quite nonsensical and nontrivial to implement. Another similarly baffling idea is to macro-fuse an arithmetic operation with a conditional branch that is always taken (which can be established by reasoning on invariants of the flags the operation produces). This would allow performing four ALU ops in a cycle *and* an unconditional jump. There the "why" question would of course be that typical hot code probably wouldn't have that unconditional jump in the first place...