ICL memory renaming?

By: anonlitmus (anon.delete@this.litmus.org), August 4, 2019 3:20 pm
Room: Moderated Discussions
Travis Downs (travis.downs.delete@this.gmail.com) on August 4, 2019 2:55 pm wrote:
> anonlitmus (anon.delete@this.litmus.org) on August 3, 2019 7:06 am wrote:
> > 10-20 cycles seems too much for something that is a clear-cut store-to-load forwarding
> > case. In steady state it should be more or less similar to an L1D hit, shouldn't it (see
> > some oldish data at http://blog.stuffedcow.net/2014/01/x86-memory-disambiguation/)?
> >
> > Also, any idea why the 16-bit version would be slower than the 8/32/64-bit versions?
> > I am guessing that it is the partial register semantics that make it not memory-renameable,
> > but then the 8-bit case should have the same issue, right?
>
> Browse through static assembly or a dynamic trace for a typical program and you'll probably find that
> 16-bit instructions are considerably less common than other sizes: 32-bit and 64-bit are obviously common,
> and 8-bit is still very common too, due to the prevalence of char/byte use, the setcc instructions operating
> on bytes, and because some types of larger-width code end up using byte operations.
>
> If you scan the instruction latency tables you'll find a few cases where 16-bit operands are disadvantaged: an extra
> uop for most multiplies, movsx, lea and a few others. Other cases don't take any extra uop but end up much
> slower (at least in a tight loop) because of length-changing prefixes, which stall the front end.
>
> All that to say that 16-bit operands are the least important size, by far, and perhaps Intel
> was able to simplify the implementation or dedicate more resources to the other sizes by leaving
> them out. Or perhaps the 16-bit code just didn't trigger the optimization for some reason.
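
For anyone unfamiliar with the term, a "length-changing prefix" is one that changes how long the rest of the instruction is. A minimal Python sketch, modeling only opcode 0x05 (`add eax, imm32`, which becomes `add ax, imm16` with a 0x66 operand-size prefix), shows the prefix shrinking the immediate from 4 bytes to 2. The helper name is made up for illustration; it ignores REX and every other prefix and opcode.

```python
def add_eax_imm_length(code: bytes) -> int:
    """Length in bytes of an `add eAX, imm` instruction (opcode 0x05).

    Hypothetical helper for illustration: only an optional 0x66 prefix and
    opcode 0x05 are handled (no REX, no other prefixes or opcodes).
    """
    i = 0
    operand_size_16 = False
    if code[i] == 0x66:            # operand-size override prefix
        operand_size_16 = True
        i += 1
    if code[i] != 0x05:
        raise ValueError("only opcode 0x05 is modeled")
    imm_bytes = 2 if operand_size_16 else 4   # the prefix changes the immediate size
    return i + 1 + imm_bytes


add_eax = bytes([0x05, 0x78, 0x56, 0x34, 0x12])   # add eax, 0x12345678
add_ax = bytes([0x66, 0x05, 0x34, 0x12])          # add ax, 0x1234
print(add_eax_imm_length(add_eax))  # 5
print(add_eax_imm_length(add_ax))   # 4
```

Adding one prefix byte makes the instruction shorter (4 bytes rather than the 1 + 1 + 4 = 6 a decoder assuming the default operand size would compute), which is the kind of prefix the front-end stall above refers to.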

Well, sure, you are correct. My point here is just that if the claim is that there is some memory renaming going on, and it is implemented through the rename table like move elimination, then I would argue that it appears equally hard to do for 8-bit and 16-bit registers, just by virtue of x86 partial register semantics. So I don't really see how the 8-bit version would benefit from it if the 16-bit version can't, that's all.

Say I store al to [X], then load [X] into bl. Can I short-circuit this through the rename map? Not really, because the physical register that is mapped to al may actually be mapped to ax and also hold ah, and I do not want bh to get the same value as ah. Correct me if that is not legit x86 semantics; I might have misunderstood.
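
The hazard can be shown with a toy Python model. It assumes (a simplification, not how any particular Intel core actually works) one physical register per architectural register family, with al/ah/ax as bit-field views of it and partial writes merged into a freshly allocated physical register. Executing `mov [X], al ; mov bl, [X]` correctly merges al into rbx; naively "renaming" the load by pointing rbx at rax's physical register also hands rbx the ah byte and everything above it. The class and function names are hypothetical.

```python
MASK64 = (1 << 64) - 1

# view -> (bit offset, width) within a 64-bit register family
VIEWS = {"l": (0, 8), "h": (8, 8), "x": (0, 16), "full": (0, 64)}


class MergingRenamer:
    """Toy rename table: each register family maps to one physical register."""

    def __init__(self):
        self.prf = []   # physical register file (64-bit values)
        self.rat = {}   # register family -> physical register index

    def _alloc(self, value):
        self.prf.append(value & MASK64)
        return len(self.prf) - 1

    def read(self, family, view):
        shift, width = VIEWS[view]
        return (self.prf[self.rat[family]] >> shift) & ((1 << width) - 1)

    def write(self, family, view, value):
        # x86 partial-register write: bits outside the view keep their old value.
        shift, width = VIEWS[view]
        mask = ((1 << width) - 1) << shift
        old = self.prf[self.rat[family]] if family in self.rat else 0
        self.rat[family] = self._alloc((old & ~mask) | ((value << shift) & mask))

    def alias(self, dst_family, src_family):
        # Naive "memory renaming": point dst at the store source's physical register.
        self.rat[dst_family] = self.rat[src_family]


def store_then_load(view, naive):
    """mov [X], <rax view> ; mov <rbx view>, [X]; returns the final 64-bit rbx."""
    r = MergingRenamer()
    r.write("rax", "full", 0x5555_1234)   # ah = 0x12, al = 0x34
    r.write("rbx", "full", 0x7777_ABCD)   # bh = 0xAB, bl = 0xCD
    stored = r.read("rax", view)          # the store writes this to [X]
    if naive:
        r.alias("rbx", "rax")
    else:
        r.write("rbx", view, stored)
    return r.read("rbx", "full")


for view in ("l", "x"):
    print(view, hex(store_then_load(view, naive=False)), hex(store_then_load(view, naive=True)))
# l 0x7777ab34 0x55551234
# x 0x77771234 0x55551234
```

In this model the naive alias is wrong for both the 8-bit (al/bl) and the 16-bit (ax/bx) case, so an 8-bit load has no inherent advantage over a 16-bit one.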

I suppose they might have a way to identify the cases where the size of the logical producer is the same as the size of the logical consumer (that might already be required to detect partial writes and do the merges anyway). But then, 16-bit support would seem to come for free if you are already doing 8-bit.
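
One way to picture the size-matching idea is a rename table that tracks each register family in byte lanes, so a partial write or a renamed load remaps only the lanes it covers, and a later wider read needs a merge when its lanes come from different physical registers. This Python sketch is hypothetical: the lane split (bits 0-7, 8-15, 16-63), the class name, and the eligibility rule (store and load must use the same view, i.e. same size and position) are assumptions chosen for illustration, not a description of Ice Lake. The point is that the check is identical for al to bl and for ax to bx.

```python
MASK64 = (1 << 64) - 1

# Hypothetical lane split of a 64-bit register family: lane -> (bit offset, width).
LANE_BITS = {0: (0, 8), 1: (8, 8), 2: (16, 48)}
# Architectural views and the lanes they cover.
LANES = {"l": (0,), "h": (1,), "x": (0, 1), "full": (0, 1, 2)}
VIEW_WIDTH = {"l": 8, "h": 8, "x": 16, "full": 64}


class LaneRenamer:
    def __init__(self):
        self.prf = []              # physical registers; values kept at their lane bit positions
        self.rat = {}              # (family, lane) -> physical register index
        self.eliminated_loads = 0

    def _alloc(self, value):
        self.prf.append(value & MASK64)
        return len(self.prf) - 1

    def write(self, family, view, value):
        # Allocate a new physical register and remap only the lanes the view covers.
        shift = LANE_BITS[LANES[view][0]][0]
        value &= (1 << VIEW_WIDTH[view]) - 1
        preg = self._alloc(value << shift)
        for lane in LANES[view]:
            self.rat[(family, lane)] = preg

    def read(self, family, view):
        # Gather each lane from whichever physical register currently holds it.
        result = 0
        for lane in LANES[view]:
            shift, width = LANE_BITS[lane]
            result |= self.prf[self.rat[(family, lane)]] & (((1 << width) - 1) << shift)
        return result >> LANE_BITS[LANES[view][0]][0]

    def sources(self, family, view):
        # More than one source register means the read needs a merge.
        return len({self.rat[(family, lane)] for lane in LANES[view]})

    def store_load(self, src, store_view, dst, load_view):
        """mov [X], <src view> ; mov <dst view>, [X]. Returns True if renamed."""
        if VIEW_WIDTH[load_view] > VIEW_WIDTH[store_view]:
            raise ValueError("load reads bytes this store did not write")
        if load_view == store_view:    # assumed rule: same size and position
            for lane in LANES[load_view]:
                self.rat[(dst, lane)] = self.rat[(src, lane)]
            self.eliminated_loads += 1
            return True
        # Otherwise execute the load normally: it gets the low bytes of the stored value.
        self.write(dst, load_view, self.read(src, store_view))
        return False


r = LaneRenamer()
r.write("rax", "full", 0x5555_1234)
r.write("rbx", "full", 0x7777_ABCD)
print(r.store_load("rax", "l", "rbx", "l"), hex(r.read("rbx", "full")), r.sources("rbx", "full"))
# True 0x7777ab34 2
```

Here bh keeps its own mapping, so the renamed bl is correct, and the cost shows up as a merge on the next full-width read, the same bookkeeping a partial write already needs.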



