Barcelona optimization guide

By: Rob Thorpe (rthorpe.delete@this.realworldtech.com), May 11, 2007 8:22 am
Room: Moderated Discussions
Linus Torvalds (torvalds@osdl.org) on 5/10/07 wrote:
---------------------------
>Rob Thorpe (rthorpe@realworldtech.com) on 5/10/07 wrote:
>>
>>Still it's not that much of a problem, a code generator
>>should be able to minimize the effect.
>
>Sure, but the world would be a better place if you didn't
>have to make uarch-specific optimizations, and could just
>get code that works well for everybody. It looks like the
>optimizations for Core 2 and Barcelona are generally going
>to work well for both, though.

Yes.


>[ Using push/pop in prologues/epilogues ]
>>Are you sure that doing it that way has a significant
>>effect on x86-64 code? Generally what's done is that the
>>stack pointer is adjusted once then the local variables
>>are MOVed into place.
>
>The optimization explicitly says that using push is now
>preferable (and that's a change wrt previous AMD rules).
>
>And yes, the size difference can be quite noticeable. A
>push is a single-byte op (two for the regs that need REX),
>while a "mov to stack" is 5 bytes or more.
>
>It adds up. When every single function tends to do several
>of these things both in the entry and exit path. And since
>you often have multiple epilogues, it also makes things
>like short conditional branches less effective.
>
>Is it "significant"? I don't think it's a huge issue on its
>own, but it's a part of making code denser and getting
>better I$ behaviour. I may be crazy, but I think I$ density
>matters.

I think that icache density matters too, though I think it's only one lever for getting good performance overall.
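To put rough numbers on the size difference quoted above, here is a minimal sketch that adds up the bytes needed to save a set of registers with push versus with mov-to-stack. The byte counts are the ones quoted (1 byte for a push, 2 when a REX prefix is needed, 5 for a mov to a stack slot, assuming an [rsp+disp8] address). The function names and the choice of registers are illustrative assumptions, and the stack-pointer adjustment is not counted.

```python
# Registers that can be encoded without a REX prefix in a push.
LEGACY_REGS = {"rax", "rbx", "rcx", "rdx", "rsi", "rdi", "rbp", "rsp"}


def push_size(reg):
    """Bytes for 'push reg': 1 for the legacy registers, 2 for r8-r15."""
    return 1 if reg in LEGACY_REGS else 2


def mov_to_stack_size(reg):
    """Bytes for 'mov [rsp+disp8], reg': REX + opcode + ModRM + SIB + disp8.

    Assumes an 8-bit displacement; a 32-bit displacement would be larger.
    """
    return 5


def save_cost(regs):
    """Return (bytes using push, bytes using mov) for saving regs once."""
    push_bytes = sum(push_size(r) for r in regs)
    mov_bytes = sum(mov_to_stack_size(r) for r in regs)
    return push_bytes, mov_bytes


if __name__ == "__main__":
    # Hypothetical function that saves six registers on entry.
    saved = ["rbx", "rbp", "r12", "r13", "r14", "r15"]
    print(save_cost(saved))
```

The same cost is paid again on every exit path that restores the registers, which is why multiple epilogues make the difference more visible.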

What I mean about the stack is this:
There are 15 GPRs in x86-64 (not counting the stack pointer), which is as many as some RISCs have. That allows a lot of local variables to be placed in GPRs, reducing the need for stack slots to begin with. Also, the GPRs are of two sorts: the normal ones and those that need the REX prefix. A sensible register allocator will allocate the normal ones to the most frequently used variables, in order to reduce the number of REX prefixes and increase icache density. So, variables that are important inside loops will often be in RAX, RBX, RCX etc. Those used outside of loops will get R8-R15. Those that are less significant still will use stack slots.
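As a toy illustration of that allocation policy, the sketch below ranks variables by an estimated use count and hands out the non-REX registers first, then the REX registers, then stack slots. The use counts, the register lists and the function name are all made up for the example. A real allocator works on live ranges (graph colouring or linear scan) and is far more involved.

```python
def allocate(use_counts, legacy_regs, rex_regs):
    """Assign registers to variables, most frequently used first.

    use_counts:  dict mapping variable name to estimated use count
    legacy_regs: registers encodable without REX, handed out first
    rex_regs:    registers needing REX, handed out next
    Variables left over once both pools are empty go to the stack.
    """
    # Sort by descending use count; break ties by name for determinism.
    ordered = sorted(use_counts, key=lambda v: (-use_counts[v], v))
    pool = list(legacy_regs) + list(rex_regs)
    assignment = {}
    for i, var in enumerate(ordered):
        assignment[var] = pool[i] if i < len(pool) else "stack"
    return assignment


if __name__ == "__main__":
    # Hypothetical function with a hot loop counter and a rarely used temp.
    counts = {"i": 100, "total": 50, "limit": 10, "tmp": 2}
    print(allocate(counts, ["rbx", "rcx"], ["r8"]))
```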

So, in cases where the ordinary variables are the important ones, only functions that use a large number of variables will get significantly bigger.

The other class of functions that may be affected are those that use pointers extensively. If the address of something within a function is taken, then it cannot be kept in a register.

I suppose these two cases will occur in maybe 70% of functions, but not, I would expect, in 70% of the variables within those functions. So the gain seems to me likely to be quite small.


>There appears to be something magical about the single-byte
>"ret" form, because the manual states that you can avoid
>the problem with the 3-byte "ret $0" form. But yeah, that
>one would probably have two stages of stack op logic,
>so it could well be about that..

That is indeed weird.