By: TimMc (timcaffrey.delete@this.aol.com), September 21, 2022 3:46 pm
Room: Moderated Discussions
TimMc (timcaffrey.delete@this.aol.com) on September 20, 2022 8:56 pm wrote:
> Michael S (already5chosen.delete@this.yahoo.com) on September 20, 2022 1:44 pm wrote:
> > I'm trying to understand the reasoning behind decision of designers of System-V
> > x86-64 ABI to have no callee-save vector (or floating-point) registers.
> > So far, I have no idea. With total of 16 XMM registers defining at least 4, but more
> > likely 6 to 8 registers as callee-save sounds to me as obviously sane thing to do.
> > What am i missing?
> >
> 1) Callee-save means that if the function needs those registers, it has to save them.
> 2) Most functions don't use floating point or SSE.
>
> I suspect that, unlike general purpose registers, there is no good performance advantage
> to callee-saved SSE registers. For instance, you might save a loop counter in a callee-saved
> register when you have a function call in a loop. For floating point you may need to save a
> partial result of a calculation (e.g. X = sin(y)**2 + cos(z)**2), but it is probably (usually)
> more efficient for the caller to save off what it needs than the callee.
>
> Just guessing, I have not done the analysis.
>
>
Just to add to this: the original AMD64 specification included only SSE2, i.e. only the 128-bit XMM registers.
So how would you handle YMM (AVX/AVX2) and ZMM (AVX-512)? A callee following the original ABI would have
saved only the XMM (low 128-bit) portion of each register, so the upper bits of a wider register could
still be clobbered across a call. That would have been worse than useless.
(I don't know if AMD knew this would happen, or guessed it would, or they were just
plain lucky. But it is a good thing no XMM registers are callee-saved.)