Article: AMD's Jaguar Microarchitecture
By: Michael S (already5chosen.delete@this.yahoo.com), April 9, 2014 12:12 am
Room: Moderated Discussions
Maynard Handley (name99.delete@this.name99.org) on April 8, 2014 6:55 pm wrote:
>
>
> The "restore state on module exit" part of the problem is trickier because the way that is handled today
> does require reading from a general purpose (usually FP) register, with all that implies in terms of having
> to wait until the register is available, etc. What's frustrating is that the ACTUAL use case, the problem
> we are trying to solve, does not require any delay; it's just modeled in a really bad way.
>
> What should be provided is something like:
> the two high performance ops are
> - swap control state to this IMMEDIATE value
> - restore previous control state
> along with one additional register (FPCR').
> Swap state moves FPCR to FPCR' and IMM to FPCR (as I said, at rename time)
> Restore state moves FPCR' back to FPCR
> Apart from these two ops, FPCR and FPCR' are invisible to the ISA.
>
> The idea here is that for all NORMAL uses of FP control, dicking around with "I'll take whatever randomness
> the previous state was, just as long as I get the rounding mode I want" or whatever is nonsense, hence
> this model of read flags, bit twiddle in a new flag or two, write back, is nonsense.
> What you normally want is the two operations I provide --- set known state to what I need, and
> restore state to whatever the guy before me was using. I've no interest in WHAT that previous state
> was, or how it relates to the state I want, so don't bother with ops that allow for that.
>
With a properly specified calling convention, it's all much simpler than you suggest.
By "properly specified calling convention" I mean a calling convention that always expects the default rounding mode on entry to and exit from normal __cdecl functions. With a calling convention like this, reads of the control flags of any sort are rarely needed, so those reads can get a simple and slow implementation.
According to my understanding, Microsoft's calling conventions for the x87 FPU control word register are exactly of that type. See http://msdn.microsoft.com/en-us/library/ms235300.aspx.
The funny (and, probably, sad) thing is that Microsoft's calling conventions for the SSE FPU control-and-status word are also like that (http://msdn.microsoft.com/en-us/library/yxty7t75.aspx), which effectively means that Microsoft is not serious about detecting exceptional numeric conditions.
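To make the difference concrete, here is a rough sketch of the two patterns on the SSE control-and-status word (MXCSR), using the x86 intrinsics from <xmmintrin.h>. The function names are made up for illustration, and the "default" value 0x1F80 (round to nearest, all exceptions masked, no flags set) is my assumption about what such a convention would guarantee, not something taken from the Microsoft pages. The volatile temporaries are only there to keep the compiler from folding the addition at compile time.

```c
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr, _MM_ROUND_* (x86 SSE only) */

/* Assumption: the convention's default MXCSR is the power-on value 0x1F80
   (round to nearest, all exceptions masked, no sticky flags set). */
#define MXCSR_DEFAULT  0x1F80u
#define MXCSR_ROUND_UP (MXCSR_DEFAULT | _MM_ROUND_UP)

/* Pattern 1 (hypothetical name): read, bit-twiddle, write back.
   Needs a read of MXCSR on entry to learn the caller's state. */
double add_up_rmw(double a, double b)
{
    unsigned int saved = _mm_getcsr();
    _mm_setcsr((saved & ~_MM_ROUND_MASK) | _MM_ROUND_UP);
    volatile double va = a, vb = b;
    volatile double r = va + vb;      /* rounded toward +infinity */
    _mm_setcsr(saved);                /* restore whatever the caller had */
    return r;
}

/* Pattern 2 (hypothetical name): rely on the calling convention.
   Two writes of constants, no read of the control word at all. */
double add_up_convention(double a, double b)
{
    _mm_setcsr(MXCSR_ROUND_UP);       /* set the known state I need */
    volatile double va = a, vb = b;
    volatile double r = va + vb;      /* rounded toward +infinity */
    _mm_setcsr(MXCSR_DEFAULT);        /* restore the convention's default */
    return r;
}
```

Called as add_up_convention(1.0, 1e-30), the second variant should return the double just above 1.0 and leave the rounding mode at round-to-nearest. Note that writing a constant also overwrites the sticky exception flags that live in the same register, so any exceptions recorded earlier are lost.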
> Obviously for context switching we need ops to read/write FPCR and FPCR', but I don't care if they're privileged.
> Just make them the usual sort of "mv special purpose register N to register M" sort of deal.