Article: AMD's Jaguar Microarchitecture
By: Maynard Handley (name99.delete@this.name99.org), April 8, 2014 5:55 pm
Room: Moderated Discussions
Michael S (already5chosen.delete@this.yahoo.com) on April 8, 2014 4:05 pm wrote:
> Wilco (Wilco.Dijkstra.delete@this.ntlworld.com) on April 8, 2014 2:56 pm wrote:
> >
> > Loads and stores are always done in the load-store units,
> > there is no accelerator that executes memory accesses
> > early in the pipeline. Typical use is read control register, clear some bits, set some bits, write control
> > register. If there was an instruction "change rounding mode
> > to round-to-even" then yes you could easily attach
> > that to all FP ops during decode, but that is much harder for generic status register writes.
> >
> > Wilco
> >
>
> On Power and with respect to changing FP rounding mode Maynard's idea could actually work.
> For example, mtfsfi 7,IMM is not exactly equivalent to "change rounding mode
> to imm[1..0]", but, for sake of brevity, sufficiently close to that.
>
You're right. I had in mind the "set to a known state" part of the problem, where I expected the known state to be an immediate stored in the set-FP-control instruction.
The "restore state on module exit" part of the problem is trickier because the way that is handled today does require reading from a general purpose (usually FP) register, with all that implies in terms of having to wait until the register is available, etc. What's frustrating is that the ACTUAL use case, the problem we are trying to solve, does not require any delay; it's just modeled in a really bad way.
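To make the complaint concrete, here is a rough Python model of the save/modify/restore sequence as it is done today. The bit layout (rounding mode in the low two bits), the `FPUnit` class and the function names are all made up for illustration and are not taken from any particular ISA. The only point is that both entry and exit pass through a general purpose register.

```python
# Toy model of today's read/modify/write handling of an FP control register.
# Assumption: rounding mode lives in bits [1:0]; this layout is hypothetical.
RM_MASK = 0b11
ROUND_TOWARD_ZERO = 0b01


class FPUnit:
    def __init__(self, fpcr=0):
        self.fpcr = fpcr

    def read_fpcr(self):
        # Models "move control register to a general purpose register".
        return self.fpcr

    def write_fpcr(self, value):
        # Models "move general purpose register to control register".
        # In hardware the new state is unknown until that register is ready.
        self.fpcr = value


def run_with_rounding(fpu, mode, body):
    saved = fpu.read_fpcr()                           # read control register
    modified = (saved & ~RM_MASK) | (mode & RM_MASK)  # clear some bits, set some bits
    fpu.write_fpcr(modified)                          # write control register
    try:
        return body(fpu)
    finally:
        fpu.write_fpcr(saved)                         # restore on module exit, again via a register


fpu = FPUnit(fpcr=0b1010_0010)
seen = run_with_rounding(fpu, ROUND_TOWARD_ZERO, lambda f: f.read_fpcr())
print(bin(seen), bin(fpu.fpcr))
```

Note that the value written on exit is whatever was read on entry, so the code never actually needs to look at it.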
What should be provided is something like this. The two high performance ops are:
- swap control state to this IMMEDIATE value
- restore previous control state
along with one additional register (FPCR').
Swap state moves FPCR to FPCR' and IMM to FPCR (as I said, at rename time).
Restore state moves FPCR' back to FPCR.
Apart from these two ops, FPCR and FPCR' are invisible to the ISA.
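In the same spirit, a minimal sketch of the two proposed ops, modeling only their architectural effect. The class and method names are invented, and the sketch assumes a single FPCR' (so one level of save), which is all the description above specifies. It says nothing about how rename-time handling would be implemented.

```python
# Toy model of the proposed swap/restore ops. All names are hypothetical.
class FPControl:
    def __init__(self, fpcr=0):
        self.fpcr = fpcr        # current FP control state
        self.fpcr_prime = 0     # FPCR': holds the previous state

    def swap_state(self, imm):
        # FPCR -> FPCR', IMM -> FPCR. The new state is an immediate,
        # so no general purpose register has to be read.
        self.fpcr_prime = self.fpcr
        self.fpcr = imm

    def restore_state(self):
        # FPCR' -> FPCR. Again no general purpose register is involved.
        self.fpcr = self.fpcr_prime


fpc = FPControl(fpcr=0b1010_0010)   # whatever the caller happened to be using
fpc.swap_state(0b01)                # set the known state this module needs
inside = fpc.fpcr
fpc.restore_state()                 # hand back the caller's state, unseen
after = fpc.fpcr
print(bin(inside), bin(after))
```

The module never inspects the caller's state; it only parks it in FPCR' and puts it back.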
The idea here is that for all NORMAL uses of FP control, messing around with "I'll take whatever randomness the previous state was, just as long as I get the rounding mode I want" is nonsense, and so is this model of reading the flags, bit-twiddling in a new flag or two, and writing them back.
What you normally want is the two operations I provide: set the state to the known value I need, and restore the state to whatever the guy before me was using. I've no interest in WHAT that previous state was, or how it relates to the state I want, so don't bother with ops that allow for that.
Obviously for context switching we need ops to read/write FPCR and FPCR', but I don't care if they're privileged. Just make them the usual "mv special purpose register N to register M" sort of deal.
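A small sketch of why the context switch path has to save and reload both registers, not just FPCR. Again this is a toy Python model with invented names (`save_context`, `load_context`), and the specific values are arbitrary.

```python
# Toy model: a context switch must save and reload both FPCR and FPCR'.
# All names and values here are hypothetical.
class FPControl:
    def __init__(self, fpcr=0, fpcr_prime=0):
        self.fpcr = fpcr
        self.fpcr_prime = fpcr_prime

    def swap_state(self, imm):
        self.fpcr_prime = self.fpcr
        self.fpcr = imm

    def restore_state(self):
        self.fpcr = self.fpcr_prime

    # The (possibly privileged) ops: plain moves out of and into both registers.
    def save_context(self):
        return (self.fpcr, self.fpcr_prime)

    def load_context(self, ctx):
        self.fpcr, self.fpcr_prime = ctx


hw = FPControl(fpcr=0b1000)     # thread A's original state
hw.swap_state(0b01)             # A enters its module
ctx_a = hw.save_context()       # A is preempted between swap and restore

hw.load_context((0b0100, 0))    # thread B is scheduled
hw.swap_state(0b10)             # B overwrites FPCR' with its own old state
hw.restore_state()
ctx_b = hw.save_context()

hw.load_context(ctx_a)          # A resumes with its own FPCR and FPCR'
hw.restore_state()
a_after = hw.fpcr               # A gets back its original state, not B's
print(ctx_a, ctx_b, a_after)
```

If only FPCR were saved, thread A's restore would pick up whatever thread B had left in FPCR'.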