Article: AMD's Jaguar Microarchitecture
By: Michael S (already5chosen.delete@this.yahoo.com), April 6, 2014 1:50 am
Room: Moderated Discussions
Maynard Handley (name99.delete@this.name99.org) on April 5, 2014 6:01 pm wrote:
> Michael S (already5chosen.delete@this.yahoo.com) on April 5, 2014 10:59 am wrote:
> >
> > I don't follow.
> > Why should one save/restore the whole FPU state, why not only the FP control register? Saving/restoring
> > only the FPU control register is not slow relative to the rest of the library call overhead
> > and it *should* be done by any library that wants to use non-default control bits. Anything
> > less is a bug and should not be tolerated by library users.
> >
>
> By FPU state I mean the stuff that would be in an FP control register. I don't
> mean the state of standard FP registers. Sorry, I guess that was not clear.
>
> The point is
> (a) put ALL CONTROL state in ONE register. And don't make the problem artificially
> more complicated by mixing up CONTROL bits with STATUS bits.
> (b) allow rapid reading and writing of that control state register with a single user level
> read/write operation, instead of a twiddle one bit at a time with high latency model.
Intel/AMD MXCSR indeed combines control and status bits in the same register. Same for Power FPSCR. And, indeed, I don't like it from a purely theoretical point of view. But in practice it is not a significant problem.
Other than that, I don't see how MXCSR/FPSCR differ from your description. Also, I don't understand what "twiddle one bit at a time with high latency model" means. Can you give an example?
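For illustration, here is a minimal sketch of the save/restore pattern described above. It is in C and assumes x86-64, where scalar float math goes through SSE and is governed by MXCSR. A hypothetical library routine `lib_sum` reads MXCSR once with `_mm_getcsr` and sets non-default control bits (round toward zero, FTZ, DAZ) with `_mm_setcsr`. On exit it restores only the caller's control bits and keeps any sticky status flags raised in between. The function name and the mask macro names are illustrative assumptions. The bit positions follow the SSE MXCSR layout. Compilers do not always treat `_mm_setcsr` as an ordering point for floating-point arithmetic (GCC's `-frounding-math` is related), so treat this as a sketch, not production code.

```c
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr (x86 SSE) */

/* SSE MXCSR layout: bits 0-5 are sticky exception flags (status);
   bit 6 DAZ, bits 7-12 exception masks, bits 13-14 rounding control
   and bit 15 FTZ are control bits. Macro names are illustrative. */
#define MXCSR_STATUS_MASK    0x003Fu
#define MXCSR_CONTROL_MASK   0xFFC0u
#define MXCSR_DAZ            0x0040u
#define MXCSR_RC_MASK        0x6000u
#define MXCSR_RC_TOWARD_ZERO 0x6000u
#define MXCSR_FTZ            0x8000u

/* Hypothetical library routine that wants non-default control bits. */
float lib_sum(const float *x, int n)
{
    /* One read saves the caller's whole control+status word. */
    unsigned int saved = _mm_getcsr();

    /* Replace the rounding mode with toward-zero and enable FTZ/DAZ,
       leaving the exception masks as the caller had them. */
    unsigned int ctl = (saved & ~MXCSR_RC_MASK)
                     | MXCSR_RC_TOWARD_ZERO | MXCSR_FTZ | MXCSR_DAZ;
    _mm_setcsr(ctl);

    float s = 0.0f;
    for (int i = 0; i < n; ++i)
        s += x[i];

    /* Restore the caller's control bits, but merge in any status flags
       raised by the work above instead of discarding them. */
    unsigned int now = _mm_getcsr();
    _mm_setcsr((saved & MXCSR_CONTROL_MASK) | (now & MXCSR_STATUS_MASK));
    return s;
}
```

The combined control/status layout only shows up in the final restore. Writing `saved` straight back would silently clear exception flags raised inside the library, so the sketch merges the two halves with masks. That is a few extra ALU operations, not a real cost.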