Article: AMD's Jaguar Microarchitecture
By: Wilco (Wilco.Dijkstra.delete@this.ntlworld.com), April 6, 2014 3:51 am
Room: Moderated Discussions
Maynard Handley (name99.delete@this.name99.org) on April 5, 2014 6:01 pm wrote:
> The point is
> (a) put ALL CONTROL state in ONE register. And don't make the problem artificially
> more complicated by mixing up CONTROL bits with STATUS bits.
> (b) allow rapid reading and writing of that control state register with a single user-level
> read/write operation, instead of a twiddle-one-bit-at-a-time, high-latency model.
Separating control and status bits, as ARM64 does, is not really any better. For most of the fenv implementation you end up doing twice as many FP status reads and writes... What you actually need is hardware that compares the old and new bits and only serializes FP ops or flushes the pipeline when absolutely required.
This is a real issue on all architectures, because the latest GLIBC uses fenv to set and restore the rounding mode on every math call, and does so quite inefficiently. Performance of math functions more than halves in most cases.
Wilco