Article: AMD's Jaguar Microarchitecture
By: Wilco (Wilco.Dijkstra.delete@this.ntlworld.com), April 7, 2014 12:50 pm
Room: Moderated Discussions
UnmaskedUnderflow (unmasked.delete@this.unmasked.org) on April 7, 2014 12:26 pm wrote:
> Michael S (already5chosen.delete@this.yahoo.com) on April 7, 2014 5:39 am wrote:
> > Wilco (Wilco.Dijkstra.delete@this.ntlworld.com) on April 7, 2014 3:27 am wrote:
> > > >
> > > > The real problem is more exotic - x87 precision control can reduce the precision of
> > > > mantissa, but it can't reduce the range of exponent. So, results of x87 computations
> > > > with single or double precision remain the same as specified by IEEE only as long as
> > > > you stay within official range. Which sounds nearly impossible in practice.
> > >
> > > You can store to memory after every operation to get the exponent right - this is still
> > > not IEEE compliant as denormals suffer from double rounding. Of course doing this causes
> > > another performance penalty but at least it gives more consistent results than variables
> > > whose values suddenly change due to needing to be spilled to memory by the compiler.
> > >
> > >
> > > Basically you cannot get IEEE results from x87. Quite ironic since
> > > x87 was supposed to be the first IEEE implementation...
> > >
> >
>
> This part of the conversation is confusing details. x87 precision will correctly round
> once if you set it to DP/SP and produce the correct memory result internally.
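If only - my point was that it doesn't produce correct SP/DP results: precision control rounds the mantissa but leaves the exponent range extended, so anything that overflows or underflows the SP/DP range comes out differently. See e.g. http://hal.archives-ouvertes.fr/docs/00/28/14/29/PDF/floating-point-article.pdf for a good writeup of the many pitfalls of x87.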
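To make the denormal double rounding concrete, here is a minimal sketch in Python that models the two rounding steps with exact rationals. It does not run anything on real x87 hardware; the helper names and the chosen value are purely illustrative assumptions. It rounds an exact value once to 53 significant bits with an unbounded exponent (as a register would with precision control set to DP), then again to the DP denormal grid (as a store to memory would), and compares that with rounding once directly.

```python
from fractions import Fraction

# Spacing of double-precision denormals (smallest positive DP value).
DENORM_ULP = Fraction(2) ** -1074


def round_half_even(q):
    """Round a rational to the nearest integer, ties to even."""
    floor = q.numerator // q.denominator
    rem = q - floor
    if rem > Fraction(1, 2) or (rem == Fraction(1, 2) and floor % 2 == 1):
        return floor + 1
    return floor


def round_to_precision(x, bits):
    """Round positive x to `bits` significant bits, exponent unbounded."""
    # Find e with 2**e <= x < 2**(e + 1).
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    ulp = Fraction(2) ** (e - bits + 1)
    return round_half_even(x / ulp) * ulp


def round_to_grid(x, spacing):
    """Round x to the nearest multiple of `spacing`, ties to even."""
    return round_half_even(x / spacing) * spacing


# Hypothetical exact result: just above 2.5 denormal steps.
exact = (Fraction(5, 2) + Fraction(2) ** -60) * DENORM_ULP

# One rounding, as IEEE double arithmetic requires.
direct = round_to_grid(exact, DENORM_ULP)

# Two roundings: 53-bit mantissa in the register, then the store.
in_register = round_to_precision(exact, 53)
stored = round_to_grid(in_register, DENORM_ULP)

print("direct :", direct / DENORM_ULP, "denormal steps")
print("stored :", stored / DENORM_ULP, "denormal steps")
```

The first rounding drops the tiny excess and leaves an exact tie at 2.5 steps, which the second rounding then resolves to the even neighbour, so the stored value is one step away from the correctly rounded result.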
> > > > > The Motorola 68881/2 did not have that problem, IIRC.
> > > > > (The whole x87 instruction set is a joke anyway)
> > >
> > > And ARM's FPA did get it right too. The broken stack implementation is another idiotic aspect of x87 indeed.
> > >
> > > Wilco
> > >
>
> 1.) The x87 stack has been renamed for as long as renaming has been around. Hardware wise it
> just fronts the rules of a stack. If you must throw rocks, throw rocks at 8 arch registers.
Nobody is talking about whether it complicates OoO hardware. It complicates software. The stack has only 8 entries, so if you push 4 items on it and then call a function which pushes another 5, it just corrupts the stack without any warning. That means you can't use the stack like, well, a stack. It doesn't work like a register file either, requiring fxch (slow on older cores!) to get the values you need at the top. So you end up only using it for simple subexpressions and do almost everything through memory.
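The 4 + 5 scenario can be sketched with a toy model in Python. This is a simplification under stated assumptions, not an emulator: the class and its names are hypothetical, exceptions are assumed masked (the default), and an overflowing push is modelled as writing a NaN over the occupied slot and setting a sticky fault flag.

```python
import math


class ToyX87Stack:
    """Toy model of the 8-entry x87 register stack (exceptions masked)."""

    SIZE = 8

    def __init__(self):
        self.slots = [None] * self.SIZE  # None marks an empty slot
        self.top = 0
        self.stack_fault = False  # sticky flag, nothing traps

    def push(self, value):
        self.top = (self.top - 1) % self.SIZE
        if self.slots[self.top] is not None:
            # Overflow: the occupied slot is overwritten with a NaN.
            self.stack_fault = True
            value = math.nan
        self.slots[self.top] = value

    def pop(self):
        value = self.slots[self.top]
        if value is None:
            # Underflow: reading an empty slot yields a NaN.
            self.stack_fault = True
            value = math.nan
        self.slots[self.top] = None
        self.top = (self.top + 1) % self.SIZE
        return value


def callee(stack):
    """Hypothetical function that needs 5 stack entries of its own."""
    for v in (10.0, 11.0, 12.0, 13.0, 14.0):
        stack.push(v)
    return [stack.pop() for _ in range(5)]


stack = ToyX87Stack()
for v in (1.0, 2.0, 3.0, 4.0):  # caller keeps 4 live values
    stack.push(v)

callee_values = callee(stack)
caller_values = [stack.pop() for _ in range(4)]

print("callee saw:", callee_values)
print("caller got:", caller_values)
print("fault flag:", stack.stack_fault)
```

In this model the callee's fifth push wraps around onto the caller's oldest value, so the callee reads back a NaN instead of its own value and the caller's first value is gone when it unwinds. Execution simply continues, with only a status flag to show for it.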
> 2.) ARM is masked. x87/x86 must legacy support unmasked. I can't emphasize what that means enough. (Maybe I
> should put it in my name.) This means keeping the unrounded infinite precise result around. The implementations
> are not equivalent...but certainly in the realm of "who gives a crap" for 99.99999% of the community.
I'm not sure what you mean - what is your point? You're right that most people do not know or care about these issues (I've had the misfortune of running an IEEE test on a simulator that used the x87 FPU...). However, the many problems meant the world quickly adopted SSE.
Wilco