Article: AMD's Jaguar Microarchitecture
By: Wilco (Wilco.Dijkstra.delete@this.ntlworld.com), April 6, 2014 1:48 pm
Room: Moderated Discussions
Megol (golem960.delete@this.gmail.com) on April 6, 2014 8:21 am wrote:
> Wilco (Wilco.Dijkstra.delete@this.ntlworld.com) on April 6, 2014 5:29 am wrote:
> > computational_scientist (brian.bj.parker99.delete@this.gmail.com) on April 5, 2014 6:50 pm wrote:
> > > Linus Torvalds (torvalds.delete@this.linux-foundation.org) on April 4, 2014 4:56 pm wrote:
> > > > TREZA (no.delete@this.ema.il) on April 4, 2014 3:18 pm wrote:
> > > > >
> > > > > Denormals is what made IEEE FP superior to VAX. It is the correct way of doing FP math!
> > > >
> > > > Don't get me wrong - I actually personally like denormals, and think they are
> > > > definitely required for serious FP math, but the thing is, 99% of FP use isn't
> > > > really serious, and most people don't really understand FP anyway.
> > > >
> > > > The kind of people who really understand FP and do serious math using it (to the point of
> > > > caring about the order of operations, never mind denormals), those kinds of people really
> > > > do know what they are doing, and sometimes (although not always) really do want denormals.
> > > >
> > > > But there's the other side of FP math, which really just wants good enough values quickly. There are a lot
> > > > of people who are ok with single-precision and no denormals. And yes, quite often 24 bits of precision is
> > > > not enough, and they decide they actually need double precision in order to avoid odd visual artifacts.
> > > >
> > > > Yeah, I'm talking about things like games.
> > > >
> > > > And the thing is, the defaults tend to be the wrong way around. People who don't know what they
> > > > are doing with floating point basically *never* need denormals. You will generally hit other issues
> > > > long before you hit the "oops, I lost precision because I didn't have denormals". But exactly
> > > > *because* they don't know about floating point, they also don't know to disable them.
> > > >
> > > > So I think that from a hardware designer standpoint, it actually would make more sense if the
> > > > default was "denormals are zero" (or "flush to zero"), because the people who do want denormals
> > > > also know about them. So while I like denormals, the onus on disabling denormals when not needed
> > > > is kind of the wrong way around.
> > >
> > > This is back to front reasoning: physicists, mathematicians and most computational scientists don't
> > > know anything about the details of floating point and use the defaults like everyone else.
> > > Even simple assumptions like x==y iff x-y==0 don't always hold in FTZ mode; default denormal mode
> > > is the simplest floating point model mathematically. By contrast, people running gaming benchmarks
> > > who know about these things can easily enable FTZ to speed them up for marketing purposes.
> >
> > You can't rely on x==y iff x-y==0 even with denormals. IEEE floating point is extremely complex
> > with lots of odd non-intuitive corner cases. Very few mathematical identities work on IEEE,
> > with or without denormals. Denormals are a major pain in algorithms, for example you need
> > special cases to avoid the catastrophic loss of precision (and the infinities/NaNs caused
> > by calculations with a denormal) as well as the huge slowdown on many CPUs.
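The corner cases in this exchange can be reproduced with a minimal Python sketch. It assumes IEEE 754 doubles (which CPython uses on common platforms); `flush_to_zero` is a hypothetical helper that only imitates what an FTZ mode would do to a subnormal result, it does not change any hardware mode.

```python
import math
import sys

MIN_NORMAL = sys.float_info.min   # 2**-1022, smallest positive normal double
MIN_SUBNORMAL = 5e-324            # 2**-1074, smallest positive subnormal double


def flush_to_zero(value: float) -> float:
    """Hypothetical helper: imitate FTZ by replacing a subnormal with a signed zero."""
    if value != 0.0 and abs(value) < MIN_NORMAL:
        return math.copysign(0.0, value)
    return value


# Two distinct normal numbers whose difference is subnormal.
x = 1.5 * MIN_NORMAL
y = MIN_NORMAL
diff = x - y                       # exactly 2**-1023, a subnormal

# With gradual underflow, x != y and x - y != 0 agree.
# With the imitated FTZ, x != y but the difference becomes 0.
ftz_diff = flush_to_zero(diff)

# Even with denormals, "x == y iff x - y == 0" fails for infinities:
# inf == inf is true, but inf - inf is NaN, which is not equal to 0.
inf_diff = math.inf - math.inf

# A calculation with a denormal can overflow to infinity.
reciprocal = 1.0 / MIN_SUBNORMAL

# Precision loss: halving drops the lowest bit once the result is subnormal,
# so doubling again does not restore the original value.
z = MIN_NORMAL + MIN_SUBNORMAL
round_trip = (z / 2.0) * 2.0

print(diff != 0.0, ftz_diff == 0.0, math.isnan(inf_diff),
      math.isinf(reciprocal), round_trip == z)
```

The last value printed shows the precision loss in the subnormal range, which is the kind of case that forces special handling in algorithms.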
> >
> > > http://www.cs.berkeley.edu/~wkahan/19July10.pdf
> > > has some interesting history and rationale for denormals.
> > > (There are several readable summaries of the rationale for other IEEE754
> > > features on Kahan's web site that are well worth reading).
> > >
> > > >
> > > > But it's too late, and big CPUs largely do already handle them right
> > > > and don't care, so it's mostly a problem for the embedded space.
> > >
> > > Using the denormal benchmark at http://charm.cs.uiuc.edu/subnormal/ , I see an (acceptable) 8x slowdown
> > > of denormals on my 2.3 GHz Intel Core i7 macbook pro in SSE mode and an (unacceptable) 53x slowdown
> > > in x87 mode (which is particularly egregious as x87 mode is needed for precise numerical work).
> > > FMAC instructions make fast denormal processing easy to implement,
> > > which is probably why Jaguar's denormal handling is slow.
> >
> > 8x slowdown is still unacceptable in my book - 10% is acceptable. POWER and ARM are at that level.
> >
> > > It is a shame that as the importance of computational methods to society and the need for accurate
> > > and reliable floating point has increased in *absolute* terms over the last decades, the *relative*
> > > decreased usage compared with multimedia applications has led to floating point hardware capabilities
> > > being degraded to meet gamers' needs. Processor speed, memory size, even screen resolution, have
> > > all increased monotonically over the last decades- floating point precision, range and reliability
> > > is the only feature that has actually decreased... very unfortunate.
> >
> > The fact is the contrary is actually true. The non-IEEE compliant 80-bit x87 is finally dead - a huge
> > step forward for floating point. Programming languages and compilers adopted IEEE and by default provide
> > IEEE compliant optimizations as well as user-selectable adventurous FP optimizations. Libraries have improved
> > hugely as well, providing far more accurate math functions (the norm is now is becoming
> > available in hardware. So we've made huge progress in the last decades.
>
> Progress? In what way can you construe something like that as progress?
Which part do you not count as progress?
> First: Extended precision floats _are_ IEEE 754 compatible. They are in the standard. From the
> beginning. It has been implemented in 3 architectures at least and most likely several more.
Extended precision as implemented in x87 is a huge mess. It even causes 32-bit and 64-bit FP to return inconsistent and incorrect results. Nowadays extended precision means 128 bits, not 80.
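One well-known way an extended intermediate format can change a 64-bit result is double rounding: the exact value is rounded first to a 64-bit significand and then again to 53 bits. The sketch below is only an illustration using exact rational arithmetic, not x87 itself; `round_to_bits` is a hypothetical helper, and the input value is chosen by hand to trigger the effect.

```python
from fractions import Fraction


def round_to_bits(value: Fraction, bits: int) -> Fraction:
    """Hypothetical helper: round a value in [1, 2) to `bits` significand bits.

    Python's round() on a Fraction rounds ties to even, which matches the
    default IEEE 754 rounding mode.
    """
    assert 1 <= value < 2
    scale = 2 ** (bits - 1)
    return Fraction(round(value * scale), scale)


# Exact result lying just above the midpoint between two adjacent doubles.
exact = 1 + Fraction(1, 2 ** 53) + Fraction(1, 2 ** 65)

# Rounding once, straight to double precision (53-bit significand).
direct = round_to_bits(exact, 53)

# Rounding first to a 64-bit significand (as in 80-bit extended), then to 53 bits.
via_extended = round_to_bits(round_to_bits(exact, 64), 53)

print(direct == 1 + Fraction(1, 2 ** 52))   # rounded up, the correctly rounded result
print(via_extended == 1)                    # rounded down after the intermediate step
```

The first rounding lands exactly on the midpoint, so the second rounding breaks the tie toward the even neighbour and the final result differs from the correctly rounded one by one unit in the last place.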
> Second: It is still supported in the most used performance
> oriented architecture: the x86. So how can it be dead?
Still supported how exactly? Today VC++ makes long double the same as double. With SSE you finally don't need to use the broken x87 at all, and thus you get IEEE compliance.
Wilco