Article: AMD's Jaguar Microarchitecture
By: Michael S (already5chosen.delete@this.yahoo.com), April 6, 2014 12:22 am
Room: Moderated Discussions
computational_scientist (brian.bj.parker99.delete@this.gmail.com) on April 5, 2014 6:50 pm wrote:
> Linus Torvalds (torvalds.delete@this.linux-foundation.org) on April 4, 2014 4:56 pm wrote:
> > TREZA (no.delete@this.ema.il) on April 4, 2014 3:18 pm wrote:
> > >
> > > Denormals is what made IEEE FP superior to VAX. It is the correct way of doing FP math!
> >
> > Don't get me wrong - I actually personally like denormals, and think they are
> > definitely required for serious FP math, but the thing is, 99% of FP use isn't
> > really serious, and most people don't really understand FP anyway.
> >
> > The kind of people who really understand FP and do serious math using it (to the point of
> > caring about the order of operations, never mind denormals), those kinds of people really
> > do know what they are doing, and sometimes (although not always) really do want denormals.
> >
> > But there's the other side of FP math, which really just wants good enough values quickly. There are a lot
> > of people who are ok with single-precision and no denormals. And yes, quite often 24 bits of precision is
> > not enough, and they decide they actually need double precision in order to avoid odd visual artifacts.
> >
> > Yeah, I'm talking about things like games.
> >
> > And the thing is, the defaults tend to be the wrong way around. People who don't know what they
> > are doing with floating point basically *never* need denormals. You will generally hit other issues
> > long before you hit the "oops, I lost precision because I didn't have denormals". But exactly
> > *because* they don't know about floating point, they also don't know to disable them.
> >
> > So I think that from a hardware designer standpoint, it actually
> > would make more sense if the default was "denormals
> > are zero" (or "flush to zero"), because the people who do want denormals also know about them.
> > So while
> > I like denormals, the onus on disabling denormals when not needed is kind of the wrong way around.
>
> This is back to front reasoning: physicists, mathematicians and most computational scientists don't
> know anything about the details of floating point and use the defaults like everyone else.
> Even simple assumptions like x==y iff x-y==0 don't always hold in FTZ mode; default denormal mode
> is the simplest floating point model mathematically. By contrast, people running gaming benchmarks
> who know about these things can easily enable FTZ to speed them up for marketing purposes.
>
> http://www.cs.berkeley.edu/~wkahan/19July10.pdf
> has some interesting history and rationale for denormals.
> (There are several readable summaries of the rationale for other IEEE754
> features on Kahan's web site that are well worth reading).
>
> >
> > But it's too late, and big CPU's largely do already handle them right
> > and don't care, so it's mostly a problem for the embedded space.
>
> Using the denormal benchmark at http://charm.cs.uiuc.edu/subnormal/ , I see an (acceptable) 8x slowdown
> of denormals on my 2.3 GHz Intel Core i7 macbook pro in SSE mode and an (unacceptable) 53x slowdown
> in x87 mode (which is particularly egregious as x87 mode is needed for precise numerical work).
x87 is only needed for extended precision. I'd expect that in practice the speed of denormal handling could become problematic in single-precision calculations, much less so in double or extended precision. Of course, I could be wrong about that.
> FMAC instructions make fast denormal processing easy to implement,
> which is probably why Jaguar's denormal handling is slow.
For some definition of fast you could be right. But I think that when people here talk about fast denormals they mean "full speed", and for that I don't see how FMAC could make a difference.
>
> It is a shame that as the importance of computational methods to society and the need for accurate
> and reliable floating point has increased in *absolute* terms over the last decades, the *relative*
> decreased usage compared with multimedia applications has lead to floating point hardware capabilities
> being degraded to meet gamers' needs. Processor speed, memory size, even screen resolution, have
> all increased monotonically over the last decades- floating point precision, range and reliability
> is the only feature that has actually decreased... very unfortunate.
>
I think, if we are looking at a span of 30+ years, then today's state of things does not look quite that bad.
Back then, scientists who did heavy numerics used rather bad machines, like the Cray and the CDC Cyber. Better machines were available, but they were not suitable for "serious" numerics either, due to low absolute performance (Intel x87, Motorola 68881), a small addressing range (x87), or poor price/performance combined with numeric problems of their own, although probably not as bad as Cray's and CDC's (IBM S/370). And one couldn't even apply the flock-of-chickens strategy, because micros with good FP were built neither for SMP nor for efficient clustering.
Of course, there already were VAXen, but they were neither particularly fast nor particularly cheap. And they had no denormal support at all.