By: dmcq (dmcq.delete@this.fano.co.uk), November 1, 2015 7:36 am
Room: Moderated Discussions
bakaneko (nyan.delete@this.hyan.wan) on October 31, 2015 5:25 pm wrote:
> dmcq (dmcq.delete@this.fano.co.uk) on October 31, 2015 4:12 pm wrote:
> > bakaneko (nyan.delete@this.hyan.wan) on October 31, 2015 10:23 am wrote:
> > > dmcq (dmcq.delete@this.fano.co.uk) on October 31, 2015 8:19 am wrote:
> > > > bakaneko (nyan.delete@this.hyan.wan) on October 31, 2015 7:28 am wrote:
> > > > > dmcq (dmcq.delete@this.fano.co.uk) on October 30, 2015 5:12 am wrote:
> > > > > > lurker (lurker9000.delete@this.realemail.mail) on October 30, 2015 2:39 am wrote:
> > > > > > > > First of all - welcome to RWT, glad to hear your perspective.
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > > > My guess is that everything you say is true...
> > > > > > >
> > > > > > > Eh, I just thought I'd post what I heard from a guy who supposedly worked on Zen.
> > > > > > >
> > > > > > > > and that AMD isn't intending to hit the HPC
> > > > > > > > market. They have 128b vectors (since that's all ARM supports), which simply isn't wide
> > > > > > > > enough to be competitive with Skylake. So giving up on a third AGU makes sense. The third
> > > > > > > > AGU is probably most helpful for HPC (where they cannot compete anyway) and isn't a particularly
> > > > > > > > small unit in terms of design complexity and impact on the load/store buffer.
> > > > > > > >
> > > > > > > > David
> > > > > > >
> > > > > > > 128bit FP pipes seem optimal for most desktop and server software. HPC is pretty
> > > > > > > much the only place where latest instructions are used and even if zen was competitive
> > > > > > > here I don't think anyone would want to switch from Intel.
> > > > > > > Personally I just hope the lack of 3rd AGU won't cause problems in SMT. I don't think
> > > > > > > normal workloads have that many operations that access memory, but SMT aims to maximize
> > > > > > > utilization of all available resources and only 2 AGUs might be a problem there.
> > > > > >
> > > > > > I think 256 bits would be better as you can do four double precision operations at once and that is quite
> > > > > > common. On the other hand with four SIMD units instead one could merge two operations to give an effective
> > > > > > two by 256 bit units except for some special operations. For anything larger they'd probably be better
> > > > > > off relying on GPUs I think if they can get the coherence and message passing working well. I can see
> > > > > > how to save larger register sets without impacting interrupt handling too badly but it seems a lot of
> > > > > > work when ARM is probably hoping to move 64 bit ARM into the embedded processor market.
> > > > >
> > > > > Except nobody sane would work on large amounts of
> > > > > doubles for most normal applications. It's really
> > > > > only useful for HPC, and there some other things
> > > > > probably matter even more, as floating point can be
> > > > > the wrong hammer.
> > > >
> > > > Well I know games often just use floats in the GPUs and some AI people say 8-bit integers are enough
> > > > for any useful AI problem - but it is amazing how fast a sequence of float operations can start to give
> > > > obviously wrong results. If one wants half a chance of something approximating a reasonable result and
> > > > aren't an expert at error analysis there's nothing to beat just doing the work using doubles.
> > >
> > > While it sounds worthwhile, it is actually wrong.
> > > More bits don't save you from errors; in some cases
> > > they will give you worse results.
> > >
> > > Floats are one of those topics where you can't go with
> > > (pretty naive) hunches. You need to understand the
> > > material properly.
> >
> > I never said it would guarantee you were okay. I said if you want half a chance of something approximating
> > a reasonable result. Of course it can all go wrong even with simple things like calculating the roots
> > of a quadratic but with double you are far more likely to get something workable. Not everyone is an
> > expert at error analysis but an engineer can at least check that results seem to be okay.
> >
> > The best one can hope for is that errors grow with the square root of the number of operations contributing
> > to a result. Supposing one can do 2^30 flops (about a Gflop) and they all contribute to a single result, then
> > in one second one loses 15 bits of precision - so at best a single-precision result will only be accurate
> > to 9 bits. And that really is being quite optimistic. At least doubles aren't practically guaranteed to give
> > such inaccurate results.
>
> What a pile of bullshit.
>
> You said AMD totally needs 256bit SIMD units for high
> double performance.
> Your reason given? None.
>
> Your reasons why doubles matter so much outside HPC: A
> few engineers who hack together code without much clue
> could need it.
>
> And your example about adding billions of numbers sounds
> like it misses the point, because it isn't developed
> software, just a naive guess with probably not much thought
> about the details.
I didn't say AMD needs 256bit SIMD units and I didn't say they didn't need them. You must be confusing me with someone else.
I didn't talk about adding, I just talked about operations. Multiplication is the type of operation I had in mind, but adding numbers from a small fixed range tends to do the same sort of thing eventually too. One wouldn't normally do millions of a single operation like add in a row, but if you're interested, the Kahan summation algorithm can handle that cleanly without needing to sort the numbers, and there are others too which are faster but a little less accurate.
What I said is exactly what happens if, for instance, you keep applying an orthogonal matrix over and over again, and it is why people in graphics try to recalculate matrices from scratch rather than continually applying small changes. Staying with the 3D theme, 4x4 matrices are the most common way of describing movements in 3D, and quaternions are also used for describing rotations without singular points, which makes doing four floating point operations at once a very common and easy-to-compile operation.
Engineers like to just get results and are willing to blow an extra couple of hours of computer time if it saves them think time. They do have a clue, but nobody is an expert at everything. There's no need to talk of them as clueless just because they use double instead of float!