By: dmcq (dmcq.delete@this.fano.co.uk), October 31, 2015 4:12 pm
Room: Moderated Discussions
bakaneko (nyan.delete@this.hyan.wan) on October 31, 2015 10:23 am wrote:
> dmcq (dmcq.delete@this.fano.co.uk) on October 31, 2015 8:19 am wrote:
> > bakaneko (nyan.delete@this.hyan.wan) on October 31, 2015 7:28 am wrote:
> > > dmcq (dmcq.delete@this.fano.co.uk) on October 30, 2015 5:12 am wrote:
> > > > lurker (lurker9000.delete@this.realemail.mail) on October 30, 2015 2:39 am wrote:
> > > > > > First of all - welcome to RWT, glad to hear your perspective.
> > > > >
> > > > > Thanks.
> > > > >
> > > > > > My guess is that everything you say is true...
> > > > >
> > > > > Eh, I just thought I'd post what I heard from a guy who supposedly worked on Zen.
> > > > >
> > > > > > and that AMD isn't intending to hit the HPC
> > > > > > market. They have 128b vectors (since that's all ARM supports), which simply isn't wide
> > > > > > enough to be competitive with Skylake. So giving up on a third AGU makes sense. The third
> > > > > > AGU is probably most helpful for HPC (where they cannot compete anyway) and isn't a particularly
> > > > > > small unit in terms of design complexity and impact on the load/store buffer.
> > > > > >
> > > > > > David
> > > > >
> > > > > 128bit FP pipes seem optimal for most desktop and server software. HPC is pretty
> > > > > much the only place where latest instructions are used and even if zen was competitive
> > > > > here I don't think anyone would want to switch from Intel.
> > > > > Personally I just hope the lack of 3rd AGU won't cause problems in SMT. I don't think
> > > > > normal workloads have that many operations that access memory, but SMT aims to maximize
> > > > > utilization of all available resources and only 2 AGUs might be a problem there.
> > > >
> > > > I think 256 bits would be better as you can do four double precision operations at once and that is quite
> > > > common. On the other hand with four SIMD units instead one could merge two operations to give an effective
> > > > two by 256 bit units except for some special operations. For anything larger they'd probably be better
> > > > off relying on GPUs I think if they can get the coherence and message passing working well. I can see
> > > > how to save larger register sets without impacting interrupt handling too badly but it seems a lot of
> > > > work when ARM is probably hoping to move 64 bit ARM into the embedded processor market.
> > >
> > > Except nobody sane would work on large amounts of
> > > doubles for most normal applications. It's really
> > > only useful for HPC, and there some other things
> > > probably matter even more, as floating point can be
> > > the wrong hammer.
> >
> > Well I know games often just use floats in the GPUs and some AI people say 8-bit integers are enough
> > for any useful AI problem - but it is amazing how fast a sequence of float operations can start to give
> > obviously wrong results. If one wants half a chance of something approximating a reasonable result and
> > aren't an expert at error analysis there's nothing to beat just doing the work using doubles.
>
> While it sounds worthwhile, it is actually wrong.
> More bits don't save you from errors; in some cases
> they will give you worse results.
>
> Floats is one of these topics where you can't go with
> (pretty naive) hunches. You need to understand the
> material properly.
I never said it would guarantee you were okay. I said it is the thing to do if you want half a chance of something approximating a reasonable result. Of course it can all go wrong, even with simple things like calculating the roots of a quadratic, but with doubles you are far more likely to get something workable. Not everyone is an expert at error analysis, but an engineer can at least check that the results seem to be okay.
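To make the quadratic case concrete, here is a rough sketch in Python. The coefficients (a=1, b=-10000, c=1) are my own pick for illustration, and `f32` is a hypothetical helper that emulates single precision by rounding every intermediate result to a 32-bit float. It uses the textbook formula, which is exactly what someone who isn't an expert at error analysis would write.

```python
import math
import struct

def f32(x):
    # Round a Python float (a double) to the nearest IEEE single-precision value.
    return struct.unpack("f", struct.pack("f", x))[0]

def small_root(a, b, c, rnd):
    # Textbook formula for the smaller-magnitude root when b < 0.
    # rnd is applied after every operation to emulate the working precision.
    disc = rnd(rnd(b * b) - rnd(rnd(4.0 * a) * c))
    return rnd(rnd(rnd(-b) - rnd(math.sqrt(disc))) / rnd(2.0 * a))

# Illustrative coefficients (an assumption, not from the thread): x^2 - 10000x + 1
a, b, c = 1.0, -10000.0, 1.0

root_float = small_root(a, b, c, f32)            # single precision
root_double = small_root(a, b, c, lambda x: x)   # double precision

print("float :", root_float)
print("double:", root_double)
```

The true small root is close to 1e-4. In single precision the 4ac term is lost entirely when it is subtracted from b^2, so the cancellation leaves nothing; in double the same naive formula still gives a usable answer. An expert would rearrange the formula instead, which is rather the point.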
The best one can hope for is that errors grow with the square root of the number of operations contributing to a result. Suppose one can do 2^30 flops per second (about 1 Gflops) and all of that contributes to a single result: then in one second one loses 15 bits of precision, so with the 24-bit significand of a float the result will at best be accurate to 9 bits. And that really is being quite optimistic. At least doubles aren't practically guaranteed to give such inaccurate results.
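The arithmetic behind those figures, as a small sketch. It assumes the optimistic random-walk model described above (error of about sqrt(N) units in the last place after N operations) and the IEEE significand widths of 24 bits for float and 53 bits for double; `bits_left` is just a name I made up for the calculation.

```python
import math

def bits_left(significand_bits, n_ops):
    # Random-walk model: the error grows like sqrt(n_ops) units in the last
    # place, which costs log2(sqrt(n_ops)) = 0.5 * log2(n_ops) bits.
    return significand_bits - 0.5 * math.log2(n_ops)

n_ops = 2 ** 30  # one second at 2^30 flops, all feeding a single result

float_bits = bits_left(24, n_ops)   # IEEE single: 24-bit significand
double_bits = bits_left(53, n_ops)  # IEEE double: 53-bit significand

print("float :", float_bits, "bits")
print("double:", double_bits, "bits")
```

Under the same assumptions a double still has 38 good bits left, which is more than a float starts with.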