By: Montaray Jack (none.delete@this.none.org), July 3, 2019 2:32 pm

Room: Moderated Discussions

Wilco (wilco.dijkstra.delete@this.ntlworld.com) on July 2, 2019 5:28 pm wrote:

> Michael S (already5chosen.delete@this.yahoo.com) on July 2, 2019 4:24 am wrote:

> > Wilco (wilco.dijkstra.delete@this.ntlworld.com) on July 2, 2019 1:49 am wrote:

> > > Michael S (already5chosen.delete@this.yahoo.com) on June 30, 2019 5:53 am wrote:

> > > > Wilco (wilco.dijkstra.delete@this.ntlworld.com) on June 30, 2019 5:27 am wrote:

>

> > > > This are not bugs, because they are documented.

> > >

> > > A documented bug remains a bug no matter what you call it... The algorithm used has even changed

> > > at least once so there is not even a guarantee of identical results across x87 implementations.

> > >

> >

> > I am not aware of that. Do you have a reference link?

>

> See eg. this. Note the graph showing ULP error near 1*PI...

>

> > > > Using higher-precision values of pi for range reduction would

> > > > be a bug, because it would break backward compatibility.

> > > > That's different from newer things like rsqrt, where exact values of results

> > > > were never documented or guaranteed in ISA documents, so future implementations

> > > > are free to use different precision as long as it is >= 12 bits.

> > > >

> > > > As to accuracy of range reduction of sin/cos, I personally believe that too precise reduction

> > > > used by some of the new library implementations is stupid and counter-productive. It would

> > > > be much better for big arguments (those where ULP is greater than 2*pi) to return constant

> > > > values, like 0 for sin() and tan() and 1 for cos(). From physical perspective those are

> > > > more correct answers than poor PRNG which is a result of "exact" reduction.

> > > > In that regard, for DP x87 is approximately 1000 times more

> > > > precise than what I consider reasonable, but it, at

> > > > least, has excuse of supporting 80-bit format. On the other hand, pure binary64 libraries have no excuses.

> > >

> > > x87 just uses 68 bits of PI which means there is no working range reduction at all (you need a minimum

> > > of 2 times the mantissa size to cover cancellation). That is a horrific implementation that no one

> > > sane could defend.

> >

> > From perspective of engineer of physicist x87 implementation is overkill.

> > When engineer/physicist asks for sin(x), where x is binary64, he is ready to accept any result

> > in range [exact_sin(x-ULP):exact_sin(x+ULP)], but expects that quality implementation will deliver

> > results in range [exact_sin(x-ULP/2):exact_sin(x+ULP/2)] absolute majority of the time.

> > x87 does MUCH better than that.

>

> I don't think it even manages that. Think of it as 2 sine waves with different periods

> - they will be out of sync very quickly and have completely different values.

>

> > > Once you do acceptable range reduction and support inputs up to 2^53 then you need

> > > ~150 bits of PI. At that point covering the full exponent range is not any harder since it's just

> > > a larger table. So it's not particularly hard to do correct range reduction,

> >

> > Yes, I know. It's very stupid but not particularly hard,

> > because IEEE binary64 has such small exponent range.

> > I find it unfortunate. I'd rather prefer if it was very hard, so nobody would do it in standard 'C' libs.

>

> The exponent of binary128 is also small enough... Yes, if the PI table ended up being megabytes,

> it would be obvious that supporting the full exponent range isn't useful. But since it doesn't actually

> slow down the common case, it's not a problem. On embedded targets codesize matters, so it's common

> (and acceptable) to support accurate range reduction only for a small input range.

>

> > > it's just that most implementations

> > > are incorrect or inefficient. The fastest (and simplest) one is here.

> > >

> > > Wilco

> > >

> >

> > sin/cos/tan of big arguments is garbage. Nobody sane cares if the garbage is somewhat better

> > or somewhat worse. People could, wherever, care for consistency, i.e. even garbage should

> > retain certain properties, like sin(x)^2+cos(x)^2 should be very close to 1.

>

> Sure, but there isn't an official standard for range reduction in IEEE754,

> so we got the defacto standard set by math libraries like fdlibm.

>

> > The only bone, I am ready to throw to the crazies to apiece

> > them a little in standard libs is something like that:

> >

> > n = round(x * (INV_TWO_PI_53bits));

> > x = fma(n, -TWO_PI, x);

> > x = fma(n, -TWO_PI_L, x);

> > where TWO_PI_L = binary64(exact_pi - binary64(exact_pi));

> >

> > That's already an excessive effort, but, as long as FMA is available, it's so cheap

> > that it simpler to give them that much than to insist on strict sanity and on rights

> > of normal user to suffer no slowdown at all because of those nut jobs.

>

> This gives accurate range reduction over a small input range, so it's the minimum you need. Anyway if

> you think that is excessive, you've not seen actual implementations of range reduction. Most implementations

> are far less efficient than this (no FMA, no round() etc). And the full range reducer is a horrible

> mess given they typically use 6 doubles to emulate bit-exact 128-bit

>

> However if you wanted crazy, look no further than correctly rounded math functions! These use

> up to 2048-bit mantissas and as a result performance of common inputs may be a million times

> slower. Eg. exp can take 10s of nanoseconds on one input but take many milliseconds on the next.

> Using that much time just to verify we rounded correctly - now that is batshit insane!

>

> Wilco

Not necessarily insane.

Consider a very high precision traverse measurement, like one where the actual measurement takes over a year to collect the data. Back in the 80's my father did the elevation benchmark for DuPage County here in Illinois. (NAVD88 datum IIRC the data was recorded in 1985 or thereabouts, but the monuments were not all in place until 1988) Instead of a optical level, like one could see on a construction site, he used a specialized transit with very sensitive bubble levels on the optics.

The procedure went something like this: Record the Temperature , sight the transit on the measurement rod, record the angle on the transit, flip the optics 180° and rotate the transit 180°(A reversal to cancel out any systematic error in the instrument), flip the measuring rod (Another reversal to cancel out any systematic error in the measuring rod instrument), repeat previous steps. Swap the locations of the Transit and the Rod And repeat the same 4 measurements to negate any elevation errors in the Transit setup. Onto the next point. So eight independent measurements per point, last point of each day would be a return to the start of the day to close the traverse, the POB, Point of Beginning.

Since Chicago, like most of the Midwest, is laid out on a cardinal aligned Cartesian grid, we would end up with a lot of angles approaching 90° and 270° for the North South streets, inconveniently near the top and bottom of the sine curve, Similarly East-West and cosine. I suppose we could draw some attention the the similarity of Geodesy and Finite Element Modeling, the triangles in the grid matter.

The end result was an error down in the low parts per million, he was quite proud of it. At the time, I believe it was the highest accuracy elevation benchmark ever done. Why throw out all those measurements and metrology tricks to cancel out error in the instruments for rounding errors in the math?

> Michael S (already5chosen.delete@this.yahoo.com) on July 2, 2019 4:24 am wrote:

> > Wilco (wilco.dijkstra.delete@this.ntlworld.com) on July 2, 2019 1:49 am wrote:

> > > Michael S (already5chosen.delete@this.yahoo.com) on June 30, 2019 5:53 am wrote:

> > > > Wilco (wilco.dijkstra.delete@this.ntlworld.com) on June 30, 2019 5:27 am wrote:

>

> > > > This are not bugs, because they are documented.

> > >

> > > A documented bug remains a bug no matter what you call it... The algorithm used has even changed

> > > at least once so there is not even a guarantee of identical results across x87 implementations.

> > >

> >

> > I am not aware of that. Do you have a reference link?

>

> See eg. this. Note the graph showing ULP error near 1*PI...

>

> > > > Using higher-precision values of pi for range reduction would

> > > > be a bug, because it would break backward compatibility.

> > > > That's different from newer things like rsqrt, where exact values of results

> > > > were never documented or guaranteed in ISA documents, so future implementations

> > > > are free to use different precision as long as it is >= 12 bits.

> > > >

> > > > As to accuracy of range reduction of sin/cos, I personally believe that too precise reduction

> > > > used by some of the new library implementations is stupid and counter-productive. It would

> > > > be much better for big arguments (those where ULP is greater than 2*pi) to return constant

> > > > values, like 0 for sin() and tan() and 1 for cos(). From physical perspective those are

> > > > more correct answers than poor PRNG which is a result of "exact" reduction.

> > > > In that regard, for DP x87 is approximately 1000 times more

> > > > precise than what I consider reasonable, but it, at

> > > > least, has excuse of supporting 80-bit format. On the other hand, pure binary64 libraries have no excuses.

> > >

> > > x87 just uses 68 bits of PI which means there is no working range reduction at all (you need a minimum

> > > of 2 times the mantissa size to cover cancellation). That is a horrific implementation that no one

> > > sane could defend.

> >

> > From perspective of engineer of physicist x87 implementation is overkill.

> > When engineer/physicist asks for sin(x), where x is binary64, he is ready to accept any result

> > in range [exact_sin(x-ULP):exact_sin(x+ULP)], but expects that quality implementation will deliver

> > results in range [exact_sin(x-ULP/2):exact_sin(x+ULP/2)] absolute majority of the time.

> > x87 does MUCH better than that.

>

> I don't think it even manages that. Think of it as 2 sine waves with different periods

> - they will be out of sync very quickly and have completely different values.

>

> > > Once you do acceptable range reduction and support inputs up to 2^53 then you need

> > > ~150 bits of PI. At that point covering the full exponent range is not any harder since it's just

> > > a larger table. So it's not particularly hard to do correct range reduction,

> >

> > Yes, I know. It's very stupid but not particularly hard,

> > because IEEE binary64 has such small exponent range.

> > I find it unfortunate. I'd rather prefer if it was very hard, so nobody would do it in standard 'C' libs.

>

> The exponent of binary128 is also small enough... Yes, if the PI table ended up being megabytes,

> it would be obvious that supporting the full exponent range isn't useful. But since it doesn't actually

> slow down the common case, it's not a problem. On embedded targets codesize matters, so it's common

> (and acceptable) to support accurate range reduction only for a small input range.

>

> > > it's just that most implementations

> > > are incorrect or inefficient. The fastest (and simplest) one is here.

> > >

> > > Wilco

> > >

> >

> > sin/cos/tan of big arguments is garbage. Nobody sane cares if the garbage is somewhat better

> > or somewhat worse. People could, wherever, care for consistency, i.e. even garbage should

> > retain certain properties, like sin(x)^2+cos(x)^2 should be very close to 1.

>

> Sure, but there isn't an official standard for range reduction in IEEE754,

> so we got the defacto standard set by math libraries like fdlibm.

>

> > The only bone, I am ready to throw to the crazies to apiece

> > them a little in standard libs is something like that:

> >

> > n = round(x * (INV_TWO_PI_53bits));

> > x = fma(n, -TWO_PI, x);

> > x = fma(n, -TWO_PI_L, x);

> > where TWO_PI_L = binary64(exact_pi - binary64(exact_pi));

> >

> > That's already an excessive effort, but, as long as FMA is available, it's so cheap

> > that it simpler to give them that much than to insist on strict sanity and on rights

> > of normal user to suffer no slowdown at all because of those nut jobs.

>

> This gives accurate range reduction over a small input range, so it's the minimum you need. Anyway if

> you think that is excessive, you've not seen actual implementations of range reduction. Most implementations

> are far less efficient than this (no FMA, no round() etc). And the full range reducer is a horrible

> mess given they typically use 6 doubles to emulate bit-exact 128-bit

*integer*multiplies.>

> However if you wanted crazy, look no further than correctly rounded math functions! These use

> up to 2048-bit mantissas and as a result performance of common inputs may be a million times

> slower. Eg. exp can take 10s of nanoseconds on one input but take many milliseconds on the next.

> Using that much time just to verify we rounded correctly - now that is batshit insane!

>

> Wilco

Not necessarily insane.

Consider a very high precision traverse measurement, like one where the actual measurement takes over a year to collect the data. Back in the 80's my father did the elevation benchmark for DuPage County here in Illinois. (NAVD88 datum IIRC the data was recorded in 1985 or thereabouts, but the monuments were not all in place until 1988) Instead of a optical level, like one could see on a construction site, he used a specialized transit with very sensitive bubble levels on the optics.

The procedure went something like this: Record the Temperature , sight the transit on the measurement rod, record the angle on the transit, flip the optics 180° and rotate the transit 180°(A reversal to cancel out any systematic error in the instrument), flip the measuring rod (Another reversal to cancel out any systematic error in the measuring rod instrument), repeat previous steps. Swap the locations of the Transit and the Rod And repeat the same 4 measurements to negate any elevation errors in the Transit setup. Onto the next point. So eight independent measurements per point, last point of each day would be a return to the start of the day to close the traverse, the POB, Point of Beginning.

Since Chicago, like most of the Midwest, is laid out on a cardinal aligned Cartesian grid, we would end up with a lot of angles approaching 90° and 270° for the North South streets, inconveniently near the top and bottom of the sine curve, Similarly East-West and cosine. I suppose we could draw some attention the the similarity of Geodesy and Finite Element Modeling, the triangles in the grid matter.

The end result was an error down in the low parts per million, he was quite proud of it. At the time, I believe it was the highest accuracy elevation benchmark ever done. Why throw out all those measurements and metrology tricks to cancel out error in the instruments for rounding errors in the math?

Topic | Posted By | Date |
---|---|---|

ARM1/ARM2 Alternative? (20/20 Hindsight) | Paul A. Clayton | 2019/06/27 05:47 PM |

ARM1/ARM2 Alternative? (20/20 Hindsight) | Maxwell | 2019/06/27 08:10 PM |

ARM1/ARM2 Alternative? (20/20 Hindsight) | Paul A. Clayton | 2019/06/28 11:44 AM |

ARM1/ARM2 Alternative? (20/20 Hindsight) | RichardC | 2019/07/03 07:56 PM |

ARM1/ARM2 Alternative? (20/20 Hindsight) | Simon Farnsworth | 2019/07/04 04:09 AM |

DMA | RichardC | 2019/07/04 05:52 AM |

DMA | Simon Farnsworth | 2019/07/04 09:46 AM |

DMA | RichardC | 2019/07/04 10:54 AM |

DMA | anon | 2019/07/04 05:53 PM |

DMA | Simon Farnsworth | 2019/07/05 01:51 AM |

DMA | RichardC | 2019/07/05 08:24 PM |

DMA | Maxwell | 2019/07/04 09:49 AM |

DMA | Howard Chu | 2019/07/04 10:55 AM |

DMA | RichardC | 2019/07/04 11:00 AM |

ARM1/ARM2 Alternative? (20/20 Hindsight) | Etienne | 2019/07/04 08:06 AM |

ok once you have MMU | RichardC | 2019/07/04 08:46 AM |

ARM1/ARM2 Alternative? (20/20 Hindsight) | Etienne | 2019/06/28 01:52 AM |

ARM1/ARM2 Alternative? (20/20 Hindsight) | jv | 2019/06/28 07:20 AM |

ARM1/ARM2 Alternative? (20/20 Hindsight) | Paul A. Clayton | 2019/06/28 11:44 AM |

ARM1/ARM2 Alternative? (20/20 Hindsight) | jv | 2019/06/29 03:54 AM |

Freeing the stack pointer | Paul A. Clayton | 2019/06/29 06:32 AM |

PC-relative LD/ST (NT) | vvid | 2019/06/30 10:03 AM |

Freeing the stack pointer | jv | 2019/06/30 11:45 PM |

ARM1/ARM2 Alternative? (20/20 Hindsight) | Ronald Maas | 2019/06/28 09:06 AM |

ARM1/ARM2 Alternative? (20/20 Hindsight) | Paul A. Clayton | 2019/06/28 12:56 PM |

ARM1/ARM2 Alternative? (20/20 Hindsight) | Ronald Maas | 2019/06/28 10:17 PM |

ARM1/ARM2 Alternative? (20/20 Hindsight) | Brett | 2019/06/29 12:39 AM |

ARM1/ARM2 Alternative? (20/20 Hindsight) | Brett | 2019/06/29 01:13 AM |

32-bit Win10 exists (NT) | nobody in particular | 2019/06/29 05:17 PM |

32-bit Win10 exists | Brett | 2019/06/29 06:45 PM |

32-bit Win10 exists | Michael S | 2019/06/30 01:34 AM |

32-bit Win10 exists | Anon3 | 2019/06/30 03:07 AM |

AArch64 is a new ISA | Paul A. Clayton | 2019/06/29 07:23 AM |

AArch64 is a new ISA | rwessel | 2019/06/29 04:00 PM |

AArch64 is a new ISA | Michael S | 2019/06/30 01:40 AM |

Hardware x87? | Gionatan Danti | 2019/06/30 02:22 AM |

Hardware x87? | Michael S | 2019/06/30 03:52 AM |

Hardware x87? | Gionatan Danti | 2019/06/30 06:04 AM |

Hardware x87? | Michael S | 2019/06/30 08:47 AM |

Hardware x87? | Kevin G | 2019/07/01 12:11 PM |

Hardware x87? | anonymou5 | 2019/07/01 07:30 PM |

Hardware x87? | Michael S | 2019/07/02 12:44 AM |

Hardware x87? | Gionatan Danti | 2019/07/02 09:25 AM |

AArch64 is a new ISA | rwessel | 2019/06/30 01:52 PM |

AArch64 is a new ISA | Michael S | 2019/06/30 01:42 AM |

ARM1/ARM2 Alternative? (20/20 Hindsight) | Maynard Handley | 2019/06/29 09:50 AM |

ARM1/ARM2 Alternative? (20/20 Hindsight) | Michael S | 2019/06/30 01:29 AM |

ARM1/ARM2 Alternative? (20/20 Hindsight) | Wilco | 2019/06/30 03:51 AM |

ARM1/ARM2 Alternative? (20/20 Hindsight) | Michael S | 2019/06/30 04:22 AM |

ARM1/ARM2 Alternative? (20/20 Hindsight) | Wilco | 2019/06/30 05:27 AM |

ARM1/ARM2 Alternative? (20/20 Hindsight) | Michael S | 2019/06/30 05:53 AM |

ARM1/ARM2 Alternative? (20/20 Hindsight) | Wilco | 2019/07/02 01:49 AM |

ARM1/ARM2 Alternative? (20/20 Hindsight) | Michael S | 2019/07/02 04:24 AM |

ARM1/ARM2 Alternative? (20/20 Hindsight) | Wilco | 2019/07/02 05:28 PM |

ARM1/ARM2 Alternative? (20/20 Hindsight) | Michael S | 2019/07/03 01:37 AM |

ARM1/ARM2 Alternative? (20/20 Hindsight) | Adrian | 2019/07/03 02:45 AM |

ARM1/ARM2 Alternative? (20/20 Hindsight) | Michael S | 2019/07/03 03:01 AM |

ARM1/ARM2 Alternative? (20/20 Hindsight) | Montaray Jack | 2019/07/03 12:18 PM |

ARM1/ARM2 Alternative? (20/20 Hindsight) | Montaray Jack | 2019/07/03 01:46 PM |

ARM1/ARM2 Alternative? (20/20 Hindsight) | Montaray Jack | 2019/07/03 02:32 PM |