ARM1/ARM2 Alternative? (20/20 Hindsight)

By: Wilco (wilco.dijkstra.delete@this.ntlworld.com), June 30, 2019 5:27 am
Room: Moderated Discussions
Michael S (already5chosen.delete@this.yahoo.com) on June 30, 2019 4:22 am wrote:
> Wilco (wilco.dijkstra.delete@this.ntlworld.com) on June 30, 2019 3:51 am wrote:
> > Michael S (already5chosen.delete@this.yahoo.com) on June 30, 2019 1:29 am wrote:
> > > Ronald Maas (ronaldjmaas.delete@this.gmail.com) on June 28, 2019 10:17 pm wrote:
> > > > Paul A. Clayton (paaronclayton.delete@this.gmail.com) on June 28, 2019 12:56 pm wrote:
> > > > > Ronald Maas (ronaldjmaas.delete@this.gmail.com) on June 28, 2019 9:06 am wrote:
> > > > > > Richard Grisenthwaite (ARM lead architect) put some thoughts on the initial
> > > > > > ARM ISA and impact some of these decisions had on designing later ARM CPUs.
> > > > > >
> > > > > > Starting page 17:
> > > > > >
> > > > > > https://www.eit.lth.se/fileadmin/eit/courses/eitf20/ARM_RG.pdf
> > > > >
> > > > > Thanks! That was an interesting read.
> > > > >
> > > > > I agree with the presenter that people tend to code to the implementation (and "clever solutions
> > > > > find inconvenient truths"), though I think such is unfortunate. I would like to think that no
> > > > > one would exploit the MIPS load delay slot to briefly get an extra register (assuming a cache
> > > > > hit and no interrupt) or even to detect a cache miss, but software developers can be clever.
> > > > >
> > > > > I do think being able to deprecate features is important.
> > > > > In some cases like (properly used) delayed loads and
> > > > > address space truncation, compatibility is not expensive.
> > > > > In some cases one would like to drop a feature or limit
> > > > > it to a certain architectural subfamily. I suspect that assembly
> > > > > level compatibility ("trivial" binary translation)
> > > > > might be a reasonable target. JIT compilers would not be difficult
> > > > > to port (as long as they did not include tightly
> > > > > bound size-related optimizations nor used code as data). Terror of refactoring seems bad.
> > > > >
> > > > > I also agree that trap-and-emulate is not sufficient for
> > > > > compatibility. (Trap-and-patch might be sufficient.)
> > > > >
> > > > > The presentation appears to admit that having PC as a GPR was 'too orthogonal' and
> > > > > > that "shifts with all data processing" was probably not a good idea in hindsight.
> > > > >
> > > > > I agree that one needs to be careful about "It just falls out of the design", but I am not
> > > > > certain I would go as far as "Fear". Some warts have multiple ways of being handled.
> > > > >
> > > > > The constraints of early implementations do tend to introduce microarchitecture-specific warts. Modern
> > > > > computer architects seem to have less need to chase small
> > > > > savings and can learn from the greater accumulated
> > > > > experience, but there is a tendency toward incremental improvement which leads to cruft.
> > > > >
> > > > > The effect of success is also sometimes not recognized. As Richard Grisenthwaite points out, a successful
> > > > > architecture will tend to develop less coherent aspects
> > > > > > as the same architecture is used for different functions
> > > > > (success leads to availability of hardware and people familiar with the systems, reducing costs, risks,
> > > > > and time to market — making the architecture more attractive even if it is less inherently fit for
> > > > > purpose) and the architecture is extended based on market pressures which won't lead in a single direction
> > > > > (because extreme specialization tends to lead to extinction or at least small population) and yet typically
> > > > > will not express a single vision (like Brooks' view of a good cathedral).
> > > > >
> > > > > Designing for change is important. While interfaces are more difficult
> > > > > to change, one should not (in my opinion) expect eternal interfaces.
> > > > >
> > > >
> > > > If an ISA reaches a certain level of popularity, it becomes in most cases impossible to
> > > > remove undesirable features. Fortunately extending the ISA is an effective mechanism to
> > > > move forward in that regard, as long as the old features still keep working as expected.
> > > >
> > > > E.g. Intel got rid of the stack based approach for FP when
> > > > SSE was introduced. And although the x87 instructions
> > > > must be supported to the end of time, nobody cares much about its performance anymore. So Intel can focus
> > > > on enhancing the much cleaner SSE / AVX, which nowadays is 10x faster compared to x87.
> > > >
> > >
> > > It depends on what you measure.
> > > SIMD-friendly, FMA-happy, dense - it can be much more than 10x, esp. for single-precision. On the opposite
> > > side, strictly scalar, no FMA, with serial dependencies - 1.2x if you are lucky. On Skylake/SkylakeX it's
> > > relatively easy to construct a case where x87 DP math is faster than SSE/AVX. On Broadwell it was harder.
> >
> > I once compared the x87 exp() assembler implementation in GLIBC vs the generic C implementation - generic
> > C code beat it by more than 6 times. Yet for some odd reason the x87 version is the one used on x86...
> >
> > Wilco
> >
> >
>
> For occasional use x87 could be faster, because it does not suffer from cache misses.
> You can't see the effect with simplistic loop-based comparisons, but fast software
> implementations tend to use big tables that suck badly when not used in loops.

If rarely used, performance wouldn't matter. The tables for exp() are only 2KB, significantly smaller than other implementations. They are also shared with exp2() and pow().
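
To make the table-size point concrete, here is a minimal sketch of a table-driven exp(). It is not GLIBC's code: the 128-entry table size, the degree-5 polynomial and the name `exp_sketch` are assumptions chosen for illustration, and there is no handling of overflow, underflow, NaN or infinities. Two 8-byte values per entry for 128 entries would come to 2KB; this sketch stores only one double per entry.

```python
import math

N = 128  # assumed table size; 128 entries of 16 bytes would total 2 KB
TABLE = [2.0 ** (i / N) for i in range(N)]  # TABLE[i] == 2^(i/N)
LN2_OVER_N = math.log(2.0) / N

def exp_sketch(x):
    """Illustrative table-driven exp(); hypothetical, not a libm implementation."""
    # Split x = (k*N + i) * ln2/N + r, so exp(x) = 2^k * 2^(i/N) * exp(r).
    kn = round(x / LN2_OVER_N)
    r = x - kn * LN2_OVER_N          # |r| <= ln2/(2N), roughly 0.0027
    k, i = divmod(kn, N)             # i is always in [0, N)
    # r is tiny, so a short Taylor polynomial approximates exp(r) well.
    p = 1.0 + r * (1.0 + r * (0.5 + r * (1.0 / 6 + r * (1.0 / 24 + r / 120))))
    return math.ldexp(TABLE[i] * p, k)
```

The same table of 2^(i/N) values can serve exp2() directly, and pow() through exp(y * log(x)), which is why sharing it between the three functions is plausible. A production version needs a more careful reduction step than the single subtraction used here.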

> What about precision?
> x87 transcendental instructions tend to produce incorrectly rounded double-precision results
> ~ one time in 8000 inputs. With generic C implementation, I'd guess, you are in completely
> different league. More like incorrect rounding one time in 8 inputs. I am not even sure that
> in generic C that beats x87 by factor of 6 (i.e. 11-12 clocks? Surely you are talking about
> reciprocal throughput rather than latency?) it is possible to achieve strict monotonicity.

Recent GLIBC exp() has 0.509ULP worst-case error over the entire range, so >99% of results are correctly rounded. Yes, it's reciprocal throughput, latency is "only" 3.5 times lower than x87...
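
For anyone who wants to check ULP figures like this on their own machine, the following is a rough sketch. It assumes Python 3.9+ (for `math.ulp`) and that `math.exp` forwards to the platform libm; the helper name `ulp_error` and the 50-digit reference precision are arbitrary choices, not anything taken from GLIBC.

```python
import math
from decimal import Decimal, getcontext

getcontext().prec = 50  # reference precision, far beyond a double's ~17 digits

def ulp_error(x):
    """Error of math.exp(x) in units of the last place of the returned double."""
    got = math.exp(x)
    ref = Decimal(x).exp()  # high-precision exp of the exact double value x
    return abs(Decimal(got) - ref) / Decimal(math.ulp(got))

# Example: largest observed error over a few hand-picked inputs.
worst = max(ulp_error(x) for x in [-10.5, -0.1, 0.7, 1.0, 3.3, 50.0])
```

An error of at most 0.5 ULP means the result is correctly rounded. A handful of inputs says nothing about the worst case, which takes a far denser search to bound, and measuring against the ULP of the computed value rather than of the exact result differs slightly near powers of two.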

> In specific case of exp() the difference in precision is probably not *that* big, because x87 does
> not have exactly that function implemented in hardware. One has to synthesize exp() from F2XM1,
> losing part of the precision advantage. But for other functions and for some users (certainly a small
> minority, but not necessarily unimportant small minority) the difference can be important.
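
The F2XM1 route described in the quote can be sketched as follows. This is an illustration in double precision only: `f2xm1` is a stand-in written with `math.expm1`, not the instruction itself, and `exp_via_f2xm1` is a hypothetical name. Real x87 code works in 80-bit extended precision, so the loss it suffers is smaller than what this sketch shows.

```python
import math

LN2 = math.log(2.0)
LOG2_E = 1.0 / LN2  # log2(e)

def f2xm1(f):
    """Stand-in for x87 F2XM1: returns 2**f - 1, intended for |f| <= 1."""
    return math.expm1(f * LN2)

def exp_via_f2xm1(x):
    """exp(x) built as 2**(x * log2(e)); illustrative, no special-case handling."""
    t = x * LOG2_E   # exp(x) == 2**t; the rounding of t is where accuracy is lost
    n = round(t)     # integer part, applied by scaling at the end (FSCALE on x87)
    f = t - n        # fraction in [-0.5, 0.5], inside F2XM1's input range
    return math.ldexp(1.0 + f2xm1(f), n)
```

The rounding error in `t` grows with |x| and passes straight through to the result, so even a perfectly accurate F2XM1 cannot deliver a correctly rounded exp() without extra precision in the multiply.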

There are also major accuracy issues with many x87 instructions when you are outside a certain input range. I don't believe FSIN/FCOS range reduction bugs were ever fixed.

Wilco
