New article: AMD's Jaguar Microarchitecture

Article: AMD's Jaguar Microarchitecture
By: Maynard Handley (name99.delete@this.name99.org), April 5, 2014 9:38 am
Room: Moderated Discussions
UnmaskedUnderflow (whoawhoawhoa.delete@this.whoa.com) on April 4, 2014 11:45 am wrote:
> Linus Torvalds (torvalds.delete@this.linux-foundation.org) on April 4, 2014 8:54 am wrote:
> > UnmaskedUnderflow (nope.delete@this.nope.org) on April 4, 2014 7:16 am wrote:
> > >
> > > Software types, pls stop writing denormal code. For posterity. :)
> >
> > No, hardware types, stop sucking at denormals.
> >
> > Here's why:
> >
> > - some code actually does want denormals. It's rare, but it happens. I agree that those rare applications
> > should have to work at it (clear the "flush-to-zero" bit or whatever), but since hardware designers
> > made the default be "do denormals", hw people only have themselves to blame.
> >
> > - but more importantly: it happens much more often by mistake, and if
> > your hardware sucks at it, then your hardware sucks. It's that simple.
> >
> > The "happens by mistake" is because it's really easy to overlook, and things still
> > work. If your hardware model is "it will work, but it is slow", that "it will work"
> > part translates directly to "nobody will notice that we're doing it".
> >
> > So put the blame where it belongs: on hardware that silently does bad things. Don't blame software for not
> > noticing those bad things when the hardware designers worked so hard at making the issue hard to notice!
> >
> > So the fix is simple: don't suck at denormals, and don't blame the wrong party (sw).
> >
> > This is not horribly unlike the unaligned scenario. Yes, unaligned accesses are
> > harder for hardware. And yes, hardware that doesn't do them well is crap.
> >
> > It really is that simple. If you have a fragile path, your hardware is bad and you should feel bad.
> > As with unaligned accesses, being a few cycles slower is fine, because it is a more complicated path.
> > But if it's more than a couple of cycles (say, an internal microfault to microcode), you damn well
> > shouldn't blame software, you should look in the mirror and tell yourself "I'm a bad person".
> >
> > Linus
>
> Ha, fair enough, I'll own my little snark. Since it's you, I think I'll offer my penance in the
> form of a serious response. (I didn't think that this morning I'd be countering Linus...wow)
>
> I can't defend "denormals on by default". For legacy chips like x86/x87, that decision was made long ago
> and kept alive so 30-year old DOS/Fortran programs the govt owns will still work on upgrades with no recompile.
> Denormals in this case require a microtrap so they can 1.) respond to UNmasked specs via the 1985 IEEE-754
> requirement and 2.) to still send an FERR to the southbridge, as original FPs were not part of the main
> cpu. You'd think this and things like A20 bits would be gone by now, but they're not.
>
> FTZ/DAZ were added later. I wish they were on by default. I wish compilers forced them
> on by default. But such it is. Perhaps someone of your reputation could contact the
> ivory tower ISA greybeards and/or compilers to convince them so? I support that.
>
> For those who need denormals (HPC)...IBM makes several chips that support them in-line. The people who
> really need them (physicists, mathematicians, wave equations) know already and choose accordingly.
>
> For those who do them by mistake...it's a shame again the defaults are masked exceptions. If
> anything, the slower perf tells you there's a problem. Worse than the perf is that the math
> is wrong...it is by mere coincidence that a subset of integer math works when doing adds/subs
> in un-type-cast denormals. Anything non-trivial will be plain non-functional. For that, we cannot
> verify software's functionality. Sadly we don't help by not telling you by default.
>
> As for adding a few cycles? Yes, to calculate rounded denormals takes hardware and a few cycles more...if always
> masked. ARM chips are always masked, and their SIMDs automagically FTZ. Thus, when the software functionality
> is incorrect (let's say for a recent example, Geekbench 2 accessing uninitialized memory, as discussed on this
> very site) then a masked ARM chip will do it in a few cycles and an unmasked x86 will take a perf trap. Then
> the masses rejoice at the perf crown of ARM in a cross-compare that people said explicitly to avoid.
>
> For a small chip like that discussed here, denormal support is crippled on purpose. It,
> like any chip design, is a tradeoff. Small chips adding big hardware to denormal support
> is the wrong choice, now and always. Moore's Law -- fast denormals on phones?
>
> Hope that's a reasonable response. I'm a bad person. I'll get back to crippling denormals.
>
> Let me modify my snark. "Software types, pls take exceptions on denormals till hw un-stupids defaults"

If we're going to sling blame around, part of the problem is Java. It's not reasonable to say "Small chips adding big hardware to denormal support is the wrong choice, now and always. Moore's Law -- fast denormals on phones?" as though that were just obvious. While Java does not demand full IEEE compliance, it does demand that denormals work. If you care about being Java validated (and plenty of people did, at least until recently), then you have no choice in the matter.
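
To make "denormals work" concrete, here is a minimal sketch in C of the property gradual underflow gives you: two distinct finite values never subtract to zero. The helper names `is_denormal` and `runtime_difference` are made up for illustration, and the sketch assumes the hardware is in its default (not flush-to-zero) mode.

```c
#include <assert.h>
#include <float.h>   /* DBL_MIN: smallest positive normal double */
#include <math.h>    /* fpclassify, FP_SUBNORMAL */

/* Hypothetical helper: returns 1 if v is a denormal (subnormal) value. */
int is_denormal(double v)
{
    return fpclassify(v) == FP_SUBNORMAL;
}

/* Hypothetical helper: subtract two distinct doubles at run time.
   The volatile temporaries keep the compiler from folding the
   subtraction at compile time, so the FPU really performs it. */
double runtime_difference(double x, double y)
{
    assert(x != y);
    volatile double a = x, b = y;
    volatile double d = a - b;
    return d;
}
```

With x = 1.5 * DBL_MIN and y = DBL_MIN, both inputs are normal but the difference, 0.5 * DBL_MIN, is a denormal. On hardware that flushes to zero the same subtraction would return 0 even though x != y.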

The best option seems to be what we did with AltiVec, which was to have an explicit Java mode flag and an expectation that by default your app will start in non-Java mode.

However, I'd like to raise the point that there is a LARGER problem here which no one ever seems to address: modularization and libraries.
The model used by ALL these CPU designers seems to be one of monolithic programs that know the FP flags they want for the entire app, which is entirely unrealistic. The realistic model is that fragments of code at the function/library level want to control FP flags, while the vast bulk of code could not care less and simply wants "do the fast thing".
Which means that the ideal situation would be something like either
- every FPU instruction encodes within it all the FP flags (this seems unrealistic for a variety of reasons, from performance to instruction length), OR
- CPUs provide a very high performance "swap in/swap out FP state" instruction, so that this is just one more of those things that's done on entering/exiting a function and it gets lost in the noise of incrementing the SP, storing the return address, and saving a few registers.

Advanced programming environments would allow this state to be toggled at module entry/exit boundaries rather than per function.

Unfortunately I am aware of no CPU+programming environment that thinks this way. Instead we have the current totally broken system where EITHER any random function along the way can toggle FP state once and everyone else then suffers, OR a function has to swap the state in and out on entry/exit, but the state swapping is uncomfortably slow and frequently unnecessary, especially if the next function along is going to do the same damn state swapping.
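
For reference, the "swap the state in and out on entry/exit" pattern looks roughly like the sketch below. It assumes x86-64 with GCC or Clang, where scalar double math goes through SSE and the flush-to-zero bit lives in MXCSR. The names `scale_below_normal` and `fast_mode_call` are hypothetical. Only the flush-to-zero bit is toggled here; this is an illustration of the pattern, not a claim about how any particular library does it.

```c
#include <assert.h>
#include <float.h>      /* DBL_MIN */
#include <stddef.h>     /* NULL */
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr, _MM_FLUSH_ZERO_ON */

/* Hypothetical kernel: scales the smallest normal double down so that
   the exact result falls in the denormal range. The volatile
   temporaries force the multiply to happen at run time. */
double scale_below_normal(void)
{
    volatile double x = DBL_MIN;
    volatile double r = x * 0.3;
    return r;
}

/* Hypothetical wrapper: swap in "do the fast thing" FP state on entry
   and swap the caller's state back in on exit. */
double fast_mode_call(double (*kernel)(void))
{
    assert(kernel != NULL);
    unsigned int saved = _mm_getcsr();        /* save caller's MXCSR */
    _mm_setcsr(saved | _MM_FLUSH_ZERO_ON);    /* enable flush-to-zero */
    volatile double r = kernel();             /* result stored before restore */
    /* Restore the caller's control bits. This writes the whole register,
       so sticky exception flags raised inside the kernel are discarded. */
    _mm_setcsr(saved);
    return r;
}
```

Called directly, `scale_below_normal()` returns a small nonzero denormal; called through `fast_mode_call`, it returns 0, and the caller's flush-to-zero setting is unchanged afterwards. Every function that wants its own FP mode has to pay for the two MXCSR writes, which is exactly the cost complained about above.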
