Performance counter descriptions

By: chester lam (lamchester.delete.delete@this.this.gmail.com), May 9, 2019 5:42 pm
Room: Moderated Discussions
> You are confusing two issues here.
> Renaming say FPCW is trivial, like any other renaming. BUT FPCW is not a good
> in and of itself; it has value only insofar as it modifies FP instructions.
> Which FP instructions? Those that occur between FPCW state changes.
> So how do you handle this?
What two issues?

Originally I was curious about the statement:

any* u-arch structure got empty (like INT/SIMD FreeLists). c. FPU control word (FPCW), MXCSR. and others

and was puzzled as to why FPCW was described as a "structure that got empty".

How exactly microarchitectures handle FPCW and optimize around it is a tangential issue, but I find that interesting nonetheless :)

> Well, one way is you treat FPCW like any other register, so it's one more input into every
> instruction, and your apparently 3-input FMA instruction is actually 4-inputs. That's nice
> and conceptually clean, but seems like it would suck from every other point of view.
That's what I meant by treating FPCW as an input dependency. Why does it suck from every other point of view? If renaming lets you keep several copies of FPCW around (up to four copies on the Core Duo/Solo), why not use them?
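To make the "several copies" idea concrete, here is a toy sketch of how a small pool of renamed FPCW copies could behave. Everything in it is an assumption for illustration: the `FpcwRenamer` class, its method names, and the allocate-on-write / free-on-retire policy are hypothetical and not taken from any Intel documentation. Only the pool size of four comes from the Core Duo/Solo figure above. It also shows one possible reading of FPCW as a "structure that got empty": the pool of free copies runs out.

```python
# Toy model only: a hypothetical rename pool for FPCW, not real hardware behavior.
class FpcwRenamer:
    def __init__(self, copies=4):
        # Copy 0 holds the architectural FPCW; the rest start out free.
        self.free = list(range(1, copies))
        self.live = [0]              # oldest first; the last entry is the newest copy
        self.values = {0: 0x037F}    # 0x037F is the x87 default control word

    def rename_write(self, value):
        """Allocate a fresh copy for an FPCW write, or return None if none is free."""
        if not self.free:
            return None              # free list empty: the front end would have to stall
        tag = self.free.pop(0)
        self.values[tag] = value
        self.live.append(tag)
        return tag

    def rename_fp_op(self):
        """An FP op reads the newest FPCW copy as one extra source operand."""
        return self.live[-1]

    def retire_write(self):
        """When the oldest in-flight write retires, the copy it replaced is freed."""
        if len(self.live) > 1:
            old = self.live.pop(0)
            del self.values[old]
            self.free.append(old)
```

In this model, three back-to-back FPCW writes use up the pool, and a fourth cannot be renamed until an older write retires. FP instructions in between are never blocked by the write itself; they just carry the tag of whichever copy was current when they were renamed.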

> Alternatively you treat an FPCW as a fence, as I suggested.
> Alternative 3 is, again as I suggested, that you attach as many of the FPCW bits as
> change often to each FP instruction, and treat changes to the other bits as fences.
A bit like branch predication?

That approach sounds reasonable, but x86 afaik only lets you write to the entire register. Take a look at https://golang.org/src/math/floor_386.s as an arbitrary example. It does this for floor/ceil/trunc:
- save the current FPCW register contents to memory
- put together a new FPCW that has what it wants and write that to memory
- load the new FPCW into the FPCW register
- do stuff
- load the old FPCW back into the FPCW register
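The sequence above can be sketched in Python against a simulated control word. This is a model, not real hardware access: the `ToyFPU` class and the `rounded` helper are hypothetical names made up for the example, and the rounding itself is borrowed from Python's `math` module. The bit layout is the standard x87 one (rounding control in bits 10-11, default control word 0x037F). The model only handles finite inputs.

```python
import math

# x87 control word layout (rounding-control field lives in bits 10-11).
RC_MASK = 0x0C00
RC_NEAREST = 0x0000
RC_DOWN = 0x0400       # round toward -infinity (floor)
RC_UP = 0x0800         # round toward +infinity (ceil)
RC_TRUNC = 0x0C00      # round toward zero (trunc)
DEFAULT_FPCW = 0x037F  # control word value after FNINIT


class ToyFPU:
    """Hypothetical stand-in for the x87 unit; only models FPCW and round-to-int."""

    def __init__(self):
        self.fpcw = DEFAULT_FPCW
        self.fpcw_writes = 0   # counts control word loads, the costly part

    def store_cw(self):
        return self.fpcw

    def load_cw(self, value):
        self.fpcw = value & 0xFFFF
        self.fpcw_writes += 1

    def round_to_int(self, x):
        # Result depends on the current rounding-control bits.
        rc = self.fpcw & RC_MASK
        if rc == RC_DOWN:
            return float(math.floor(x))
        if rc == RC_UP:
            return float(math.ceil(x))
        if rc == RC_TRUNC:
            return float(math.trunc(x))
        return float(round(x))  # Python's round() is round-half-to-even


def rounded(fpu, x, rc):
    saved = fpu.store_cw()               # save the current FPCW
    new_cw = (saved & ~RC_MASK) | rc     # build a new FPCW with the wanted mode
    fpu.load_cw(new_cw)                  # load the new FPCW
    result = fpu.round_to_int(x)         # do stuff
    fpu.load_cw(saved)                   # load the old FPCW back
    return result
```

Note that every call costs two control word writes even though only two bits ever differ, and the whole 16-bit word is rewritten each time.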

Now you'd need hardware that compares the new FPCW with the old one to determine what bits have changed. Before you can do that, you need to know the new FPCW's value (~5 dependent instructions). Once you do know what's changed, you have to go back and apply that to every FP operation coming down the pipeline before the next FPCW register write. That sounds like a complicated nightmare.
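The comparison step on its own is cheap to express: XOR the old and new values and see which fields differ. The sketch below does that and then sorts the write into "tag" (only frequently changing bits differ) or "fence" (anything else differs). The function name `classify_fpcw_write` is hypothetical, and treating the rounding-control field as the only "frequently changing" bits is my assumption, not something stated in the thread. The sketch says nothing about the hard part, which is that the new value is not known until the dependent instructions producing it have executed.

```python
RC_MASK = 0x0C00  # x87 rounding-control field, bits 10-11


def classify_fpcw_write(old_cw, new_cw, tagged_mask=RC_MASK):
    """Decide how a full-register FPCW write could be handled (toy policy)."""
    changed = (old_cw ^ new_cw) & 0xFFFF   # bits that differ between old and new
    if changed == 0:
        return "no-op"                     # same value written back
    if (changed & ~tagged_mask) == 0:
        return "tag"                       # only per-instruction bits changed
    return "fence"                         # some other bit changed: serialize
```

For the floor sequence, going from 0x037F to 0x077F would classify as "tag", and so would the restore. A write that changed the precision-control or exception-mask bits would classify as "fence".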

> the FPU has to, in some fashion, "reconfigure" itself, and this may require at the very least something
> like a draining of the pipeline before that reconfig can be implemented?
Isn't performance the point of renaming/keeping multiple FPCW/MXCSR copies around? If you're going to flush the pipeline, treat FPCW/MXCSR writes as a fence, or do something similar that exacts a heavy performance penalty, why bother renaming it?