... or a compiler optimizing aggressively?

By: Heikki Kultala (heikki.kultala.delete@this.tut.fi), August 4, 2018 8:13 am
Room: Moderated Discussions
Peter E. Fry (pfry.delete@this.tailbone.net) on August 4, 2018 7:14 am wrote:
> Travis (travis.downs.delete@this.gmail.com) on August 3, 2018 1:34 pm wrote:
> [...]
> > There are many instructions that already have different latencies (and uop counts) depending
> > on some value, but the existing examples that I know of are all based on immediate values in
> > the instruction so can be sorted out by the decoders. Examples include adc with a 0 immediate
> > (1 uop vs 2 on SnB to Haswell), certain shift instructions with 0 or 1 immediate, etc.
> >
> > Is this useful information in practice for optimization?
> >
> > Probably not, or only very rarely. [...]
>
> Going back a few years, I had a test case on the AMD K8/K10 where MUL had three distinct latencies: one factor
> = 0 or 1; one factor = power of 2; everything else. I discovered it via some poorly-formed test data (all
> 0s), which made me think I had some magically fast code. My K10 board is on a shelf in the closet, so I can't
> check my sanity at the moment (yes, it is suspect). I don't have any later AMD chips to test.
>
> Are these sorts of things documented in one place somewhere?
>
> Not really related, but I've run clean into the limitations of static analysis (staring at code).
> I have two mysteries (at the moment) (BSF running faster than it should on Haswell; two sets
> of sequences compiled on GCC and Clang with identical instruction counts that run... differently
> than I would expect) - apparently performance counters are the only way these days.

Are you sure it was the processor, and not your compiler optimizing those MULs away?

Compilers can be _very smart_ nowadays at tracking where values come from, and in naively written benchmark programs they can replace big pieces of calculation code with results computed at compile time.
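
To make the concern concrete, here is a minimal sketch in C. It is not your test code; the function names and the loop shape are made up for illustration. In the first function both factors are visible to the compiler, so with optimization enabled it is free to fold the whole loop into a constant and emit no MUL at all. In the second the factor is read through a volatile object, so the compiler has to load it at run time and cannot assume its value.

```c
#include <stdint.h>

/* Hypothetical naive benchmark kernel: the all-zero "test data" is a
   compile-time constant, so an optimizing compiler may reduce this
   whole function to "return 0" with no multiply instruction left. */
uint64_t mul_chain_naive(void)
{
    uint64_t x = 1;
    for (int i = 0; i < 1000; i++)
        x *= 0;
    return x;
}

/* Hypothetical guarded kernel: the factor comes from a volatile object,
   so its value is unknown at compile time. Each multiply depends on the
   previous result, which makes the loop a latency chain. */
uint64_t mul_chain_guarded(uint64_t seed, const volatile uint64_t *factor, int iters)
{
    uint64_t f = *factor;   /* one run-time load, outside the timed loop */
    uint64_t x = seed;
    for (int i = 0; i < iters; i++)
        x *= f;
    return x;
}
```

Even with the guarded version, the only reliable check is to look at the generated assembly (for example with the `-S` option of GCC or Clang) and confirm that the multiplies are still in the loop.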