Data-dependent instruction latency

By: Peter E. Fry (pfry.delete@this.tailbone.net), August 5, 2018 9:13 am
Room: Moderated Discussions
Travis (travis.downs.delete@this.gmail.com) on August 4, 2018 3:33 pm wrote:
[...]
> Note that there are a ton of other "quirks", like 4k aliasing, cache associativity, false dependencies on popcnt [...]

Argh! Browser ate my reply. Anyway, that last may explain one mystery. I'll have to test that a bit more.

> If your example is non-proprietary and reduced[1], why not post it here? Maybe someone
> already knows what's going on, and if not it could be a new mystery to solve.

Proprietary? Heh. (I can't imagine why nobody pays me for this stuff.) OK, let's see here...

http://www.tailbone.net/mintest.cpp
http://www.tailbone.net/mintest-gcc.txt
http://www.tailbone.net/mintest-clang.txt
http://www.tailbone.net/mintest-gcc-align16.txt

Cheesy test with an even cheesier RDTSC timer (hey, it's been worse). The dual array initialization gets the array into cache on machines with ASLR so I can get a time. Initialization is nearly identical, but the Clang version runs about 3% faster on Sandy Bridge (my Haswell machine is a server, so I don't care to install Clang on it). Oh - compiler options:

GCC: -O2 -falign-functions=32 -falign-loops=32 -fno-align-labels -fno-align-jumps -march=native -Winline
GCC "align16" (to match Clang alignment): -O2 -falign-functions=32 -falign-loops=16 -falign-labels=16 -fno-align-jumps -march=native -Winline
Clang: -target x86_64-pc-windows-gnu -O2
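
For anyone who doesn't want to open the source, here is a minimal sketch of what an RDTSC-style timer can look like. It is not the timer from mintest.cpp: read_ticks and min_ticks are made-up names, and taking the minimum over several runs is just one common way to reduce noise. RDTSC is not a serializing instruction, so very short measurements can still be blurred by out-of-order execution.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <limits>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  // __rdtsc() on GCC and Clang
#endif

// Hypothetical helper: read a raw tick counter.
// On x86 this is the time-stamp counter; elsewhere it falls back to
// steady_clock ticks so the sketch still compiles.
static inline std::uint64_t read_ticks() {
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Hypothetical helper: run fn() `repeats` times and return the smallest
// observed tick delta. The minimum discards runs disturbed by interrupts
// or cold caches.
template <typename F>
std::uint64_t min_ticks(F&& fn, int repeats) {
    assert(repeats > 0);
    std::uint64_t best = std::numeric_limits<std::uint64_t>::max();
    for (int i = 0; i < repeats; ++i) {
        const std::uint64_t t0 = read_ticks();
        fn();
        const std::uint64_t t1 = read_ticks();
        const std::uint64_t delta = t1 - t0;
        if (delta < best) best = delta;
    }
    return best;
}
```

Usage would be something like min_ticks([&] { run_test(data); }, 100), where run_test and data stand in for whatever is being measured.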

Another minor mystery - Clang performance is down... Wait - solved that one: I'd altered the constraints to allow memory operands on inputs (to the assembler). The links above are register-only - here's the "rm" Clang listing link:

http://www.tailbone.net/mintest-clang-rm.txt

Clang utilizes the memory constraint when it shouldn't; it had no effect on GCC. If you want to look at the "rm" version of the code, just stick ["rm"] in place of ["r"] on the input operand constraints.
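
To make the "r" vs. "rm" difference concrete, here is a rough sketch using GCC-style extended asm. The wrappers bsf_r and bsf_rm are invented for illustration and are not the ones in mintest.cpp; the only difference between them is the input constraint. On non-x86-64 targets the sketch falls back to a builtin so it still compiles.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wrappers for illustration; not copied from mintest.cpp.
// BSF leaves its destination undefined for a zero input, hence the asserts.

// "r": the input must be in a register before the instruction executes.
static inline std::uint64_t bsf_r(std::uint64_t x) {
    assert(x != 0);
#if defined(__x86_64__)
    std::uint64_t idx;
    __asm__("bsfq %1, %0" : "=r"(idx) : "r"(x) : "cc");
    return idx;
#else
    return static_cast<std::uint64_t>(__builtin_ctzll(x));  // fallback
#endif
}

// "rm": the compiler may pick a register OR a memory operand for the input.
// The result is the same; only the operand the compiler chooses can differ.
static inline std::uint64_t bsf_rm(std::uint64_t x) {
    assert(x != 0);
#if defined(__x86_64__)
    std::uint64_t idx;
    __asm__("bsfq %1, %0" : "=r"(idx) : "rm"(x) : "cc");
    return idx;
#else
    return static_cast<std::uint64_t>(__builtin_ctzll(x));  // fallback
#endif
}
```

Both return the same value for any non-zero input; comparing the generated listings for the two is what shows whether the compiler picked the memory form.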

Heh: IPv6_Mask_to_Size vs. IPv6_Mask_to_Size_2 gives pretty inconsistent results:
GCC: IPv6_Mask_to_Size_2 is faster
Clang: IPv6_Mask_to_Size is faster
...and BSF still a little faster on both. I need to improve timing consistency, and I'm too cross-eyed to look at it now.
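
For context, a mask-to-size conversion built on a bit scan can be sketched as below. This is a guess at the general shape, not the code from mintest.cpp: mask_to_prefix_len and the two-word host-order layout are assumptions. __builtin_ctzll is a GCC/Clang builtin that typically compiles to BSF or TZCNT on x86-64.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: prefix length of a contiguous IPv6 netmask held as
// two host-order 64-bit words (hi = first 8 bytes, lo = last 8 bytes).
// The number of trailing zero bits tells us where the run of ones ends.
static inline unsigned mask_to_prefix_len(std::uint64_t hi, std::uint64_t lo) {
    if (lo != 0) {
        // A contiguous mask that reaches into lo must have hi all ones.
        assert(hi == ~0ull);
        return 128u - static_cast<unsigned>(__builtin_ctzll(lo));
    }
    if (hi != 0) {
        return 64u - static_cast<unsigned>(__builtin_ctzll(hi));
    }
    return 0;  // all-zero mask; ctz is undefined for 0, so handle it here
}
```

For example, a /48 mask has hi = 0xFFFFFFFFFFFF0000 with 16 trailing zeros, so the function returns 64 - 16 = 48.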

Last comment: Clang implementation of "switch..case"... (If it's faster than GCC for someone, let me know on what hardware.)

Last data:

E:Tx86>gcc --version
gcc (x86_64-posix-seh-rev0, Built by MinGW-W64 project) 8.1.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

E:Tx86>clang --version
clang version 6.0.0 (tags/RELEASE_600/final)
Target: x86_64-pc-windows-msvc
Thread model: posix
InstalledDir: D:\LLVM\bin

[...]