Megol on 4/9/11 wrote:
>Vincent Diepeveen on 4/7/11 wrote:
>>none on 4/6/11 wrote:
>>>Vincent Diepeveen on 4/6/11 wrote:
>>>>A 64 x 64 multiplication giving a 128-bit output requires several dozen instructions
>>>>in GCC and similar Linux compilers, whereas I was interested only in that multiplication instruction.
>>>>Visual Studio with an intrinsic requires 10 instructions.
>>>>In assembler I'd use just 1 instruction there.
>>>gcc only needs one instruction too, using the right extension.
>>>I guess you have to learn to properly use gcc...
>>>#include <stdint.h>
>>>__uint128_t p(uint64_t a, uint64_t b)
>>>{
>>>    return (__uint128_t)a * b;
>>>}
>>>$ gcc -O3 -S test.c
>>>movq %rsi, %rax
>>>mulq %rdi
>>I'm using this already you jerk, the problem is if you go do something with the
>>outputs, then GCC generates zillions of nonsense instructions that are not needed.
>Jerk? Grow up, take your medication and stop living in your little fantasy world where you dictate the reality.

If you really believe that a few amateurs are not easily manipulated by companies that have billion-dollar reasons to look fast, and for whom the easiest way to accomplish that is to keep GCC slow, then you are not very mature.

>Show an example where this actually happens, if you are

I already posted a clear case.

If you scroll back you'll also see that, for example, for years GCC didn't use CMOV-type optimizations in places where generating silly branches would obviously slow the code down.
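To make the CMOV point concrete, here is a minimal sketch in C. The function names are made up for illustration and are not taken from the code discussed in this thread. The first form invites a compare plus a conditional jump; the second is a plain select on two already-computed values, the typical CMOV candidate on x86-64; the third forces the branchless form by hand with a mask. Which instructions a given GCC version actually emits for each depends on the version, flags and target, so this illustrates the idea and says nothing about a specific release.

```c
#include <stdint.h>

/* Branchy form: a compiler may emit a compare and a conditional jump here. */
int64_t max_branch(int64_t a, int64_t b)
{
    if (a > b)
        return a;
    return b;
}

/* Select form: both values are already available, so this is a typical
   candidate for a conditional move instead of a jump. */
int64_t max_select(int64_t a, int64_t b)
{
    return (a > b) ? a : b;
}

/* Hand-written branchless form: mask is all ones when a > b, else zero. */
int64_t max_branchless(int64_t a, int64_t b)
{
    int64_t mask = -(int64_t)(a > b);
    return (a & mask) | (b & ~mask);
}
```

Comparing the output of `gcc -O2 -S` for the three functions shows quickly whether a given build picks a jump or a conditional move.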

>correct the example should be very short (if your code base requires lots of code

This is a very silly statement and shows your naivety.

The easiest form of sabotage is introducing bureaucracy, which goes wrong when there is a lot of paperwork to do.

As for the explanation why GCC didn't generate CMOVs, not even on his Core 2 or AMD Phenom: one of the GCC team guys gave Linus the explanation that it would be slower on his P4...

GCC in general messes up on examples that are more than a few lines long.

Another clear form of sabotage is the PGO code. There were in fact a few experimental snapshots where it did seem to work OK; then within a few weeks some people added 'changes' that caused it to malfunction again, after which the speedup it gave for my code went down from 20% to 1.5%.

Every other compiler that has PGO gains at least 20% from it.

I do the PGO run on 1 core. In that setup my chess program is a simple multithreaded design: 1 thread that basically idles and a 2nd thread that does the computation.

That this gives GCC a speedup of just 1.5% is the best proof ever that care is being taken that GCC doesn't perform like other compilers.

It misses a crucial speedup of nearly 20% there.

20% is really *a lot*.

Just a few snapshots from 2005 seemed to give a 20% speedup with PGO. The releases, however, never worked well there and give at most a 1.5% speedup from PGO.

The last measurement I did with PGO and GCC was in 2010, and PGO still malfunctioned.

>to produce this pessimation the problem is elsewhere).

The problem is not elsewhere, the problem is in GCC.

If I wanted to sabotage GCC to slow it down on AMD, I would do it in the following manner.

Create complicated optimizations that run and rewrite the code in such a manner that PGO can't touch it afterwards (as PGO might rewrite it to CMOVs).

Take care that the jump lands outside AMD's lookahead range, as that gives a massive penalty on AMD when jumping.

Now generic sabotage.

Create complicated optimizations that give a speedup in 1 or 2 cases, preferably SPECint, and use the huge codebase of the big company sponsoring you to make statistically sure that they slow down other code.

Because if you have something that in general slows things down and only in 1 specific case gives a speedup, yet you take care that the optimization also gets triggered in the generic case, then a number of such optimizations will always slow you down.

If you add a few dozen such complicated optimizations that get triggered in a generic manner, that will slowly do what you want.

You don't want to slow down 5-line programs, as the unpaid dudes will at most test 5-line proggies; the perfect sabotage is creating a bureaucratic set of rules so complex that most won't even take a look at them.

As this is of course too complicated a domain for you: do you realize that you can parameter-tune things and calculate break-even points for everything?

So the crucial way to optimize is to use your PGO efficiently and effectively, and to let a parameter optimization determine which optimization to use when.

That will boost PGO major league for bigger programs.
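As a rough illustration of what calculating a break-even point from profile data could look like, here is a small sketch in C for the branch-versus-CMOV choice. Everything in it is an assumption made for the example: the cost model (predicted-path cost plus miss rate times misprediction penalty), the function names, and any cycle counts you feed it. It is not how GCC or any other compiler actually decides.

```c
/* Expected cycles for a branch under a deliberately simple model
   (assumption): cost when predicted correctly, plus the misprediction
   rate times the misprediction penalty. */
double branch_cost(double hit_cycles, double miss_rate, double miss_penalty)
{
    return hit_cycles + miss_rate * miss_penalty;
}

/* Misprediction rate at which the branch and the CMOV cost the same.
   Above this rate the CMOV is cheaper under the model above. */
double break_even_miss_rate(double hit_cycles, double cmov_cycles,
                            double miss_penalty)
{
    return (cmov_cycles - hit_cycles) / miss_penalty;
}

/* Returns 1 when the profiled misprediction rate makes CMOV the cheaper
   choice, 0 otherwise. */
int prefer_cmov(double hit_cycles, double cmov_cycles,
                double miss_penalty, double profiled_miss_rate)
{
    return branch_cost(hit_cycles, profiled_miss_rate, miss_penalty)
           > cmov_cycles;
}
```

With made-up numbers of 1 cycle for a predicted branch, 2 cycles for the CMOV and a 16-cycle misprediction penalty, the break-even misprediction rate is 1/16: a branch that the profile shows mispredicting more often than that would be converted, and one below it would be left alone.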

Now fill in the opposite, as I did above, and you'll realize how to sabotage in the perfect manner.

Good sabotage and incompetence by stubborn people are impossible to distinguish from each other.

>>As for GCC, I've already posted something before here on inefficiency with assembler
>>output attached and clear proof of sabotage of the GCC compiler in order to keep it slow.
>GCC doesn't have some saboteurs trying to make things slow.
>>GCC isn't getting sold and therefore always will be kept slow.
>"kept slow"... The development model for GCC is open. Nobody contributing to GCC
>has an interest in keeping it slow, and those that don't contribute have no say.
