By: Linus Torvalds, January 23, 2020 5:33 pm
Room: Moderated Discussions
Travis Downs on January 23, 2020 12:51 am wrote:
> Yeah, it is, see this example.

Ouch. Both compilers do some odd stupid things.

clang seems to do much nicer register allocation, and avoids unnecessary move instructions. Plus gcc gets so confused about register allocation that it causes stack spills, so the end result is just ugly. clang just does much better.

Maybe the code gcc generates is equally fast (maybe it's not decode limited and the moves just turn to renames and the stack spills end up scheduling fine), but it just looks bad.

But then gcc handles that "nothing to sum" case so much better than clang, noticing that it's just zero, while the clang code there is just silly ("let's explicitly zero all these registers so that we can add them up").

No idea whether the different init sequences (gcc: "zero one register, then move it to the others", clang: "zero all registers with xor") are better or worse, it might just depend on uarch details.
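
For readers without the linked example at hand, here is a rough source-level C sketch of the two behaviours being compared. It is not the code from the example and not actual compiler output. The function name `sum_unrolled`, the element type, and the choice of four accumulators are all assumptions made for illustration. The early return loosely mirrors the "it's just zero" shortcut credited to gcc above, and the accumulator initialization loosely mirrors the "explicitly zero all these registers so that we can add them up" pattern.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the linked example: sum an array with four
 * independent accumulators, roughly what an unrolled loop looks like
 * when written back as C. All names and sizes here are assumptions. */
uint64_t sum_unrolled(const uint32_t *data, size_t n)
{
    /* Shortcut for the "nothing to sum" case: the result is just zero,
     * so no accumulator needs to be set up at all. */
    if (n == 0)
        return 0;

    /* General path: explicitly zero every accumulator. Without the
     * shortcut above, n == 0 would fall through to here, zero all four,
     * and then add four zeros together. */
    uint64_t a0 = 0, a1 = 0, a2 = 0, a3 = 0;

    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += data[i];
        a1 += data[i + 1];
        a2 += data[i + 2];
        a3 += data[i + 3];
    }

    /* Tail: fewer than four elements remain. */
    for (; i < n; i++)
        a0 += data[i];

    /* Combine the partial sums. */
    return (a0 + a1) + (a2 + a3);
}
```

Both paths return the same value for n == 0. The only difference is how much work is done to get there, which is the point of the comparison.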

