By: Andy N (andyn.delete@this.example.edu), August 12, 2019 5:45 pm
Room: Moderated Discussions
⚛ (0xe2.0x9a.0x9b.delete@this.gmail.com) on August 12, 2019 1:51 am wrote:
> The main reason why gcc/clang compilation scales well with multiple x86 cores is that it is performing
> a lot of repetitive work. If there was proper caching implemented preventing repetitive work on sub-file
> granularity then compiler jobs wouldn't scale that well, even in case of many-file builds like the Linux
> kernel maybe except for the very first build ever on the machine. Even if it is the very first build
> of the Linux kernel on a machine it is likely that it would be possible to somewhat speedup the build
> and avoid some local work by concurrently downloading data from a world-wide compiler cache.
You're severely underestimating the complexity of compilation. In particular, you're underestimating the variety of input parameters that affect the compiled output, which reduces the effectiveness of caching.
To start with, target microarchitecture. GCC lets me specify both which platforms the code will run on (which CPU features the compiled code may use, via -march) and which platform the code is optimized for (via -mtune). E.g. on some microarchitectures, decrement and conditional branch instructions are fused, so they should be placed adjacent for best performance. On others, scheduling the decrement well ahead of the branch allows the condition codes to be available by the time the conditional branch is decoded.
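A concrete (if simplified) sketch of what I mean, assuming an x86-64 gcc; the exact output will vary by compiler version:

/* sum.c - the same source compiled for different tuning targets produces
 * different object code, so a cache keyed only on the source text (and not
 * the full flag set) would hand back the wrong result:
 *
 *   gcc -O2 -march=x86-64 -mtune=generic -c sum.c -o sum_generic.o
 *   gcc -O2 -march=skylake -mtune=skylake -c sum.c -o sum_skylake.o
 *
 * Disassembling both (objdump -d) typically shows different instruction
 * selection, vectorization, and scheduling around the loop's
 * decrement-and-branch.
 */
unsigned long sum(const unsigned long *a, unsigned long n)
{
    unsigned long total = 0;
    while (n--)              /* decrement + conditional branch each iteration */
        total += a[n];
    return total;
}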
Second, there are non-target code generation options like stack sanity checking (e.g. gcc's -fstack-protector-strong).
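For instance (a sketch; the exact codegen depends on the gcc version and target):

/* canary.c - same source, different object code depending on hardening flags:
 *
 *   gcc -O2 -fno-stack-protector     -c canary.c -o canary_plain.o
 *   gcc -O2 -fstack-protector-strong -c canary.c -o canary_hardened.o
 *
 * The hardened build inserts a stack canary store on entry and a check
 * before return, so the two objects differ even though the input text
 * is identical.
 */
#include <string.h>

void copy_name(char *dst, const char *src)
{
    char tmp[64];                        /* local array triggers the protector */
    strncpy(tmp, src, sizeof(tmp) - 1);
    tmp[sizeof(tmp) - 1] = '\0';
    strcpy(dst, tmp);
}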
Third, structure layout. Linux has lots of CONFIG options which add or remove fields from kernel data structures. (The most obvious are various debug options which e.g. record where memory was allocated so useful information can be printed if it's double-freed or leaked.) All of these change the compiled code, sometimes in subtle ways.
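Something along these lines (the CONFIG name here is made up, but it's modeled on the kernel's real debug options):

#include <stddef.h>

/* Hypothetical config option, in the style of the kernel's debug knobs. */
#ifdef CONFIG_DEBUG_ALLOC_TRACKING
struct alloc_site {
    const char *file;    /* where the allocation happened... */
    int         line;    /* ...so a double-free or leak report can say so */
};
#endif

struct buffer {
    void   *data;
    size_t  len;
#ifdef CONFIG_DEBUG_ALLOC_TRACKING
    struct alloc_site where;   /* extra field changes sizeof(struct buffer)
                                  for every translation unit that uses it */
#endif
};

/* This function's source never mentions the config option, yet its
 * compiled code differs between configs because the structure size does. */
size_t buffer_footprint(void)
{
    return sizeof(struct buffer);
}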
Fourth, function inlining. Even if a particular function's source code doesn't change, it's very common that platform dependencies are hidden inside an inline helper function which does change. So the generated object code is very different.
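Roughly like this (the helper and its contents are invented for illustration):

#include <stdint.h>

/* Hypothetical inline helper hiding a platform dependency. */
static inline uint64_t read_cycle_counter(void)
{
#if defined(__x86_64__)
    uint32_t lo, hi;
    __asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#elif defined(__aarch64__)
    uint64_t ticks;
    __asm__ volatile("mrs %0, cntvct_el0" : "=r"(ticks));
    return ticks;
#else
    return 0;   /* placeholder for other targets */
#endif
}

/* The caller's source text is identical everywhere, but because the
 * helper is inlined, the emitted object code is not. */
uint64_t cycles_elapsed(uint64_t start)
{
    return read_cycle_counter() - start;
}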
Yes, a compiler is a nice, simple model of a computation whose output is strictly a function of its input, but that input is larger and more varied than you seem to be imagining.