By: Maynard Handley (name99.delete@this.name99.org), December 9, 2014 11:33 am
Room: Moderated Discussions
Linus Torvalds (torvalds.delete@this.linux-foundation.org) on December 8, 2014 8:08 pm wrote:
> Jouni Osmala (josmala.delete@this.cc.hut.fi) on December 8, 2014 3:47 pm wrote:
> >
> > There is enough scaling left to turn somewhat better than current high end xeons to client
> > CPU:s
>
> Maybe. And probably not. Given the choice between 16 cores and 4, I suspect most people
> will take 4, and prefer more cache, graphics, and integrated networking etc.
>
> > Especially when stacked dram becomes standard and L3 cache becomes redundant due to
> > lower latency on main memory, and there would be enormous bandwidth available to CPU:s.
>
> Again, unlikely. Especially with all those cores. You want the L3 as a synchronization point for all your
> theoretical parallel programs that would otherwise have totally unacceptable synchronization overheads.
>
> Stacked RAM may help throughput, but it won't be running at CPU frequencies,
> and latency of communication between cores is pretty primary.
>
> Of course, that all assumes that the parallel solutions exist at all, which is still an
> open question (aka "not solved in the last few decades, probably not solvable at all")
Linus, might I respectfully suggest that you are blinded by two points:
(a) The space you work in strives for the absolute maximum performance possible. But a consequence of this is that parallel programming, for you, really IS hard: your life is a constant stream of reasoning about RCU on different architectures with different memory models, of figuring out just how aggressively you can reorder some instructions to gain an extra 0.1% of throughput, of inventing new lock-free data structures in the face of the limited guarantees different hardware provides, and so on.
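(To make that concrete for anyone who hasn't lived it, here is a minimal C++11 sketch of the flavor of reasoning involved --- not RCU itself, just publishing one value from one thread to another --- where you personally have to pick the memory orderings and convince yourself they hold on every architecture you ship on. The names and values are purely illustrative.)

// Not RCU, just a taste of the reasoning: publishing data to another thread
// with C++11 atomics means choosing memory orderings by hand and convincing
// yourself they are sufficient on every architecture you care about.
#include <atomic>
#include <thread>
#include <cassert>

int payload = 0;
std::atomic<bool> ready(false);

void producer() {
    payload = 42;                                   // plain store
    ready.store(true, std::memory_order_release);   // publish: the store above may not sink below this
}

void consumer() {
    while (!ready.load(std::memory_order_acquire))  // acquire pairs with the release above
        ;                                           // spin (fine for a toy example)
    assert(payload == 42);                          // guaranteed visible, even on weakly-ordered hardware
}

int main() {
    std::thread p(producer), c(consumer);
    p.join(); c.join();
    return 0;
}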
But that is NOT parallel programming for most programmers. They are/will be using canned data structures and algorithms, and compiler and language support. Yes, this route may leave 10 or 15% of possible performance on the table. But if it allows the code to run 2x faster on a quad-core CPU, that's still progress.
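By way of contrast, here is a hedged sketch of what the "canned" route looks like, using OpenMP as one example of compiler and language support that already exists today; the sizes and flags are just illustrative.

// A sketch of the "canned" route: one directive, and the compiler and runtime
// handle the threads, the chunking, and the joins (OpenMP here).
// Compile with e.g. g++ -fopenmp; without it the pragma is harmlessly ignored.
#include <vector>
#include <cstdio>

int main() {
    const int n = 1 << 20;
    std::vector<double> a(n, 1.0), b(n, 2.0), c(n);

    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];

    std::printf("%f\n", c[0] + c[n - 1]);
    return 0;
}

You give up some control and some performance that way, but you also give up almost all of the reasoning burden described under (a).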
(b) You are probably unrealistic about the speed at which change does (or does not) happen as you move from smaller, more controlled environments (the CPU) to larger ones (OS frameworks and APIs), to larger still (compilers, changes to existing languages), to the largest of all (new languages, which need to be learned, evangelized, have their infrastructure written, have their textbooks written, and have to move into the college curriculum).
There may well be severe limits to what we can do if we insist on writing every parallel program in K&R C with pthreads. But we are slowly fumbling our way to better abstractions. Blocks/lambdas/futures are still only a few years old, and where they have been retrofitted to existing languages the edges are still pretty obvious (horribly so in the case of C++, just ugly in the case of Objective C or C#).
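For instance, a small C++11 sketch (the numbers are just for illustration): futures and lambdas do work, but the launch-policy trap, the capture list, and the explicit .get() are exactly the kind of retrofit seams I mean.

// Futures and lambdas retrofitted into C++11: it works, but the seams show.
#include <future>
#include <vector>
#include <numeric>
#include <iostream>

int main() {
    std::vector<int> data(1000, 2);

    // The capture list, the explicit launch policy (leave it out and the task
    // may silently run lazily), and the .get() below are all retrofit seams.
    auto part = std::async(std::launch::async, [&data] {
        return std::accumulate(data.begin(), data.begin() + 500, 0);
    });

    int rest = std::accumulate(data.begin() + 500, data.end(), 0);
    std::cout << part.get() + rest << "\n";   // blocks until the task finishes
    return 0;
}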
Relying on immutability as an important organizing principle for structuring, understanding, and performance is even more leading-edge. Actors and most functional programming concepts are barely even on the radar of the C-derived languages.
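If it helps, here is a rough, hand-rolled sketch of the actor idea in C++11 terms --- not any real actor library, and the names (Counter, Msg) are invented for illustration: one thread owns the mutable state, and everything else talks to it only by sending plain immutable messages.

// A hand-rolled "actor": one thread owns the state, everyone else sends messages.
#include <thread>
#include <mutex>
#include <condition_variable>
#include <queue>
#include <iostream>

struct Msg { int increment; bool stop; };           // messages are plain immutable values

class Counter {                                      // illustrative name, not a library type
public:
    Counter() : worker_([this] { run(); }) {}
    ~Counter() { send({0, true}); worker_.join(); }

    void send(Msg m) {                               // the only way to touch the state
        { std::lock_guard<std::mutex> g(mu_); inbox_.push(m); }
        cv_.notify_one();
    }
private:
    void run() {
        int total = 0;                               // state lives on one thread only
        for (;;) {
            std::unique_lock<std::mutex> g(mu_);
            cv_.wait(g, [this] { return !inbox_.empty(); });
            Msg m = inbox_.front(); inbox_.pop();
            g.unlock();
            if (m.stop) { std::cout << total << "\n"; return; }
            total += m.increment;
        }
    }
    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<Msg> inbox_;
    std::thread worker_;
};

int main() {
    Counter c;
    for (int i = 0; i < 100; ++i) c.send({1, false});
    // destructor sends the stop message and joins; prints 100
}

In a language built around the idea, all of that queue-and-condition-variable plumbing disappears; the fact that it has to be hand-rolled here is rather the point.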
Fifty years ago, stacks and pointers were leading-edge concepts, and the mainstream programming language (Fortran) provided no mechanisms that relied on them --- so no recursion, no sophisticated data structures. Today these concepts are taken for granted --- stacks are just invisible in the background of how function calls work, and pointers have become invisible except for those working in C and C++.
Thirty years ago it was common (REALLY COMMON) practice to write code with a mess of globals. That was just how things were done by most people --- create this huge pool of globals at the top of a file and then have every function rummage around in it, rather than passing explicit variables/structures/objects between functions. I began programming at the tail-end of this, so I've no idea exactly what the driving force of this was --- maybe it was considered easier to debug, or more convenient (because you didn't have to alter the prototypes of a chain of functions in order to pass in a new parameter), or maybe it was considered (and actually was?) faster because function parameter marshaling was slow on those CPUs?
The point is, anybody who sees this today would be horrified by that style of code; but it took 20+ years for it to go away, in the early 2000s or so, and I'm guessing there are plenty of places where it still lives on --- people learning programming today from older code and older books, who at some point will (hopefully) have to be re-educated by their teachers and colleagues.
We have not even really started on this re-education of programmers so that they do things in a better way (where "better" means using abstractions that are a better match for parallel programming). Our languages, APIs, and tools are still in an abysmal state --- it's as if we were still using Fortran, where forcing recursion or pointers into the language is like pulling teeth. Our tools don't yet have the kind of refactoring support that today makes us not care about adding a new parameter to a chain of function calls. And so on.
It MAY be true that parallelism is a dead end. But claiming so on the basis that programming the Linux kernel is crazy hard, that C++14 is a godawful mess that pretty much no human being fully understands, and that the OSX/Windows APIs have piecemeal support for concurrency strikes me as premature. These things take time --- and the time scale may well be thirty years, where the clock really only started ticking in about 2004.