By: Jouni Osmala (josmala.delete@this.cc.hut.fi), December 8, 2014 1:10 pm
Room: Moderated Discussions
> > > > or people using DAGs to explain why reference count is a solid
> > > > solution when complex object graphs typically have cycles
> > >
> > > Oh, I agree. My example was the simple case. The really complex cases are much worse.
> > >
> > > I seriously don't believe that the future is parallel. People who think you can solve it with compilers
> > > or programming languages (or better programmers) are so far out to lunch that it's not even funny.
> > >
> > > Parallelism works well in simplified cases with fairly clear interfaces and models. You find
> > > parallelism in servers with independent queries, in HPC, in kernels, in databases. And even
> > > there, people work really hard to make it work at all, and tend to expressly limit their models
> > > to be more amenable to it (eg databases do some things much better than others, so DB admins
> > > make sure that they lay out their data in order to cater to the limitations).
> > >
> > > Of course, other programming models can work. Neural networks are inherently very
> > > parallel indeed. And you don't need smarter programmers to program them either..
> > >
> > > Linus
> >
> > Future is parallel in one way or another, simply because diminishing returns on improving serial
>
> This always starts off a heated debate every time it comes up.
>
> But here goes again: *There are diminishing returns on improving parallel*.
> Assuming we're talking about client computing here (Linus already ruled out HPC and databases).
> Why would you assume the inefficiencies of parallel model will be solved, but the serial
> ones will not? I can understand being pessimistic on both, or optimistic on both, but
> why different positions on each one? It's not like parallel computing is a relatively
> new unexplored garden of low hanging fruit and breakthroughs waiting to happen
I'm assuming that 90+% of programs already run fast enough, so they don't matter here.
It's all about asking in which uses current computers are too slow, and whether you can parallelize those cases or whether they are already parallel. And I'm assuming you can parallelize at least 10% of the cases where the user waits on the CPU for long enough to actually notice it.
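To be concrete about what I mean, here's a minimal sketch (my own made-up example, nothing measured): the user-visible waits I'm talking about are often data-parallel loops, like a photo filter touching every pixel, and those split across cores almost for free. Something like this, built with -fopenmp:

#include <stddef.h>
#include <stdint.h>

void brighten(uint8_t *pixels, size_t n, int delta)
{
    /* Every pixel is independent, so the loop splits across cores. */
    #pragma omp parallel for
    for (size_t i = 0; i < n; i++) {
        int v = pixels[i] + delta;
        pixels[i] = (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
    }
}

The function name and the filter are hypothetical; the point is just that this is the shape of the code behind many of the waits a user actually notices.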
> Client computing has gone from 1 to 2-4 cores in the last 10 years. Perhaps 6-8 at the very high
> end and then double thread count for SMT, but based on where the vast majority of devices are, and
> the applications being run on them, it would be a stretch to say that multicore has increased the
> actual realized client CPU performance by more than 4x. On average I would say significantly less
> than 2x would be more likely, due to not having parallel work available much of the time.
>
> In the same time, it looks like single thread performance has increased roughly between 2-4x (Athlon
> XP 3200+ to today's Core i7). Again there are also outliers like vectorizable FP and some memory bandwidth
> bound cases where performance is closer to 10x, but we'll also discard those. But the thing about this
> performance increase is that it's *usable* for all CPU work, not just parallel work.
It's more like: would you rather have today's quad-core i7, or a single core that is 10-20% faster? Power management in turbo mode is good enough that you really cannot expect to get much more speed just by limiting the number of cores. And the other question is: in the future, would you rather have 16 cores, or 4 cores that are 20% faster? That is the trade-off that continued shrinking and following the trends will hand us later. Of course on-chip memory will be a big improvement for single-thread performance, but that same improvement will help parallel code even more, so it isn't serial or parallel but both. The kinds of improvements that let you choose between doubling the number of cores and spending the same resources per core are limited both by diminishing returns and by the latency that comes with bigger structures. Think of the P4 Prescott and its branch misprediction penalty, and how fast that design would be in random spaghetti code: it would only be fast in vectorizable straight-line code, and that is better served by wider SIMD or more cores anyway.
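To put rough numbers on that 16-cores-vs-4-faster-cores question, here's a back-of-the-envelope Amdahl's law sketch of mine; the 70% parallel fraction is an assumption I picked for illustration, not a measured number:

#include <stdio.h>

/* Amdahl's law: speedup = 1 / ((1 - p) + p / n), for parallel fraction p
 * and n cores. p = 0.70 here is a made-up workload assumption. */
static double amdahl(double p, double n) { return 1.0 / ((1.0 - p) + p / n); }

int main(void)
{
    double p = 0.70;
    printf("4 cores:             %.2fx\n", amdahl(p, 4.0));        /* ~2.11x */
    printf("16 cores:            %.2fx\n", amdahl(p, 16.0));       /* ~2.91x */
    printf("4 cores, each +20%%: %.2fx\n", 1.2 * amdahl(p, 4.0)); /* ~2.53x */
    return 0;
}

With p = 0.70 the 16 cores win; drop p to 0.50 and the 4 faster cores win instead. Which is exactly why the whole argument comes down to how much of the noticeable waits can be parallelized.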
You normalize too early. The first i7 was the first chip to combine most of the already-known ways to make things faster; since then it has been about 10% per generation from tweaking. The latest round of tweaking included widening the core from 6 to 8 wide plus lots of other improvements, and still only 10%. At the same time power per core has gone down significantly, which would allow adding more cores, but Intel chose to integrate a GPU and reduce power consumption and cooling requirements instead.
As for what has been done with client chips, everyone seems to be integrating the ultimate parallel accelerator (the GPU) into the CPU and adding wider vectors, which supports my claim that the future is parallel, simply because it's the only way to get serious gains as long as we are working with current physics.
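As an illustration of the "wider vectors" part, here's a sketch of mine assuming an AVX-capable x86 and GCC/Clang intrinsics (build with -mavx):

#include <immintrin.h>
#include <stddef.h>

float sum_avx(const float *a, size_t n)
{
    __m256 acc = _mm256_setzero_ps();   /* 8 float lanes per register */
    size_t i = 0;
    for (; i + 8 <= n; i += 8)          /* 8 adds per iteration */
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(a + i));

    float lanes[8];
    _mm256_storeu_ps(lanes, acc);       /* reduce the 8 lanes */
    float s = lanes[0] + lanes[1] + lanes[2] + lanes[3]
            + lanes[4] + lanes[5] + lanes[6] + lanes[7];

    for (; i < n; i++)                  /* scalar tail */
        s += a[i];
    return s;
}

Whether the compiler auto-vectorizes the plain scalar loop or you write it by hand like this, the point stands: the easy gains left are in doing more lanes per cycle, not more cycles per lane.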