By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), December 8, 2014 8:08 pm
Room: Moderated Discussions
Jouni Osmala (josmala.delete@this.cc.hut.fi) on December 8, 2014 3:47 pm wrote:
>
> There is enough scaling left to turn somewhat better than current high end xeons to client
> CPU:s
Maybe. And probably not. Given the choice between 16 cores and 4, I suspect most people will take 4, and prefer more cache, graphics, and integrated networking etc.
> Especially when stacked dram becomes standard and L3 cache becomes redundant due to
> lower latency on main memory, and there would be enormous bandwidth available to CPU:s.
Again, unlikely. Especially with all those cores. You want the L3 as a synchronization point for all your theoretical parallel programs that would otherwise have totally unacceptable synchronization overheads.
Stacked RAM may help throughput, but it won't be running at CPU frequencies, and latency of communication between cores is pretty primary.
Of course, that all assumes that the parallel solutions exist at all, which is still an open question (aka "not solved in the last few decades, probably not solvable at all")
> On arm front highly parallel world may become reality
Ok, now you're just veering off into la-la-land.
Are you talking about that Cavium chip? The one that will be completely useless for general-purpose loads? That one?
Yeah.
> on Intel world its more like having
> 16 core client where each core is slightly MORE powerful than current core , and servers
> probably have 60 of same cores but having lower base frequency and higher power budget.
I can imagine people actually using 60 cores in the server space, yes. I don't think we'll necessarily see it happen on a huge scale, though. It's probably more effective to make bigger caches and integrate more of the IO on the server side too.
On the client side, there are certainly still workstation loads etc that can use 16 cores, and I guess graphics professionals will be able to do their photoshop and video editing faster. But that's a pretty small market in the big picture. There's a reason why desktops are actually shrinking.
So the bulk of the market is probably more in that "four cores and lots of integration, and make it cheap and low-power" market.
But hey, predicting is hard. Especially the future. We'll see.
> I'm talking about diminishing return's on going beyond where those complex OoO
> cores are right now. And there is enough scaling left to go for 16+ cores of
> similar complexity of current high end cores with added wider vector unit.
... and you are then completely ignoring the diminishing returns of parallelism. There are serious issues scaling past a handful of cores. And it's expensive as hell.
Some things are cheap and easy. But those are done already. Building stuff in parallel? Trivial - except even there, people tend to have build tools and Makefiles etc. that make it not work that well in many cases. So even something really simple like a software build often doesn't scale that well (I'm happy to say that the kernel build scales exceptionally well, but most projects don't have tens of thousands of files and lots of effort put into the build system).
The "more parallelism" people aren't new. This has been debated for decades, and the extreme parallelism people have been wrong for decades. I don't see anything that has really changed things. And if anything, scaling limits will take some of their arguments away.
I work on a project where we're doing extreme scaling. We're proud of it. But I also see the pain. For us, it makes sense to spend the effort. I don't see that being true in 99% of all cases.
Linus