By: RichardC (tich.delete@this.pobox.com), May 12, 2013 5:57 am
Room: Moderated Discussions
Heikki Kultala (hkultala.delete@this.sefi.fi) on May 12, 2013 1:02 am wrote:
> Everybody who does 3d gaming is doing compiling. The drivers of the 3d chips
> are compiling high-level shader code to the assembly language of the GPU.
You can google for comparisons of 3D game performance with and without
hyperthreading. For current game engines, it doesn't seem to help significantly:
some games go a few FPS quicker with it, some go quicker without it. Running
at higher clock speed without hyperthreading seems to do better.
As for the use of "compiling" for games, I would suspect that the "compiling"
part happens once when the game starts up, not on the critical path of
rendering each frame. So compiling a few tens of thousands of lines of
shader code isn't performance-critical (and may be just fine with no parallelism
at all).
>
> For gaming, 4 fast ones are better than 8 slow ones. But with symmetric multi-threading,
> you kinda get both. When there are only 4 threads active, those execute very quickly,
> but when another 4 threads appear, those increase the performance.
>
> But without SMT, doing a context switch to execute the fifth thread would cause a major slowdown.
Great. You have a theoretical argument for the usefulness of SMT. But the
reality is that current games don't actually go noticeably faster with SMT.
Given a choice between theory and reality, I'll take reality.
> Often the situation might also be, that the user is having 2-core processor and executing
> a program which uses two threads. Then some background task needs some CPU time. Without
> multi-threading, a slow context switch is needed and executing of another one of the important
> thread halts totally. With multi-threading, it only slows down slightly.
Most desktop/laptop apps are mostly single-threaded. The situation you describe
is the reason why 1C/2T or 2C/2T feels noticeably more responsive than 1C/1T.
But the workloads where you want 3T are much less frequent; and the workloads where
you want >4T even less so. Which is presumably why Intel hasn't pushed beyond
4C for high-volume markets, even though they clearly have sufficient die area
to do so.
> Wrong. For servers large amount of small cores and heavy caches are much better
Not all server workloads are equal. Single-thread performance still matters
a lot for some workloads.
>
> > so b) they put in SMT, because it's very
> > effective for server workloads. But that doesn't prove that SMT is
> > the optimal choice for desktop/laptop cpu's.
>
> No, they put in SMT because it's almost free performance improvement, for ALL markets.
I say again, show me a commonly-occurring desktop/laptop workload that gets
a noticeable benefit from SMT. Gaming isn't it.
>
> > I'm not saying the resulting chips are bad; I'm just saying that it
> > would be really interesting to see what Intel's architects could
> > deliver if they made a 4C/4T desktop chip without worrying about
> > server workloads.
>
> They would get a chip that has like 2% better single-thread
> performance, but 25% worse multi-threaded performance.
The question is whether it's really only 2% or whether it might be 5-10%,
which would be much more interesting.
>
> Big core with SMT is the best solution for workloads which
> are mainly single-threaded but sometimes multi-threaded.
Anyone who wants to go fast buys a 4C these days. So SMT only has a chance
to be helpful if you have a workload which has 5 or more active threads for
a significant fraction of the time. That's a much higher bar to clear.
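
To put rough numbers on that bar, here is a back-of-envelope sketch. It assumes a deliberately crude model that is not taken from any measurement: a hypothetical no-SMT 4C/4T chip runs (1 + single_gain) times as fast as the SMT chip whenever 4 or fewer threads are active, and (1 - multi_loss) times as fast whenever 5 or more are active, with throughput averaged over time. The function names and the model itself are illustrative assumptions; the 2%, 5-10% and 25% figures are the ones quoted in this thread.

```python
def relative_throughput(f, single_gain, multi_loss):
    """Time-weighted throughput of the hypothetical no-SMT chip (SMT chip = 1.0).

    f is the fraction of time with 5 or more active threads.
    """
    return (1 - f) * (1 + single_gain) + f * (1 - multi_loss)


def break_even_fraction(single_gain, multi_loss):
    """Fraction f at which both chips deliver the same average throughput.

    Solves (1 - f) * (1 + single_gain) + f * (1 - multi_loss) = 1 for f.
    """
    return single_gain / (single_gain + multi_loss)


# Single-thread gains discussed above, against the quoted 25% multi-threaded loss.
for gain in (0.02, 0.05, 0.10):
    f = break_even_fraction(gain, 0.25)
    print(f"single-thread gain {gain:.0%}: break-even at {f:.1%} of time with 5+ threads")
```

Under these assumptions, a 2% single-thread gain only pays off if the machine spends less than about 7.4% of its time with 5 or more active threads, while a 10% gain pushes that threshold to about 28.6%. That is why the difference between 2% and 5-10% matters so much to the argument.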