By: RichardC (tich.delete@this.pobox.com), May 11, 2013 4:39 pm
Room: Moderated Discussions
Ricardo B (ricardo.b.delete@this.xxxxx.xx) on May 11, 2013 8:07 am wrote:
> SMT is not really a compromise between client vs server, but on application types.
>
> Modern OoO CPU cores have massive execution resources to squeeze
> out every last inch of single thread performance.
Up to a point. But if you take out the extra logic and registers
needed to support SMT, you'd be able to clock the core a little faster.
Maybe not *much* faster, but a little. SMT can't possibly be free.
And some workloads don't benefit from it.
> However, lots of software is, by nature, very low on instruction
> level parallelism and leaves most of those resources unused.
> On servers, the HTML generation engines (i.e., the PHP/ASP/whatever
> interpreters) are very dependent on branches and pointer chasing.
Servers have lots of parallelism because they have lots of simultaneous
clients. Client machines don't have a lot of active threads most of
the time. Desktops are definitely more responsive with support for
2 simultaneous hardware threads rather than 1; but beyond that (which
of course you can do easily these days with a dual or quad-core without
SMT) the benefit of more hardware threads is questionable.
> But on clients, applications like compilers, game physics and AI, etc, also have similar issues.
99% of desktop/laptop users are not running compilers at all. As for
gaming, that's also rather a niche, and in any case I'm skeptical about
whether current game engines show much benefit from running on 4C/8T
rather than 4C/4T. Latency matters a lot for gaming, and I'm not at
all sure that 8 slow threads are better than 4 fast ones.
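
To put rough numbers on that worry, here is a toy model, not a measurement. Everything in it is an assumption of mine for illustration: `smt_gain` is the extra total throughput a core gets when both of its hardware threads are busy, and `clock_penalty` is the fraction of clock speed given up to fit the SMT logic in. The baseline is a hypothetical SMT-less core running at speed 1.0.

```python
def smt_tradeoff(cores, smt_gain, clock_penalty):
    """Compare a hypothetical SMT-less chip with a 2-way SMT chip.

    All speeds are relative to one thread on the SMT-less core (= 1.0).
    smt_gain      -- ASSUMED extra per-core throughput with both threads busy
    clock_penalty -- ASSUMED fraction of clock speed lost by including SMT
    """
    smt_clock = 1.0 - clock_penalty
    return {
        # Total throughput of the chip without SMT, all cores busy.
        "total_no_smt": cores * 1.0,
        # One thread alone on an SMT core (sibling idle): only the clock penalty applies.
        "single_thread_smt": smt_clock,
        # Each thread when both siblings are busy: they split the boosted core throughput.
        "per_thread_smt_busy": smt_clock * (1.0 + smt_gain) / 2.0,
        # Total throughput of the SMT chip with every hardware thread busy.
        "total_smt": cores * smt_clock * (1.0 + smt_gain),
    }


# Made-up inputs: 4 cores, 25% SMT throughput gain, 2% clock penalty.
result = smt_tradeoff(cores=4, smt_gain=0.25, clock_penalty=0.02)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

With those made-up inputs, the 8 threads together do more work (about 4.9 versus 4.0), but each busy thread runs at only about 61% of the speed of a thread on the SMT-less chip. Whether that is a win depends on whether the workload cares about total throughput or about how fast its slowest thread finishes.
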
Most desktops/laptops are mostly running web browsers and office apps
(word processing, spreadsheets, etc.), which don't exploit many
threads very effectively, if at all.
> For this type of applications, adding SMT to an OoO core can deliver a
> very big performance improvement with very little area/speed overhead.
I don't dispute that there are some workloads which benefit greatly
from SMT. I just don't think many desktop/laptop systems are running
such workloads frequently.
> Which leads to an interesting situation, on the x86 world.
> The cores with SMT from Intel are also the ones with most execution
> resources and, by far, best single thread performance.
Right. When Intel makes a big power-hungry core, then a) they target
it at servers as well as desktops/laptops, because the high margins
of servers are attractive, so b) they put in SMT, because it's very
effective for server workloads. But that doesn't prove that SMT is
the optimal choice for desktop/laptop CPUs.
I'm not saying the resulting chips are bad; I'm just saying that it
would be really interesting to see what Intel's architects could
deliver if they made a 4C/4T desktop chip without worrying about
server workloads.