By: Jouni Osmala (josmala.delete@this.cc.hut.fi), May 11, 2013 10:04 pm
Room: Moderated Discussions
> > SMT is not really a compromise between client vs server, but on application types.
> >
> > Modern OoO CPU cores have massive execution resources to squeeze
> > out every last inch of single thread performance.
>
> Up to a point. But if you take out the extra logic and registers
> needed to support SMT, you'd be able to clock the core a little faster.
> Maybe not *much* faster, but a little. SMT can't possibly be free.
> And some workloads don't benefit from it.
And MOST workloads don't really benefit from extra clock speed either. 99% of the time my Core i7 sits nearly idle. What really matters is whether the software benefits from SMT when I do need performance, and for me it often does. Since average IPC is closer to 1 than to 3, high-end cores have plenty of idle resources that help only a tiny amount without SMT. A single thread does benefit from going from 2-wide to 4-wide, but by far less than the increase in execution resources, and SMT is useful for capturing that difference when there is enough parallelism available.
SMT was invented (or copied from a university, I don't know) by the Alpha engineers to justify putting far more execution resources into a core than the single-threaded performance increase alone would have warranted. Those resources still improve single-threaded performance, but not by enough on their own when the alternative is doubling the number of cores. I doubt that Haswell would have gone from a 6-wide to an 8-wide back end without SMT. Sure, it helps single-threaded performance a little, but the cost outweighs that; with SMT the benefit is larger and it becomes a reasonable improvement.
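To put rough numbers on the idle-resources argument, here is a back-of-the-envelope sketch in Python. The function names and the model are my own illustration, not measurements: it treats a core as a fixed number of issue slots per cycle and assumes, very optimistically, that two threads can share those slots with no cache or bandwidth contention, so the SMT figure is only an upper bound.

```python
def idle_fraction(avg_ipc: float, issue_width: int) -> float:
    """Fraction of issue slots left unused by a single thread."""
    return 1.0 - avg_ipc / issue_width


def smt_throughput_upper_bound(thread_ipcs: list[float], issue_width: int) -> float:
    """Idealized combined IPC: threads fill each other's idle slots.

    Assumption: no contention for caches, ports or memory bandwidth,
    so real SMT gains will be lower than this bound.
    """
    return min(float(issue_width), sum(thread_ipcs))


if __name__ == "__main__":
    width = 4          # hypothetical 4-wide core
    single_ipc = 1.0   # "average IPC is closer to 1 than 3"
    print("idle slots, one thread:", idle_fraction(single_ipc, width))
    print("upper bound, two threads:",
          smt_throughput_upper_bound([single_ipc, single_ipc], width))
```

With an average IPC of 1 on a 4-wide core, three quarters of the issue slots go unused in this model, which is the headroom a second thread can draw on.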
As for the majority of people, they don't buy the high-end model, and if there were no multithreaded difference between the high end and the low end, there would be a single-threaded difference instead. So the majority are better off with SMT existing, since it gives Intel a way to capture extra cash from those willing to pay for top-notch multithreaded performance without sacrificing too much single-threaded performance at the low end. Now, for the minority willing to spend money on a $300 CPU, the question is which of the workloads they run would need more performance from it, and for those SMT becomes a reasonable choice, not for the 90% of the time when the CPU is waiting for user input and reducing its own clock speed to conserve power. I don't have to consider 90% of the programs I use when buying a computer; I have to consider the ones that make me wait or feel too slow, and how important they are to me.
But let's consider the evolution of SMT at Intel. The P4 was a single-core CPU with a quite narrow design; SMT didn't really improve multithreaded performance, but it made the system more responsive. -> Core 2 didn't have SMT, probably for time-to-market reasons. -> i7 brought back SMT with a real multithreaded performance benefit, since it was a far wider core than the P4. The responsiveness benefits were not there, since it already had multiple cores.
Then comes Atom: a single-core, narrow, in-order machine. SMT helps because, being in-order, its execution resources sit idle most of the time, AND it is single-core, which brings the responsiveness benefit as well. The new generation of Atoms does NOT have SMT because it is a narrow, multicore OoO machine with fewer execution resources idle.
SMT pays off when there are spare execution resources available, and those extra resources also make the cost of adding SMT go down percentage-wise. So multithreading is really for three cases: 1) a single core, OR 2) an in-order machine, OR 3) an OoO machine so wide that it has more execution resources than it can effectively use.
The P4 was case 1, the original Atom was cases 1 and 2, and i7 is case 3.
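The three-case rule of thumb can be written down as a small Python sketch. Everything here is illustrative: the `Core` type, the `WIDE_OOO_THRESHOLD` cutoff and the per-chip core counts and widths are rough assumptions picked to reproduce the classification above, not authoritative specs.

```python
from dataclasses import dataclass

# Hypothetical cutoff for "so wide it can't use its own resources".
WIDE_OOO_THRESHOLD = 4


@dataclass
class Core:
    name: str
    cores: int          # cores per chip
    out_of_order: bool
    issue_width: int    # rough, illustrative width


def smt_cases(c: Core) -> list[int]:
    """Return which of the three SMT-friendly cases apply to a design."""
    cases = []
    if c.cores == 1:
        cases.append(1)  # single core: responsiveness benefit
    if not c.out_of_order:
        cases.append(2)  # in-order: execution resources often idle
    if c.out_of_order and c.issue_width >= WIDE_OOO_THRESHOLD:
        cases.append(3)  # very wide OoO: more resources than one thread uses
    return cases


# Illustrative figures only (assumptions, not datasheet values).
designs = [
    Core("P4", cores=1, out_of_order=True, issue_width=3),
    Core("original Atom", cores=1, out_of_order=False, issue_width=2),
    Core("i7", cores=4, out_of_order=True, issue_width=4),
    Core("new Atom", cores=4, out_of_order=True, issue_width=2),
]

if __name__ == "__main__":
    for d in designs:
        print(d.name, smt_cases(d) or "no SMT case applies")
```

An empty result, as for the new Atom entry, corresponds to the narrow multicore OoO design where none of the three cases applies.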