By: Ricardo B (ricardo.b.delete@this.xxxxx.xx), May 15, 2013 12:19 pm
Room: Moderated Discussions
RichardC (tich.delete@this.pobox.com) on May 15, 2013 11:06 am wrote:
> Ricardo B (ricardo.b.delete@this.xxxxx.xx) on May 15, 2013 9:16 am wrote:
> > RichardC (tich.delete@this.pobox.com) on May 15, 2013 7:29 am wrote:
> >
> > > Parallel programming isn't "too hard". Parallel programming that scales well
> > > with number of threads when running on hardware with limited memory bandwidth and
> > > with SMT threads that actually slow each other down in unpredictable ways by the
> > > sharing of core execution resources and cache space ? Harder.
> >
> > Except when it comes "for free".
> > It's not common for developers to put much effort into optimizing
> > for SMT, beyond making sure the software works with more threads.
> > By far, the most common case for software that benefits from SMT is multi-threaded
> > software which has low IPC, despite the developers' best efforts to increase it.
>
> If it has low IPC because it does lots of cache misses and is constrained by DRAM
> latency, then SMT *might* help (unless the smaller cache-per-thread reduces the hit
> rate even further ...); if it has low IPC because it's constrained by DRAM *bandwidth*,
> then SMT won't help at all: 4C/4T can swamp the DRAM system just as well as 4C/8T.
Of course.
And how nice it would be if most software were limited by DRAM bandwidth.
That one, we know how to increase!! FB-DIMM, anyone?
Unfortunately, the first kind of software is abundant and it has many friends.
Not only DRAM latency but even L2 and L3 latency can drag IPC down to very low levels (e.g., Ivy Bridge's L2 is 12 cycles, L3 is about 24).
And on top of that you have control (branch) and data dependencies.
All in all, I'd say multi-threaded applications that benefit from SMT probably outnumber those that get hurt by an order of magnitude.
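
To put rough numbers on the latency vs. bandwidth distinction, here is a back-of-the-envelope sketch in Python. It is a toy model, not a measurement: the function names, the 12-cycle compute burst, the 24-cycle stall and the GB/s figures are all made up for illustration. It also assumes stalls from different threads overlap perfectly and ignores the cache-sharing penalty RichardC mentions.

```python
def smt_core_utilization(threads, compute_cycles, stall_cycles):
    """Fraction of cycles a core does useful work in a toy latency-bound model.

    Assumption: each thread alternates `compute_cycles` of work with
    `stall_cycles` of waiting, and stalls of different threads overlap
    perfectly (no contention for cache or execution resources).
    """
    per_thread = compute_cycles / (compute_cycles + stall_cycles)
    return min(1.0, threads * per_thread)


def bandwidth_bound_throughput(threads, per_thread_demand, dram_bandwidth):
    """Aggregate throughput once DRAM bandwidth is the cap.

    Assumption: every thread would like `per_thread_demand` (same units as
    `dram_bandwidth`), and the memory system simply saturates.
    """
    return min(threads * per_thread_demand, dram_bandwidth)


if __name__ == "__main__":
    # Latency-bound case: hypothetical 12 cycles of work per 24-cycle stall.
    for n in (1, 2):
        print(n, "thread(s)/core:", round(smt_core_utilization(n, 12, 24), 3))

    # Bandwidth-bound case: hypothetical 8 GB/s wanted per thread, 25 GB/s DRAM.
    for n in (4, 8):
        print(n, "threads total:", bandwidth_bound_throughput(n, 8, 25), "GB/s")
```

In this model the second SMT thread doubles core utilization in the latency-bound case, while going from 4 to 8 threads changes nothing once DRAM is saturated. Real hardware will land somewhere below the first number, since the threads do compete for cache and execution units.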