By: RichardC (tich.delete@this.pobox.com), May 15, 2013 11:06 am
Room: Moderated Discussions
Ricardo B (ricardo.b.delete@this.xxxxx.xx) on May 15, 2013 9:16 am wrote:
> RichardC (tich.delete@this.pobox.com) on May 15, 2013 7:29 am wrote:
>
> > Parallel programming isn't "too hard". Parallel programming that scales well
> > with number of threads when running on hardware with limited memory bandwidth and
> > with SMT threads that actually slow each other down in unpredictable ways by the
> > sharing of core execution resources and cache space ? Harder.
>
> Except when it comes "for free".
> It's not common for developers to put much effort into optimizing
> for SMT, beyond making sure the software works with more threads.
> By far, the most common case for software that benefits from SMT is multi-threaded
> software which has low IPC, despite the developers' best efforts to increase it.
If it has low IPC because it takes lots of cache misses and is constrained by DRAM
latency, then SMT *might* help (unless the smaller cache-per-thread reduces the hit
rate even further ...); if it has low IPC because it's constrained by DRAM *bandwidth*,
then SMT won't help at all: 4C/4T can swamp the DRAM system just as well as 4C/8T.
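
To put rough numbers on that distinction, here is a minimal sketch using Little's law (bytes in flight divided by latency gives demanded bandwidth). Everything in it is an assumption for illustration, not a measurement: the function name, the 80 ns latency, the 25.6 GB/s peak, the 64-byte line, and the per-thread counts of misses in flight are all made up. It also assumes each thread keeps the same number of misses in flight and the same hit rate when SMT is on, which is optimistic given the shared cache.

```python
def dram_throughput_gbs(threads, misses_in_flight_per_thread,
                        latency_ns=80.0, peak_gbs=25.6, line_bytes=64):
    """Toy model: achieved DRAM bandwidth in GB/s (hypothetical parameters).

    Little's law: demand = bytes in flight / latency.
    Bytes per nanosecond is numerically the same as GB/s.
    """
    demand = threads * misses_in_flight_per_thread * line_bytes / latency_ns
    # The DRAM system cannot deliver more than its peak, however many
    # threads are asking.
    return min(demand, peak_gbs)


if __name__ == "__main__":
    # Latency-bound: pointer chasing, one miss in flight per thread.
    print("latency-bound   4C/4T:", dram_throughput_gbs(4, 1))
    print("latency-bound   4C/8T:", dram_throughput_gbs(8, 1))
    # Bandwidth-bound: streaming, many misses in flight per thread.
    print("bandwidth-bound 4C/4T:", dram_throughput_gbs(4, 10))
    print("bandwidth-bound 4C/8T:", dram_throughput_gbs(8, 10))
```

With these made-up inputs the latency-bound case doubles going from 4T to 8T, because the extra threads just add more outstanding misses, while the bandwidth-bound case is already pinned at the peak with 4T and stays there with 8T.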