By: bakaneko (nyan.delete@this.hyan.wan), April 11, 2013 9:39 am
Room: Moderated Discussions
Michael S (already5chosen.delete@this.yahoo.com) on April 11, 2013 8:37 am wrote:
> anon (anon.delete@this.anon.com) on April 11, 2013 8:25 am wrote:
> > Michael S (already5chosen.delete@this.yahoo.com) on April 11, 2013 7:27 am wrote:
> > > anon (anon.delete@this.anon.com) on April 11, 2013 7:14 am wrote:
> > > > Eric Bron (eric.bron.delete@this.zvisuel.privatefortest.com) on April 11, 2013 3:58 am wrote:
> > > > > > It is possible that future advances in CPU architectures and (more likely) compilers will surprise
> > > > > > you in a different way. Your sentence "I wouldn't be surprised if all the wins from software
> > > > > > prefetching go away, while the downsides remain" may turn out to be shortsighted.
> > > > >
> > > > > IMO there are some fundamental reasons against explicit software prefetch (besides the fact
> > > > > that hw prefetchers render it redundant in most situations), off the top of my head:
> > > > >
> > > > > 1) It is basically a single thread thing
> > > >
> > > > What does this mean?
> > >
> > > It means that a software prefetch that helps when the thread runs in isolation is likely to hurt the
> > > same thread running the same code in the presence of other threads on the same CPU. And vice versa.
> >
> > Oh, OK well I disagree with that too. Unless "basically" has a special meaning.
> >
> > *Some* cases of prefetching, e.g., ones which blow cache size or speculate
> > and waste memory bandwidth, can help for single thread but be damaging for
> > multiple. But that does not apply to all possible prefetching usages.
> >
>
> When # cores is greater than 2 and # threads per core is greater than 1 and there are
> shared levels in the on-chip cache hierarchy, methinks that universally useful (or even
> just non-harmful) software prefetching is far less common than the other way around :(
How would prefetching 1600 bytes be harmful to a 16KByte
L1 or 1MB L2?
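
To put rough numbers on that question, here is a back-of-the-envelope sketch. It assumes 64-byte cache lines, a line-aligned prefetch, and binary sizes (16 KByte = 16 * 1024, 1 MB = 1024 * 1024); none of that is stated above, and the `footprint` helper is just a made-up name for illustration.

```python
# Assumption: 64-byte cache lines (common on x86, but not stated in the post).
LINE_BYTES = 64


def footprint(prefetch_bytes, cache_bytes, line_bytes=LINE_BYTES):
    """Return (cache lines touched, fraction of cache capacity occupied).

    Assumes the prefetched region starts on a cache-line boundary.
    """
    lines = -(-prefetch_bytes // line_bytes)  # ceiling division
    return lines, lines * line_bytes / cache_bytes


# Sizes from the post, read as binary units (an assumption).
l1_lines, l1_frac = footprint(1600, 16 * 1024)
l2_lines, l2_frac = footprint(1600, 1024 * 1024)

print(f"L1: {l1_lines} lines, {l1_frac:.2%} of capacity")
print(f"L2: {l2_lines} lines, {l2_frac:.2%} of capacity")
```

Under those assumptions the prefetch touches 25 lines, a bit under 10% of the L1 and well under 1% of the L2. Associativity and whatever the other hardware thread is doing are ignored here, so treat it as an order-of-magnitude check only.
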
And prefetching aside, caches need to do load balancing
between hardware threads anyway. If prefetching damages
this, then load balancing is at fault, not the software
giving the hint.
Large prefetches are probably very damaging to the
thread doing the prefetch, but as said above, the CPU
has to balance cache load between all cores anyway
(a tight loop touching all of memory has the same bad effect).