By: David Kanter (dkanter.delete@this.realworldtech.com), April 11, 2013 8:50 pm
Room: Moderated Discussions
Brendan (btrotter.delete@this.gmail.com) on April 11, 2013 5:00 pm wrote:
> Hi,
>
> Eric Bron (eric.bron.delete@this.zvisuel.privatefortest.com) on April 11, 2013 11:36 am wrote:
> > > I can't see how SMT makes any difference; beyond generic cache size optimisations and/or
> >
> > SMT is very powerful to hide latencies, particularly LLC cache misses,
> > thus explicit prefetch is far less effective with 2 running threads
>
> Reduced benefits isn't the same as making it disadvantageous - prefetching would
> still get performance gains (just not as much, e.g. depending on how often both
> threads happen to be stalled waiting for memory fetch at the same time).
>
> > > because the ideal distance is too far to get any speedup) you'd suppress any prefetching.
> >
> > this was my point, over the years the ideal distance has grown by 10x or more
>
> You understand that if the prefetch distance grows by 10 times or more;
> then the penalty of not prefetching also grows by 10 times or more?
>
> - Brendan
>
That's not necessarily true. If you prefetch at the wrong time, you evict lines from the cache that might still be useful.
Dean did some benchmarking a decade ago that showed that having HW and SW prefetching together could often decrease performance because of the conflicts between the two.
It's a rather complex area. I tend to believe that JIT-driven prefetching might be effective, but I'm doubtful about it for statically compiled code.
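
To make the "prefetch distance" being argued about concrete, here is a rough sketch in C of software prefetching a fixed number of iterations ahead. The function name `sum_indirect` and the `distance` parameter are made up for illustration, and `__builtin_prefetch` is the GCC/Clang builtin, so this assumes one of those compilers. It is not taken from any benchmark mentioned in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum data[] through an index table, hinting a prefetch of the element
 * that will be needed `distance` iterations from now.
 * distance == 0 disables software prefetching entirely.
 * `distance` is a hypothetical tuning knob: too small and the line has
 * not arrived when it is needed, too large and it may be evicted (or
 * evict something useful) before it is used. */
uint64_t sum_indirect(const uint32_t *data, const size_t *idx,
                      size_t n, size_t distance)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* Only prefetch while i + distance is still inside idx[]. */
        if (distance != 0 && distance < n - i) {
            /* Arguments: address, 0 = read access, 3 = keep in all
             * cache levels. This is a hint; the CPU may ignore it. */
            __builtin_prefetch(&data[idx[i + distance]], 0, 3);
        }
        sum += data[idx[i]];
    }
    return sum;
}
```

The result is the same for any `distance`; only the timing changes. The best value depends on memory latency and on how much work each iteration does, and both vary from one machine to the next, which is why a value baked in at compile time is hard to get right.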
David