By: Brendan (btrotter.delete@this.gmail.com), April 11, 2013 11:26 am
Room: Moderated Discussions
Hi,
Eric Bron (eric.bron.delete@this.zvisuel.privatefortest.com) on April 11, 2013 8:57 am wrote:
> > A logical CPU can only run one software thread at a time. Switching between threads will upset
>
> obviously I'm not talking about the cases where you switch threads, but the cases where we have 2 threads
> sharing the L1D and L2 caches and 8 threads sharing the LLC, as is common with today's mainstream CPUs
I can't see how SMT makes any difference beyond generic cache-size optimisations and/or prefetching too early (both of which end up being the "generic problem with optimisation" below).
> > This is a generic problem with optimisation - if you don't know much about the target,
> > then you can't do compile-time optimisations to suit that specific target. It doesn't
>
> I mentioned a training phase for computing the prefetch scheduling distance; this is more sensible than freezing
> it at compile time. The fundamental problem is that the ideal distance grows too far to get any speedup
If you're suggesting that your solution (run-time benchmarking) has problems, then I agree. That's why I suggested fixing the real problem (instead of merely patching one of its symptoms) to begin with.
Obviously, if you're optimising for a specific target where software prefetching can't help (e.g. because the ideal distance is too large to yield any speedup), you'd suppress prefetching altogether.
Of course I'm not saying that fixing the real problem would be easy (evolutionary change vs. revolutionary change).
- Brendan