By: Eric Bron (eric.bron.delete@this.zvisuel.privatefortest.com), April 11, 2013 10:08 am
Room: Moderated Discussions
> How would prefetching 1600 bytes be harmful to a 16KByte
> L1 or 1MB L2?
the amount of cache pollution and wasted bandwidth depends on the size of your arrays: for example, if you have small 3200-byte arrays and you prefetch 1600 bytes ahead, you'll continuously fetch 50% more data than actually required. this is an exaggerated example, but it should show why there are diminishing returns with more threads and an increased prefetch scheduling distance
also, you typically have more than one input stream to fetch from *per thread*, and the store streams are also thrashing your L1D cache
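to make the arithmetic concrete, here is a rough back-of-the-envelope model of the example above. it is only a sketch: the function names, the 64-byte line size, and the assumption that every prefetch landing past the end of an array is wasted are mine, not measurements from real hardware.

```python
def overfetch_ratio(array_bytes, prefetch_distance_bytes, line_bytes=64):
    """Extra data fetched past the end of one array, as a fraction of the useful data.

    Assumes (hypothetically) one software prefetch per cache line, issued
    prefetch_distance_bytes ahead, and that lines past the array end are never used.
    """
    useful_lines = -(-array_bytes // line_bytes)  # ceiling division
    # The last (distance / line size) prefetches land beyond the array.
    wasted_lines = min(-(-prefetch_distance_bytes // line_bytes), useful_lines)
    return wasted_lines / useful_lines


def in_flight_bytes(streams_per_thread, threads, prefetch_distance_bytes):
    """Bytes kept ahead of use in the cache across all prefetched streams."""
    return streams_per_thread * threads * prefetch_distance_bytes


# The example from the post: 3200-byte arrays, 1600 bytes of prefetch distance.
small = overfetch_ratio(3200, 1600)
# The same distance on a much larger array is nearly free.
large = overfetch_ratio(3_200_000, 1600)
# Hypothetical case: 3 input streams per thread, 2 threads sharing the cache.
footprint = in_flight_bytes(3, 2, 1600)
```

with these assumptions the small-array case gives 0.5 (the 50% from the example), while the large-array case drops to a fraction of a percent. the second function shows how the bytes held ahead of use scale with streams and threads: 3 streams x 2 threads x 1600 bytes is 9600 bytes, already more than half of a 16KB L1, before counting the store streams.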