By: Brendan (btrotter.delete@this.gmail.com), April 12, 2013 3:34 pm
Room: Moderated Discussions
Hi,
Eric Bron (eric.bron.delete@this.zvisuel.privatefortest.com) on April 12, 2013 12:15 pm wrote:
> > If it helps; assume that I'm iterating through a large linked list of "2 cache lines large" structures:
> > cache miss accessing the structure's first cache line, cache miss accessing the structure's second
> > cache line, then the hardware prefetcher notices and pointlessly prefetches a third cache line that
> > won't be accessed; and this is repeated for every structure in the linked list.
>
> well, this scenario should never happen on Intel CPUs since the Pentium
> 4, thanks to the spatial prefetcher (adjacent line prefetch)
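How likely it is depends on the structure's alignment within the 128-byte sector: a structure aligned to 128 bytes occupies exactly one sector, while one that is only 64-byte aligned can straddle two.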
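To make the alignment point concrete, here is a rough sketch of the arithmetic. It assumes 64-byte cache lines and that the adjacent line prefetcher pairs each line with the other half of its 128-byte aligned sector; the helper names `buddy_line` and `fits_in_one_sector` are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u   /* assumed cache line size in bytes */
#define SECTOR     128u  /* assumed size of an adjacent-line pair */

/* Address of the line paired with the line holding addr, i.e. the
   other half of the same 128-byte aligned sector (hypothetical helper). */
static uintptr_t buddy_line(uintptr_t addr)
{
    uintptr_t line = addr & ~(uintptr_t)(CACHE_LINE - 1);
    return line ^ CACHE_LINE;
}

/* True if the byte range [addr, addr + size) lies inside one sector. */
static bool fits_in_one_sector(uintptr_t addr, size_t size)
{
    assert(size > 0);
    return (addr / SECTOR) == ((addr + size - 1) / SECTOR);
}
```

For a 128-byte structure at 0x1000 the buddy of the first line is the structure's own second line, so the paired fetch is useful. For the same structure at 0x1040 the buddies are 0x1000 and 0x10C0, both outside the structure.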
> more generally you seem to think it's possible to prefetch a linked list, it's generally
> not practical since you must prefetch several nodes in advance to hide latency
It seems easy to me - just have a "next" field pointing to the next structure in the list, plus an additional "next_prefetch" field pointing to the structure several nodes ahead.
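A minimal sketch of that layout in C, under some assumptions: 64-byte cache lines, a GCC/Clang compiler (for `__builtin_prefetch`, which on x86 is typically compiled to `prefetcht0` when the locality argument is 3), and illustrative names (`struct node`, `link_prefetch`, `sum_list`, `payload`) that are not from any real code base.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    struct node *next;          /* next node in list order */
    struct node *next_prefetch; /* node several hops ahead, NULL near the tail */
    long payload;               /* stand-in for the real data */
    /* pad to 128 bytes = 2 cache lines, assuming 64-byte lines */
    char pad[128 - 2 * sizeof(struct node *) - sizeof(long)];
};

/* Set next_prefetch on every node to the node `distance` hops ahead. */
static void link_prefetch(struct node *head, size_t distance)
{
    assert(distance > 0);
    struct node *ahead = head;
    for (size_t i = 0; i < distance && ahead; i++)
        ahead = ahead->next;
    for (struct node *n = head; n; n = n->next) {
        n->next_prefetch = ahead;
        if (ahead)
            ahead = ahead->next;
    }
}

/* Walk the list, prefetching both lines of the node `distance` ahead. */
static long sum_list(const struct node *head)
{
    long total = 0;
    for (const struct node *n = head; n; n = n->next) {
        if (n->next_prefetch) {
            const char *p = (const char *)n->next_prefetch;
            __builtin_prefetch(p, 0, 3);      /* first cache line, read, keep in all levels */
            __builtin_prefetch(p + 64, 0, 3); /* second cache line */
        }
        total += n->payload;
    }
    return total;
}
```

The right distance depends on memory latency versus the work done per node, so it would have to be tuned by measurement; the extra pointer also has to be kept up to date on every insert and delete, which is the real cost of the scheme.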
> > Also note that if LLC cache hit rate is high you may still get benefits from
> > prefetching from LLC into L1/L2. For example, for Sandy Bridge an LLC hit has
>
> sure, that's why the streaming prefetcher & IP-based stride prefetcher already prefetch to the L1D, and (depending
> on the load) the adjacent line prefetcher, the streamer and the next-page prefetcher prefetch to L2
Sure; but all of that only helps for access patterns the hardware can predict (e.g. ascending/descending addresses) and does nothing for patterns it can't predict (e.g. pointer chasing).
> btw how do you explicitly prefetch to the L1D on Sandy Bridge? prefetchnta? other?
I would've assumed "prefetcht0". Sadly, Intel's instruction set reference only gives details for Pentium III and Pentium 4/Xeon.
- Brendan