By: anon (anon.delete@this.anon.com), April 11, 2013 10:19 am
Room: Moderated Discussions
Eric Bron (eric.bron.delete@this.zvisuel.privatefortest.com) on April 11, 2013 8:50 am wrote:
> > What does this mean?
> >
> > Single-thread performance is a very important quality, and increasingly
>
> fair enough, I'm mostly interested in multi-thread performance. I've measured speedups
> on some rare occasions with explicit prefetch and a single thread, but they all disappear with
> several threads (no slowdown due to the explicit prefetch, though, just a complete waste of
> development time). 8 concurrent threads sharing the LLC is the common case today.
>
>
> > Prefetching can help a bit when traversing data structures which are not contiguous
> > in memory or at a predictable stride. Linked lists, trees, etc.
>
> do you have example code to share?
Not offhand.
> to make it work (i.e. prefetch several nodes
> ahead) you'll need an auxiliary array of pointers, right? this looks very cumbersome
> to code and will waste bandwidth, particularly with 64-bit pointers
No. It is obviously difficult to do if you are traversing small objects in a very simple manner.
However, if you traverse larger and more complex objects, the object itself may inherently hold the locations of several memory addresses you already know you will need. For example:
object = list.ptr;
do {
    prefetch(object->sub_object);
    prefetch(&object->far_from_start);
    prefetch(object->list.next);
    /*
     * At this point, you have 3 memory operations in flight. Even if
     * do_something has to wait for one of them, you still get memory-level
     * parallelism (MLP), which probably cannot be found by the hardware
     * prefetchers, and quite possibly will not be found this early by the
     * out-of-order execution (OOOE) machinery.
     */
    do_otherthing(object->sub_object);
    do_something(&object->far_from_start);
    object = object->list.next;
} while (object);
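And now, even better, we'll be able to use prefetchw (prefetch with intent to write), which helps the above pattern quite a lot when the loop writes to the prefetched lines: the line can be fetched in a writable state up front instead of being read first and upgraded later.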
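For anyone who wants something compilable, here is a minimal sketch of the same pattern in C, assuming GCC or Clang. The node layout and the helpers `build`, `walk`, `destroy`, plus the bodies given to `do_otherthing`/`do_something`, are hypothetical stand-ins, not code from any real project. `__builtin_prefetch(addr, rw, locality)` is the compilers' real builtin; passing `rw = 1` requests a prefetch with intent to write, which may compile to PREFETCHW when the target supports it (otherwise an ordinary prefetch is emitted).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical types standing in for the post's "object". */
struct sub_object {
    long value;
};

struct node {
    struct sub_object *sub_object; /* separately allocated, likely another cache line */
    char padding[256];             /* pushes far_from_start onto a different cache line */
    long far_from_start;
    struct node *next;             /* intrusive singly linked list */
};

/* Hypothetical per-node work: one read-only use, one read-modify-write. */
static long do_otherthing(const struct sub_object *s) { return s->value; }
static void do_something(long *field) { *field += 1; }

/* Issue all three independent loads before touching any of them, so their
 * misses overlap instead of being serviced one after another. */
static long walk(struct node *object)
{
    long sum = 0;
    while (object) {
        __builtin_prefetch(object->sub_object, 0, 3);      /* read */
        __builtin_prefetch(&object->far_from_start, 1, 3); /* will be written */
        __builtin_prefetch(object->next, 0, 3);            /* next node */

        sum += do_otherthing(object->sub_object);
        do_something(&object->far_from_start);
        object = object->next;
    }
    return sum;
}

/* Build a list of n nodes whose sub-objects hold 1..n, head first. */
static struct node *build(size_t n)
{
    struct node *head = NULL;
    for (size_t i = n; i > 0; i--) {
        struct node *nd = malloc(sizeof *nd);
        struct sub_object *s = malloc(sizeof *s);
        assert(nd && s); /* sketch only: no real error handling */
        s->value = (long)i;
        nd->sub_object = s;
        nd->far_from_start = 0;
        nd->next = head;
        head = nd;
    }
    return head;
}

static void destroy(struct node *head)
{
    while (head) {
        struct node *next = head->next;
        free(head->sub_object);
        free(head);
        head = next;
    }
}
```

The sketch uses a plain `while` loop rather than `do`/`while` so an empty list is handled. Prefetching `object->next` on the last node passes NULL, which is harmless: GCC documents that a data prefetch does not fault on an invalid address. Like the original, it only looks one node ahead; it makes no claim about how much this gains on any particular machine.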