By: anon (spam.delete.delete@this.this.spam.com), September 18, 2018 11:53 am
Room: Moderated Discussions
Travis Downs (travis.downs.delete@this.gmail.com) on September 18, 2018 10:58 am wrote:
> anon (spam.delete.delete@this.this.spam.com) on September 18, 2018 2:43 am wrote:
>
> > Can it do 2 fast path loads in the same cycle? If not it would make sense to prioritize pointer chases.
>
> Yes, it can - at least on SKL and IVB (the two archs I tested on).
>
> I added an 8-way pointer chasing test to uarch-bench which does 8 independent pointer chases in parallel,
> and this still takes only 4 cycles to do one iteration (which is composed of one pointer chase for
> each chain), i.e., a throughput of 2 loads per cycle with each load taking only 4 cycles.
If throughput isn't the problem and the 4-cycle latency only shows up when the loads immediately follow each other, then it might be something different. Maybe it's skipping the TLB lookup altogether.
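
For anyone following along, here is a rough sketch in C of what an 8-way parallel pointer chase could look like. This is not the actual uarch-bench test (I haven't copied its code), and the names CHAINS, build_ring and chase_parallel are made up for illustration. Each chain is a ring of slots where every slot holds the address of the next slot, so each step is a load whose address comes from the previous load in the same chain, while the 8 chains stay independent of each other.

```c
#include <stddef.h>

#define CHAINS 8  /* number of independent chains (hypothetical name) */

/* Link n slots into a ring: slot i holds the address of slot (i + 1) % n. */
void build_ring(void **slots, size_t n) {
    for (size_t i = 0; i < n; i++)
        slots[i] = &slots[(i + 1) % n];
}

/* Advance every chain by `iters` steps.
 * Within one chain each load depends on the previous one (latency bound);
 * across chains the loads are independent, so they can overlap. */
void chase_parallel(void **p[CHAINS], size_t iters) {
    for (size_t i = 0; i < iters; i++) {
        for (int c = 0; c < CHAINS; c++)
            p[c] = (void **)*p[c];  /* dependent load: next = *current */
    }
}
```

With the numbers quoted above, one iteration is 8 loads in 4 cycles, i.e. 8 / 4 = 2 loads per cycle. A real measurement would also need cycle counters and some care that the compiler keeps the loads as written, so treat this purely as an illustration of the dependency structure.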