By: anon (anon.delete@this.anon.com), October 1, 2015 10:45 pm
Room: Moderated Discussions
Patrick Chase (patrickjchase.delete@this.gmail.com) on October 1, 2015 5:16 pm wrote:
> David Kanter (dkanter.delete@this.realworldtech.com) on October 1, 2015 1:52 pm wrote:
> > Patrick Chase (patrickjchase.delete@this.gmail.com) on September 30, 2015 10:44 pm wrote:
> > > SHK (no.delete@this.mail.com) on September 29, 2015 6:38 am wrote:
> > > > * page split load penalties from 100 cycles to 5 (that's an improvement!)
> > >
> > > This appears to be a side-effect of having a second HW page-table walker - In a page-
> > > split-load scenario the core should now handle both walks in // instead of sequentially.
> > >
> > > The true benefits are much broader than that (relatively uncommon) case, though.
> >
> > I read this as: a page crossing load which hits in the TLB for both pages now takes
> > 5 cycles instead of 100. That is, I don't think the page table walker is involved.
> >
> > Obviously, if you have a TLB miss - nothing will take anywhere close to 5 cycles.
>
> I read that section as saying that the incremental cost (i.e. the penalty) for a page-split
> load above and beyond the cost of one that doesn't split was reduced from 100 cycles to 5,
> in which case the difference (95 cycles) is credible for 2 walks in // vs 2 sequentially.
Considering that a split-cacheline load is 5 cycles (on Haswell), I don't think it's credible that they could do a split load, miss the TLB, kick off the second TLB miss handler, complete both walks, and then replay the split load from L1 (assuming you get an L1 hit) all in the same number of cycles as an ordinary split-cacheline load. It seems more likely they just added an improved mechanism to atomically look up the two TLB entries without taking some big exception path, so the load can then proceed like any other split-cacheline load.
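
(For anyone who wants to poke at this on their own box: below is a rough sketch of how I'd measure the page-split penalty being discussed. It's my own illustration, not anything from the article or Intel's docs. It assumes x86-64 Linux with GCC/Clang, uses __rdtsc for timing, and ignores the usual benchmarking caveats like serializing the TSC reads, warming up, and pinning to a core. The idea is just to compare an 8-byte load that sits comfortably inside a page against one that straddles a page boundary, with both pages resident and hot in the TLB.)

/* Sketch: compare per-load cost of an aligned load vs. a page-split load.
 * Assumes x86-64 Linux, GCC/Clang, 4 KiB pages. Not a rigorous benchmark. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <x86intrin.h>

#define PAGE  4096
#define ITERS 1000000L

/* Time ITERS back-to-back 8-byte loads from the same address. The address
 * stays constant, so it remains hot in L1 and the TLB; this measures a rough
 * per-load cost, not a strict latency chain. */
static uint64_t time_loads(volatile uint64_t *p)
{
    uint64_t sink = 0;
    uint64_t start = __rdtsc();
    for (long i = 0; i < ITERS; i++)
        sink += *p;
    uint64_t end = __rdtsc();
    (void)sink;
    return end - start;
}

int main(void)
{
    /* Two contiguous anonymous pages, so a load at (PAGE - 4) straddles the
     * boundary between them. Relies on x86 allowing unaligned loads. */
    uint8_t *buf = mmap(NULL, 2 * PAGE, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return 1;
    memset(buf, 1, 2 * PAGE);   /* touch both pages so both are mapped and in the TLB */

    uint64_t aligned   = time_loads((volatile uint64_t *)(buf + 64));        /* no split   */
    uint64_t pagesplit = time_loads((volatile uint64_t *)(buf + PAGE - 4));  /* page split */

    printf("aligned:    %.2f cycles/load\n", (double)aligned / ITERS);
    printf("page-split: %.2f cycles/load\n", (double)pagesplit / ITERS);
    return 0;
}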