By: Patrick Chase (patrickjchase.delete@this.gmai.com), July 7, 2015 11:14 pm
Room: Moderated Discussions
someotherdude (someotherdude.delete@this.none.none.none.none) on July 7, 2015 4:01 pm wrote:
> Paul A. Clayton (paaronclayton.delete@this.gmail.com) on July 7, 2015 7:43 am wrote:
> > Gabriele Svelto (gabriele.svelto.delete@this.gmail.com) on July 7, 2015 2:50 am wrote:
> > > SHK (no.delete@this.mail.com) on July 6, 2015 12:41 pm wrote:
> > > > Maybe that kind of hardware optimization is there for old non-recompiled code?
> > >
> > > It also works for stores which you cannot handle with isel.
> >
> > To "predicate" a store, one selects the address (either the ordinary address or a
> > safe but unused address ) and performs the store. A store is more expensive than just
> > an ALU operation, but it could still be cheaper than a branch misprediction.
>
> I see that you put predicate in quotes, but I think you underweight the cost of doing an
> actual store. You're discounting the possibilities and costs of a TLB miss, a cache miss,
> a page miss, etc. every one of which could be much greater in cost than the branch
> mispredict, perhaps even by magnitudes.
And you have obviously never done serious optimization, or else you'd know that there are trivial fixes for the issues you raise.
The programmer controls the scratch address that's selected when the store is "disabled". That means they choose both its page and its cache line, and can put it somewhere that is already hot (on the stack, for instance), so if they're halfway competent they can easily mitigate all of your stated concerns.
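For concreteness, here is a minimal sketch of the address-select trick in C. The names store_select and copy_above are hypothetical, made up for this illustration, and the scratch slot is assumed to be a stack variable so that its page and cache line are almost certainly resident already. Whether the ternary actually becomes isel/cmov rather than a branch is up to the compiler and target, so check the generated code before relying on it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: write `value` to *real when cond is nonzero,
 * otherwise write it to *scratch, a dummy location the caller owns. */
static inline void store_select(int *real, int *scratch, int value, int cond)
{
    assert(real != NULL && scratch != NULL);
    int *target = cond ? real : scratch; /* address select; may become isel/cmov */
    *target = value;                     /* the store itself always executes */
}

/* Illustrative loop: dst[i] = src[i] only where src[i] > threshold. */
static void copy_above(int *dst, const int *src, size_t n, int threshold)
{
    int scratch = 0; /* assumption: a stack slot, so page and line are hot */
    for (size_t i = 0; i < n; i++)
        store_select(&dst[i], &scratch, src[i], src[i] > threshold);
}
```

The "disabled" stores all land on the one scratch slot, so they cost an L1 hit at worst and never touch a cold page.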