By: David Kanter (dkanter.delete@this.realworldtech.com), March 31, 2021 8:41 pm
Room: Moderated Discussions
Linus Torvalds (torvalds.delete@this.linux-foundation.org) on March 31, 2021 5:08 pm wrote:
> anon2 (anon.delete@this.anon.com) on March 31, 2021 3:46 pm wrote:
> > >
> > > For example, I see no sign that the ARM 'tstart' instruction has a success predictor
> > > behind it. And once again - without a hardware predictor, you can make up benchmarks
> > > that show how well it works, but real life will bite you in the arse.
> >
> > Wouldn't that be purely microarchitectural? What kind of sign would you expect to see
> > if they intended to implement such a thing (which I agree seems like a good idea).
>
> I agree that it could be seen as purely a microarchitectural detail, and not visible to users.
>
> However, even in that case, I'd expect there to be signs of it in the architecture definition.
> For example, the 'tstart' instruction should look a lot more like a branch, so that the predictor
> logic could act on it exactly that way, and just go to the fallback case.
>
> Another sign that ARM is not designing it with a transaction predictor in mind
> is that the result register doesn't have a "predicted not successful" case.
>
> That said, both could be added later: the first by simply just treating the 'tstart/cbnz'
> sequence as one fused instruction, and the second by adding a new error code. But since
> it's not there architecturally in the initial version, I'd expect that software then
> has to do the prediction for it, and then you're kind of stuck with that garbage.
>
> In fact, looking at the definition of 'tstart', I see all the same old
> signs that "yup, software is supposed to guess whether to try again".
>
> And there is zero question that you absolutely need prediction. Particularly with big transactions, you simply
> cannot afford to do a lot of work, only to then cause a failure just because of transaction size (and we know
> that some transactions will be fundamentally too large, if you try to use 'tstart' for locking).
>
> If the hardware doesn't do it, then the software has to do it, and that involves having software
> try to keep track of "this lock taker in this context has failed before due to transaction size
> issues, so let's not do the HW TM now because we know it's likely going to fail again".
>
> That kind of thing is expensive to do in software. You need to have counters for
> the success/failure cases, and you need to somehow associate those counters
> with a particular code flow. Exactly like branch prediction hardware does.
>
> Honestly, anybody who tells me that software could do branch prediction is somebody who I
> wouldn't let near a new architecture. So why the h*ll do people think that software should
> do transaction success prediction? It's the exact same thing, with the exact same issues.
>
> Go look at the ARM papers, and tell me that there is any sign that they actually thought
> about this all. Because I don't see it. I see them barreling down the exact same mistakes
> that we've already seen with x86 and ppc, both of which have been abject failures.
>
> Anybody remember what the definition of insanity is, again?
As someone who spent a ton of time working on HTM and speculative multi-threading, I'd like to echo Linus' view that handling prediction and transaction scope is a real problem.
The system we designed at Strandera used dynamically sized transactions and did a lot of analysis to scope them correctly. Without that, your life will get very unpleasant in a hurry.
One of the key problems with early implementations (e.g., Sun's Rock) was that too many things could cause aborts. You really need to ensure that transaction aborts are rare.
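To make this concrete, here's roughly what the lock-elision idiom ends up looking like in C. The hw_tx_begin()/hw_tx_commit() calls are made-up stand-ins for the hardware primitives (think tstart/tcommit), and the status bits are invented for illustration, but the shape is the point: the retry-or-take-the-lock decision lands in software on every failed attempt.

#include <stdbool.h>
#include <stdint.h>
#include <pthread.h>

/* Hypothetical stand-ins for the hardware TM primitives (tstart/tcommit).
 * hw_tx_begin() returns 0 if we are now running inside a transaction,
 * or a nonzero abort status otherwise. The status layout is invented. */
#define TX_OK          0u
#define TX_ABORT_RETRY (1u << 0)   /* transient conflict: might succeed on retry */
#define TX_ABORT_SIZE  (1u << 1)   /* capacity overflow: retrying is pointless   */

static uint32_t hw_tx_begin(void)  { return TX_ABORT_SIZE; }  /* stub: always fails */
static void     hw_tx_commit(void) { }

#define MAX_RETRIES 3

/* Elide the lock if we can, otherwise really take it. All of the
 * "should I try again?" logic lives here, in software. */
static void lock_or_elide(pthread_mutex_t *lock, bool *elided)
{
    for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
        uint32_t status = hw_tx_begin();
        if (status == TX_OK) {
            *elided = true;            /* in a transaction; don't touch the lock */
            return;
        }
        if (!(status & TX_ABORT_RETRY))
            break;                     /* capacity/permanent abort: stop retrying */
    }
    pthread_mutex_lock(lock);          /* fallback: take the lock for real */
    *elided = false;
}

static void unlock_or_commit(pthread_mutex_t *lock, bool elided)
{
    if (elided)
        hw_tx_commit();
    else
        pthread_mutex_unlock(lock);
}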
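And that's only half of it. The other half is the per-call-site bookkeeping Linus is describing, i.e. the software "success predictor" you're forced to write when the hardware won't predict for you. Again the names below are invented, but the shape is roughly what glibc's adaptive lock elision did for Intel TSX, if memory serves: keep a skip counter per lock site and stop attempting transactions for a while after an abort.

#include <stdbool.h>
#include <stdint.h>

/* Crude software "transaction success predictor": one of these per lock
 * acquisition site. After an abort we skip elision for the next N
 * acquisitions; capacity aborts back off much harder than conflicts,
 * since retrying an oversized transaction is pointless. */
struct elision_stats {
    uint32_t skip;                  /* acquisitions left that skip elision */
};

#define SKIP_AFTER_CONFLICT   3     /* brief back-off after a transient abort */
#define SKIP_AFTER_CAPACITY 100     /* long back-off if the section is too big */

static bool should_attempt_elision(struct elision_stats *s)
{
    if (s->skip > 0) {
        s->skip--;                  /* predicted "will fail": go straight to the lock */
        return false;
    }
    return true;
}

static void record_abort(struct elision_stats *s, bool capacity_abort)
{
    s->skip = capacity_abort ? SKIP_AFTER_CAPACITY : SKIP_AFTER_CONFLICT;
}

static void record_success(struct elision_stats *s)
{
    (void)s;                        /* skip is already 0; keep eliding */
}

All of that state has to live somewhere, stay cache-warm, and get tuned per workload, which is exactly the job a hardware predictor does for free for branches.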
David