By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), April 3, 2021 11:11 am
Room: Moderated Discussions
Linus Torvalds (torvalds.delete@this.linux-foundation.org) on April 3, 2021 10:41 am wrote:
>
> Look here: if you know your abort cost, and you have some rough idea of how much you expect to win from doing
> the operation as a transaction, you should now have a rough idea how hard you should avoid aborts.
Put another way: just a fairly simple rule of thumb might be "for a transaction to make sense, it has to succeed 99% of the time, because the abort costs are generally 100x the win".
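To sanity-check that rule: if a successful transaction wins you W cycles and an abort costs 100*W, break-even is where p*W = (1-p)*100*W, i.e. p = 100/101, or about 99%. Below that success rate the transaction is a net loss.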
And the actual failure predictor doesn't have to really count to 100 to get that. The actual failure predictor can be just a couple of bits - exactly like a branch predictor - and make it be exactly that same kind of "weak predict" (failed once) and "strong predict" (failed twice, or failed with a capacity error) logic.
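To make that concrete, here's roughly what that counter logic would look like, written in C purely for illustration - this is a sketch of what the hardware (or microcode) would do, and the state names and the capacity shortcut are my reading of the scheme, not any real implementation:

#include <stdbool.h>
#include <stdint.h>

/* A two-bit abort predictor, same shape as a branch predictor's
 * saturating counter: 0 = predict "transact", 1 = weak predict fail
 * (failed once), 2 = strong predict fail (failed twice). A capacity
 * abort jumps straight to "strong", since retrying the same working
 * set is pointless. */
enum { PREDICT_TRANSACT = 0, WEAK_FAIL = 1, STRONG_FAIL = 2 };

struct txn_predictor {
    uint8_t state;
};

static bool should_transact(const struct txn_predictor *p)
{
    return p->state < STRONG_FAIL;
}

static void record_abort(struct txn_predictor *p, bool capacity)
{
    if (capacity)
        p->state = STRONG_FAIL;   /* capacity: give up immediately */
    else if (p->state < STRONG_FAIL)
        p->state++;               /* weak -> strong */
}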
And once you have a "strongly predict not to do this transaction", you basically treat it as a branch prediction to go to the fallback case, and you can do the aging with just a global counter. When the counter hits some predetermined value, you clear the prediction data (for that predictor), and reset the counter.
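The aging half might then look like this - again just a sketch, with AGE_PERIOD standing in for whatever value the simulations end up suggesting:

#include <stddef.h>

/* Continuing the sketch above: global aging. After AGE_PERIOD
 * lookups, wipe all predictor state so a stale "never transact"
 * verdict eventually gets re-tested. AGE_PERIOD is a made-up knob;
 * finding the right value is exactly the simulation work below. */
#define AGE_PERIOD (1u << 16)

static uint32_t age_counter;

static void age_predictors(struct txn_predictor *table, size_t n)
{
    if (++age_counter < AGE_PERIOD)
        return;
    for (size_t i = 0; i < n; i++)
        table[i].state = PREDICT_TRANSACT;
    age_counter = 0;
}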
The above is obviously just a suggested starting point. As mentioned, you have to do a lot of actual simulation of real loads to get there, but you may be able to do a lot of it with real hardware (implement the prediction in microcode).
The above is the kind of thing that hardware can do. But software cannot. Because doing it in software will eat into that very precious - and not very big - "this is how many cycles you can win by doing it as a transaction".
Because please realize that the cost of locking is not very big at all in the common case. The cost of an uncontended lock is literally on the order of "ten cycles", and most of those ten cycles are because of memory ordering constraints, not "instruction costs".
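For reference, the uncontended path in question is basically this - a toy test-and-set lock, not the kernel's real locking code:

#include <stdatomic.h>

/* Toy test-and-set spinlock. The uncontended acquire is a single
 * atomic exchange; its ~10 cycles are mostly the memory-ordering
 * guarantee (and owning the cache line), not instruction count. */
typedef struct { atomic_int held; } spinlock_t;

static void spin_lock(spinlock_t *l)
{
    /* Uncontended case: the exchange succeeds on the first try. */
    while (atomic_exchange_explicit(&l->held, 1, memory_order_acquire))
        ;
}

static void spin_unlock(spinlock_t *l)
{
    atomic_store_explicit(&l->held, 0, memory_order_release);
}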
And so that "on the order of ten cycles" is what you have to at least match with a transaction for it to really make sense in general, because the contended case is not the common case.
If you have to do the prediction in software, you've already eaten up all the wins.
So your transaction cost should match the cost of that uncontended lock at the good end ("because common case"), and then scale better than a contended lock at the bad end ("because this is why we do transactions").
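Put together, the shape is the classic lock-elision pattern. Here's a sketch using the x86 RTM intrinsics (_xbegin/_xend/_xabort from immintrin.h, compiled with -mrtm) plus the toy lock and predictor from above - the function names are made up, not any real API:

#include <immintrin.h>   /* x86 RTM intrinsics; build with -mrtm */

/* Classic lock elision. Inside the transaction we only READ the
 * lock word, so the uncontended elided path never writes the lock's
 * cache line - that's the "match the uncontended lock" end. */
static void elided_lock(spinlock_t *l, struct txn_predictor *p)
{
    if (should_transact(p)) {
        unsigned status = _xbegin();
        if (status == _XBEGIN_STARTED) {
            if (atomic_load_explicit(&l->held, memory_order_relaxed) == 0)
                return;             /* now running transactionally */
            _xabort(0xff);          /* lock really held: bail out */
        }
        /* Any abort lands here with a status word to feed back. */
        record_abort(p, status & _XABORT_CAPACITY);
    }
    spin_lock(l);                   /* fallback: the plain lock */
}

static void elided_unlock(spinlock_t *l)
{
    if (atomic_load_explicit(&l->held, memory_order_relaxed) == 0)
        _xend();                    /* commit the elided section */
    else
        spin_unlock(l);
}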
And that is fundamental. Because otherwise, why would you ever want that transaction in the first place?
Linus