By: sr (nobody.delete@this.nowhere.com), April 3, 2021 11:30 am
Room: Moderated Discussions
Linus Torvalds (torvalds.delete@this.linux-foundation.org) on April 3, 2021 11:11 am wrote:
> The above is the kind of thing that hardware can do. But software cannot.
> Because doing it in software will eat into that very precious - and not very
> big - "this is how many cycles you can win by doing it as a transaction".
But the whole idea of transactional memory isn't saving the cycles spent on locking - the main point is to let other threads keep using all the cachelines that the transaction doesn't modify.
>
> Because please realize that the cost of locking is not very big at all in the common case.
> The cost of an uncontended lock is literally on the order of "ten cycles", and most of those
> ten cycles are because of memory ordering constraints, not "instruction costs".
>
> And so that "on the order of ten cycles" is what you have to at least match with a transaction
> for it to really make sense in general, because the contended case is not the common case.
>
> If you have to do the prediction in software, you already ate up all the wins.
Transactional memory is hardware locking of just the modified cachelines, so other threads can keep using any memory that isn't modified inside the transaction. How much time and effort goes into fine-grained locking or lockless algorithms in places where transactional memory would work just as well?
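
Here's a rough sketch of what I mean, using Intel's RTM intrinsics (_xbegin/_xend/_xabort) for lock elision. The names (table, bucket_add, the coarse fallback spinlock) are made up for illustration, and real code needs a retry policy, but it shows the point: the hardware only "locks" the cachelines the transaction actually touches, so threads working on different parts of the structure commit in parallel, and only a real conflict falls back to the coarse lock.

#include <immintrin.h>   /* Intel RTM: _xbegin, _xend, _xabort (compile with -mrtm) */
#include <stdatomic.h>

struct table {
    atomic_int fallback;      /* coarse fallback lock: 0 = free, 1 = held */
    long buckets[1024];
};

static void fallback_lock(struct table *t)
{
    while (atomic_exchange_explicit(&t->fallback, 1, memory_order_acquire))
        ;                     /* spin */
}

static void fallback_unlock(struct table *t)
{
    atomic_store_explicit(&t->fallback, 0, memory_order_release);
}

static void bucket_add(struct table *t, int i, long delta)
{
    if (_xbegin() == _XBEGIN_STARTED) {
        /* Reading the lock word puts it in our read-set, so anyone who
         * actually takes the fallback lock aborts us - the two paths
         * stay mutually exclusive. */
        if (atomic_load_explicit(&t->fallback, memory_order_relaxed))
            _xabort(0xff);
        t->buckets[i] += delta;   /* hardware tracks only this cacheline */
        _xend();                  /* commit */
        return;
    }
    /* Aborted (conflict, capacity, interrupt, no RTM): take the real lock. */
    fallback_lock(t);
    t->buckets[i] += delta;
    fallback_unlock(t);
}

Two threads calling bucket_add() on buckets in different cachelines both commit without ever serializing on the lock - exactly the parallelism you'd otherwise have to build by hand with per-bucket locks or a lockless scheme.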
>
> So your transaction cost should match the cost of that uncontended lock at the good end ("because common case"),
> and then scale better than a contended lock at the bad end ("because this is why we do transactions").
>
> And that is fundamental. Because otherwise, why would you ever want that transaction in the first place?
Because locking is hard - and letting the hardware do it instead saves a lot of time and effort.