By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), March 31, 2021 11:50 am
Room: Moderated Discussions
Andrey (andrey.semashev.delete@this.gmail.com) on March 31, 2021 5:27 am wrote:
>
> You obviously have to write a non-transactional path, and it will have its pitfalls, but the point
> is that you could have better best-case and average performance with TSX.
No, you really really don't.
TSX was slow even when it worked and didn't have aborts, and never gave you "best-case" performance at all due to that. Simple non-contended non-TSX locks worked better.
And TSX was a complete disaster when you had any data contention, and just caused overhead and aborts and fallbacks to locked code, so - no surprise - plain non-TSX locks worked better. And data contention is quite common, and happened for a lot of trivial reasons (statistics being one).
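[For context, the pattern under discussion is the usual RTM lock-elision scheme: try the critical section as a transaction, and on any abort fall back to a real lock. A minimal sketch in C - the names and the spinlock fallback are illustrative, not from any real codebase; build with -mrtm on a TSX-capable toolchain:

    #include <immintrin.h>
    #include <stdatomic.h>

    static atomic_int fallback_lock;        /* 0 = free, 1 = held */

    static void lock_fallback(void)
    {
        while (atomic_exchange_explicit(&fallback_lock, 1,
                                        memory_order_acquire))
            while (atomic_load_explicit(&fallback_lock,
                                        memory_order_relaxed))
                ;                           /* spin until it looks free */
    }

    static void unlock_fallback(void)
    {
        atomic_store_explicit(&fallback_lock, 0, memory_order_release);
    }

    void critical_section(void (*body)(void))
    {
        if (_xbegin() == _XBEGIN_STARTED) {
            /* Reading the lock word pulls it into the read set, so a
             * concurrent fallback holder writing it aborts us too. */
            if (atomic_load_explicit(&fallback_lock, memory_order_relaxed))
                _xabort(0xff);              /* lock held: give up */
            body();
            _xend();
            return;
        }
        /* Any abort (conflict, capacity, interrupt, ...) lands here. */
        lock_fallback();
        body();
        unlock_fallback();
    }

Every abort pays for the discarded transactional work and then runs the whole critical section again under the real lock, which is exactly the overhead described above.]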
And no, TSX didn't have better average performance either, because in order to avoid the problems, you had to do statistics in software, which added its own set of overhead.
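[The "statistics in software" point: since the hardware gives no prediction of whether a transaction will succeed, the elision code itself has to count aborts per lock and stop attempting transactions once the rate gets too high. A hypothetical sketch of that bookkeeping - the struct, function, and thresholds are invented for illustration:

    struct elision_stats {
        unsigned attempts;
        unsigned aborts;            /* bumped on each failed _xbegin() */
    };

    static int should_try_elision(const struct elision_stats *s)
    {
        if (s->attempts < 16)
            return 1;               /* too few samples, keep trying */
        return s->aborts * 4 < s->attempts;  /* elide below ~25% aborts */
    }

The counters are extra shared state touched on every lock operation, which is its own set of overhead.]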
As far as I know, there were approximately zero real-world loads that were better with TSX than without.
The only case that TSX ever did ok on was when there was zero data contention at all, and lots of cache coherence costs almost entirely due to locking, and then TSX can keep the lock as a shared cache line. Yes, this really can happen, but most of the time it happens is when you also have big enough locked regions that they don't get caught by the transactional memory due to size overflows.
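[The size overflow is easy to hit: the transaction's read/write set has to fit in the cache, so touching more lines than the L1 can track aborts with the capacity bit set. A toy sketch, runnable only on TSX hardware, with illustrative sizes:

    #include <immintrin.h>
    #include <stdio.h>

    #define NLINES 1024                 /* 1024 * 64B = 64K write set */
    static char buf[NLINES * 64];

    int main(void)
    {
        unsigned status = _xbegin();
        if (status == _XBEGIN_STARTED) {
            for (int i = 0; i < NLINES; i++)
                buf[i * 64] = 1;        /* each store claims a cache line */
            _xend();
            puts("committed");
        } else if (status & _XABORT_CAPACITY) {
            puts("aborted: write set overflowed the cache");
        } else {
            puts("aborted for some other reason");
        }
        return 0;
    }

A 64K write set comfortably exceeds a typical 32K L1 data cache, so the capacity abort is the expected outcome.]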
And making the transaction size larger makes the costs higher too, so now you need to do a much better job at predicting ahead of time whether transactions will succeed or not. Which Intel entirely screwed up, and I blame them completely. I told them at the first meeting they had (before TSX was public) that they need to add a TSX predictor, and they never did.
And the problems with TSX were legion, including data leaks and actual outright memory ordering bugs.
TSX was garbage, and remains so.
This is not to say that you couldn't get transactional memory right, but as it stands right now, I do not believe that anybody has ever had an actual successful and useful implementation of transactional memory.
And I can pretty much guarantee that to do it right you need to have a transaction success predictor (like a branch predictor) so that software doesn't have to deal with yet another issue of "on this uarch, and this load, the transaction size is too small to fit this lock".
I'm surprised that ARM made it part of v9 (and surprised that ARM kept the 32-bit compatibility part - I really thought they wanted to get rid of it).
Linus