By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), April 8, 2021 8:41 am
Room: Moderated Discussions
Andrey (andrey.semashev.delete@this.gmail.com) on April 7, 2021 5:54 pm wrote:
>
> Thing is, I don't see the point in defining the limit on the transaction size to begin with. HTM implementation
> has no reason to care about instruction count, the only thing it should care about is memory accesses.
Wrong.
If the transaction hardware is built around the - already existing - rename hardware and out-of-order queues, then the transaction size is limited not only by number of memory accesses, but by everything else too - number of register accesses, number of (possibly split) instructions etc etc.
And if the transaction hardware is not built around that kind of existing machinery, but around some special "checkpoint/restore" mode, then the transaction hardware is slow garbage.
Really.
So that's my argument. I think HTM is a quite reasonable idea, but it needs to be a hell of a lot better than TSX ever was. And the way to make it better is to make it (a) faster and (b) less of a buggy morass. Because it was both slow and buggy.
And my argument is that the way to get there might well be to make it simpler. Not add more complexity, and not make it depend on some checkpointing model that is fundamentally expensive.
Right now TSX is unusable. Not only because it's broken and buggy, but simply because it is too slow.
Doing a (successful) TSX transaction was in my tests slower than just using a lock.
And I will repeat: the default for locking - and the common case - is not contention. So if your transactional hardware is slower than a lock when there is no contention on the lock and the data, then your transactional hardware is a dead end.
Sure, that was only one particular implementation (I only ever had one machine that for a while had TSX enabled), and I didn't do some kind of rigorous big testing. I did enough to see "Oh, this is slower than just doing it by hand, both when uncontended and when contended", and dropped my test-patches for the kernel.
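For a sense of the baseline being compared against, here is a rough sketch of timing the uncontended lock path in userspace. It is not the kernel test-patches mentioned above, which are not shown here. The names `tas_lock`, `tas_acquire`, `tas_release` and `run_uncontended` are hypothetical, and the lock is a plain C11 test-and-set spinlock rather than the kernel's spinlock implementation.

```c
#include <stdatomic.h>
#include <time.h>

/* Hypothetical minimal test-and-set lock, for illustration only. */
typedef struct { atomic_flag held; } tas_lock;

static void tas_acquire(tas_lock *l) {
    /* One atomic read-modify-write on the uncontended path. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ; /* never spins here: the run is single-threaded */
}

static void tas_release(tas_lock *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/*
 * Run `iters` uncontended lock / increment / unlock cycles.
 * Returns the final counter value and stores a rough wall-clock
 * nanoseconds-per-iteration estimate in *ns_per_op.
 */
static long run_uncontended(long iters, double *ns_per_op) {
    tas_lock lock = { ATOMIC_FLAG_INIT };
    volatile long counter = 0;   /* volatile so the loop is not optimized away */
    struct timespec t0, t1;

    timespec_get(&t0, TIME_UTC);
    for (long i = 0; i < iters; i++) {
        tas_acquire(&lock);
        counter++;               /* the tiny critical section */
        tas_release(&lock);
    }
    timespec_get(&t1, TIME_UTC);

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    *ns_per_op = iters > 0 ? ns / (double)iters : 0.0;
    return counter;
}
```

Calling `run_uncontended(1000000, &ns)` gives a crude per-operation cost for the no-contention case. A transactional version of the same loop would have to come in under that number before it is worth considering at all.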
Andrey, I don't understand why you are pushing TSX. You apparently have never used it, you don't seem to care about the fact that it's been buggy and slow. You just try to push a false narrative of "it could work if it was just even bigger and more complex and slower".
Linus