By: GeertB (boschg.delete@this.mac.com), April 4, 2021 7:08 pm
Room: Moderated Discussions
Linus Torvalds (torvalds.delete@this.linux-foundation.org) on April 3, 2021 12:14 pm wrote:
> sr (nobody.delete@this.nowhere.com) on April 3, 2021 11:30 am wrote:
> >
> > But whole idea of transactional memory isn't saved cycles from locking - main point
> > is to let other threads to use all cachelines that aren't modified by transaction.
>
> No.
>
> The main idea of transactional memory is to improve performance.
>
> Yes, it does so by avoiding bouncing cachelines (and that is mainly by keeping them shared).
>
> But one is fundamental (performance) and one is just a tool to get there (avoid dirtying cachelines).
>
> See?
>
> There is absolutely zero point in avoiding dirtying cachelines in itself. If using an actual honest-to-goodness
> lock (or incrementing and then decrementing a reference count - another example of something that using
> a transaction could possibly avoid) and dirtying the cacheline performs better, then that's by definition
> better than trying to desperately use a (slower) transaction that avoids it.
>
> So the basic and truly fundamental issue is purely about performance. If transactions don't perform better
> than locking (or atomics), you have entirely missed the whole point. No amount "but but at least you avoided
> a dirty cacheline" matters one whit if those dirty cache lines got you better performance.
>
> Which gets us back to my original argument: locking is not necessarily hugely expensive in the common
> case with little contention. And in a not insignificant number of the cases where lock contention is a
> real thing, trying to do the same with transactions will fail due to capacity and/or conflict issues.
>
> If your transaction hardware doesn't handle those cases well, your
> transactional memory hardware is useless garbage and has failed.
>
> Case in point: TSX.
>
> Really. I'm not making some theoretical argument here. I'm making arguments
> based on undeniable facts. TSX has been around. It hasn't performed.
For me, the biggest issue has been that TSX, and especially HLE, has been buggy. The few Debian/SUSE releases that defaulted to using HLE cost us enormous pain debugging and instrumenting our locking code (which in retrospect was correct), only to find out that we had hit "impossible" invariant violations due to hardware bugs. Figuring out which microcode version some server in China is running, and which undocumented list of bugs it works around, is extremely expensive. So, while I was excited about transactional memory, and still think there are some good use cases (especially very localized, constrained ones such as "hiding" ref counter manipulation, or locking for read-mostly data), it's hard to trust it after going through the "it's here" / "it was buggy, we disabled it" loop a few too many times.
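To make the "which microcode, and is HLE even on" question concrete, here is a minimal sketch of the first step of that triage. It assumes a Linux x86 machine, where /proc/cpuinfo has a "microcode" field and lists "hle" and "rtm" among the CPU flags when the kernel exposes them. The function name parse_cpuinfo is made up for illustration. It only reports what the machine claims; it says nothing about which errata a given microcode revision works around, which is exactly the expensive part.

```python
def parse_cpuinfo(text):
    """Return (microcode, has_hle, has_rtm) for the first CPU entry
    found in /proc/cpuinfo-style text (Linux x86 format assumed)."""
    microcode = None
    flags = set()
    for line in text.splitlines():
        # Each entry looks like "key<tabs>: value"; blank lines separate CPUs.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip()
        value = value.strip()
        if key == "microcode" and microcode is None:
            microcode = value
        elif key == "flags" and not flags:
            flags = set(value.split())
    return microcode, "hle" in flags, "rtm" in flags
```

On the machine itself this would be called as parse_cpuinfo(open("/proc/cpuinfo").read()). A missing flag can mean either that the CPU never had the feature or that firmware or the kernel turned it off, so the result is a starting point and not a diagnosis.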
It used to be the case that "if you think it's the compiler, it's probably your code" and especially "if you think it's the processor, it may be the compiler, but it's probably your code". But now, when customers come to me with dubious locking scenarios, I know it was Intel's fault the last few times. Intel has lost the credibility to re-introduce HTM anytime soon. Sad.
-Geert