Hardware Transactional Memory, the end?

By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), August 21, 2022 6:58 pm
Room: Moderated Discussions
--- (---.delete@this.redheron.com) on August 21, 2022 1:26 pm wrote:
> What you lose is the SW ease of HTM,

"SW ease of HTM" never existed.

In fact, I would argue that one of the big failures of HTM was to aim for it being some kind of generic thing in the first place. That just made it big, intrusive, and fragile.

Large transactions are a mistake. They are fundamentally fragile. It's like designing for a glass jaw: "this will go really well if all the stars align, but if they don't it's going to be expensive as heck".

And the bigger the transaction, the bigger the cost of failure, but also the likelihood of failure goes up. A lot. And it goes up the most in the exact situation where you do not want it to go up - when there is contention.

So I'd much rather have something small, simple, and predictably high-performance.

IOW, I'd rather get something like perhaps just an extended load-linked / store-conditional model. Very much with the hardware guaranteed forward progress that LL/SC has. Just extended to a couple more accesses. Take LL/SC and make it LLx/SCx - and keep the forward progress guarantees. In hardware.

And you can only give those forward progress guarantees if you end up just having code length limits, and seriously limit the number of accesses. And keep it simple enough that hardware people (and software people) can actually believe that you get it right.

For example, forward progress guarantees almost certainly mean that you'd probably have to do something like order the accesses by physical address when you repeat, so that you can then actually force them to stay in cache until completion without getting into nasty deadlocks.

Don't order things on the first try, probably not on the second, but if you've seen several failed sequences in a row with no successes, keep track of the addresses that are used, and lock them in cache (within those very strict limits you set), but in some physical address order so that you don't get ABBA deadlocks between different cores doing different accesses.
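To make the ordering idea concrete, here is a minimal software model of what such hardware might do once it falls back to the ordered path. It is a sketch only: `llx_order_lines`, `LLX_MAX_LINES` and the 64-byte line size are hypothetical names and assumptions, and plain integers stand in for the physical addresses that real hardware would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LLX_MAX_LINES   4   /* hypothetical hardware limit N */
#define CACHELINE_SHIFT 6   /* assumes 64-byte cache lines */

/*
 * Reduce the addresses touched by one sequence to distinct cache-line
 * numbers in ascending order. Locking lines in this one global order
 * is what rules out ABBA deadlocks between cores.
 * Returns the number of distinct lines written to lines[].
 */
static size_t llx_order_lines(const uintptr_t *addrs, size_t n,
                              uintptr_t lines[LLX_MAX_LINES])
{
    size_t count = 0;

    /* Touching more than N locations is a software bug, not a retry. */
    assert(n <= LLX_MAX_LINES);

    for (size_t i = 0; i < n; i++) {
        uintptr_t line = addrs[i] >> CACHELINE_SHIFT;
        size_t pos = 0;

        /* Find the insertion point in the sorted prefix. */
        while (pos < count && lines[pos] < line)
            pos++;
        if (pos < count && lines[pos] == line)
            continue;               /* line already tracked */

        /* Shift larger entries up and insert. */
        for (size_t j = count; j > pos; j--)
            lines[j] = lines[j - 1];
        lines[pos] = line;
        count++;
    }
    return count;
}
```

Two cores that touch the same lines in different program order end up with the same sorted list, so neither can hold a line the other needs while waiting on a lower-numbered one.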

With a small enough N, that should be eminently doable. But "small enough N" is small! We know N=1 works, but it's not good enough if you want to check a lock (one cacheline) and do a doubly linked list update (two more cachelines) at the same time. But maybe N=4 is big enough to be useful, yet small enough that hardware can easily still deal with inconvenient ordering requirements and actually give forward progress guarantees.

Also, just to hit N=4, you'd probably need your L1 cache to be at least 4-way associative, because you need to be able to guarantee that you can keep those four accesses in cache all the time. Even if they all were to hit in the same set.
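The associativity point can be checked with a little arithmetic. The sketch below assumes a simple set-associative cache indexed by address modulo the number of sets. The struct and function names are made up for illustration, and the addresses are assumed to lie on distinct cache lines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of an L1 data cache. */
struct l1_geometry {
    unsigned line_size;   /* bytes per cache line */
    unsigned num_sets;    /* number of sets */
    unsigned ways;        /* associativity */
};

/* Set that an address maps to, assuming simple modulo indexing. */
static unsigned l1_set_index(const struct l1_geometry *g, uintptr_t addr)
{
    assert(g->line_size > 0 && g->num_sets > 0);
    return (unsigned)((addr / g->line_size) % g->num_sets);
}

/*
 * True if all n lines (assumed distinct) can be resident at the same
 * time, i.e. no set is asked to hold more lines than it has ways.
 */
static bool l1_can_pin_all(const struct l1_geometry *g,
                           const uintptr_t *addrs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned same_set = 0;

        for (size_t j = 0; j < n; j++)
            if (l1_set_index(g, addrs[j]) == l1_set_index(g, addrs[i]))
                same_set++;
        if (same_set > g->ways)
            return false;
    }
    return true;
}
```

With 64-byte lines and 64 sets, addresses 0x0000, 0x1000, 0x2000 and 0x3000 all land in set 0, so four ways are enough to hold them and two are not.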

So guaranteed forward progress is one very basic requirement for it being easy to use, but it also means that the size of the transaction has to be small. Definitely no more "random C code".

Just like LL/SC.

The other "make it actually easy to use for real" requirement is no asynchronous aborts. You can have a hard exception on some "hey, you tried to do more than N accesses", but that's a software bug, it wouldn't be a "let's retry without using transactions".

IOW, really make it act like LL/SC - an LL doesn't need to save state for recovery, and a SC failure doesn't "abort", it doesn't go to some other code sequence, it just writes a value to a register that the user can test for "did this sequence of writes complete fully or not", and then just repeat.

And because forward progress is guaranteed, SW doesn't need to have fallback code, so it really just becomes that "repeat on failure" loop.

Just like LL/SC.
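For the N=1 case, the "repeat on failure" shape is already familiar from C11 atomics. The sketch below uses `atomic_compare_exchange_weak_explicit` as a stand-in for a single LL/SC pair (on LL/SC architectures compilers typically lower it to one). The function `add_capped` is a made-up example, and an LLx/SCx version would presumably look the same with a few more locations inside the loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * Add delta to *counter without letting it exceed cap.
 * Returns the value observed before the update.
 * There is no abort handler and no fallback path: a failed
 * store attempt just reports failure and the loop runs again.
 */
static int add_capped(_Atomic int *counter, int delta, int cap)
{
    int old, next;

    assert(counter != NULL);

    old = atomic_load_explicit(counter, memory_order_relaxed);
    do {
        next = old + delta;
        if (next > cap)
            next = cap;
        /* On failure, old is refreshed with the current value. */
    } while (!atomic_compare_exchange_weak_explicit(counter, &old, next,
                                                    memory_order_acq_rel,
                                                    memory_order_relaxed));
    return old;
}
```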

I'll take something simple and reliable (and reliably fast) over the HTM mess every time.

In the kernel, we already use a very special version of "N=2" by just putting the lock and the data structure it protects next to each other, so that we can use a double-wide atomic access to do them both in one go. But it requires some very special data structures to do that, and requires that the lock has been split out to be a per-data thing. Which is not realistic for most things.
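A rough user-space sketch of that "lock next to the data" trick, similar in spirit to the kernel's lockref but not its actual code: a 32-bit lock word and a 32-bit count share one 64-bit word, and a single compare-exchange bumps the count only while the lock is observed free. The layout, names and memory ordering here are assumptions made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Low 32 bits: lock word (0 = free). High 32 bits: reference count. */
struct lockcount {
    _Atomic uint64_t lock_count;
};

#define LC_LOCKED    UINT64_C(1)
#define LC_LOCK_MASK UINT64_C(0xffffffff)
#define LC_COUNT_ONE (UINT64_C(1) << 32)

/*
 * Try to take a reference without taking the lock.
 * Returns true on success, false if the lock is held, in which case
 * the caller would fall back to taking the lock the normal way.
 */
static bool lockcount_get_fast(struct lockcount *lc)
{
    uint64_t old;

    assert(lc != NULL);

    old = atomic_load(&lc->lock_count);
    while ((old & LC_LOCK_MASK) == 0) {
        /* Lock check and count update happen in one atomic step. */
        if (atomic_compare_exchange_weak(&lc->lock_count, &old,
                                         old + LC_COUNT_ONE))
            return true;
        /* old now holds the fresh value; re-check the lock bits. */
    }
    return false;
}
```

This only works because the lock and the count fit in one atomically updatable word, which is exactly the data structure restriction described above.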

And sure, a compiler could still recognize simple patterns and turn them into LLx/SCx sequences. But more likely, it would all be in libraries and language runtimes.

Linus