Hardware Transactional Memory, the end?

By: --- (---.delete@this.redheron.com), August 21, 2022 9:33 pm
Room: Moderated Discussions
Linus Torvalds (torvalds.delete@this.linux-foundation.org) on August 21, 2022 6:58 pm wrote:
> --- (---.delete@this.redheron.com) on August 21, 2022 1:26 pm wrote:
> > What you lose is the SW ease of HTM,
> "SW ease of HTM" never existed.
> In fact, I would argue that one of the big failures of HTM was to aim for it being some
> kind of generic thing in the first place. That just made it big, intrusive, and fragile.
> Large transactions are a mistake. They are fundamentally fragile. It's like designing for a glass jaw: "this
> will go really well if all the stars align, but if they don't it's going to be expensive as heck".
> And the bigger the transaction, the bigger the cost of failure, but also the
> likelihood of failure goes up. A lot. And it goes up the most in the exact
> situation where you do not want it to go up - when there is contention.
> So I'd much rather have something small, simple, and predictably high-performance.
> IOW, I'd rather get something like perhaps just an extended load-linked / store-conditional model. Very
> much with the hardware guaranteed forward progress that LL/SC has. Just extended to a couple more accesses.
> Take LL/SC and make it LLx/SCx - and keep the forward progress guarantees. In hardware.
> And you can only give those forward progress guarantees if you end up just having code
> length limits, and seriously limit the number of accesses. And keep it simple enough that
> hardware people (and software people) can actually believe that you get it right.
> For example, forward progress guarantees almost certainly mean that you'd probably have to do
> something like order the accesses by physical address when you repeat, so that you can then actually
> force them to stay in cache until completion without getting into nasty deadlocks.
> Don't order things on the first try, probably not on the second, but if you've seen several
> failed sequences in a row with no successes, keep track of the addresses that are used, and
> lock them in cache (within those very strict limits you set), but in some physical address
> order so that you don't get ABBA deadlocks between different cores doing different accesses.
> With a small enough N, that should be eminently doable. But "small enough N" is small! We know N=1 works,
> but it's not good enough if you want to check a lock (one cacheline) and do a doubly linked list update (two
> more cachelines) at the same time. But maybe N=4 is big enough to be useful, but small enough that hardware
> can easily still deal with inconvenient ordering requirements and actually give forward progress guarantees.
> Also, just to hit N=4, you'd probably need your L1 cache to be at least 4-way associative,
> because you need to be able to guarantee that you can keep those four accesses in
> cache all the time. Even if they all were to hit in the same set.
> So forward progress guarantees is one very basic requirement for it being easy to use, but it also
> means that the size of the transaction has to be small. Definitely no more "random C code".
> Just like LL/SC.
> The other "make it actually easy to use for real" requirement is no asynchronous aborts.
> You can have a hard exception on some "hey, you tried to do more than N accesses", but
> that's a software bug, it wouldn't be a "let's retry without using transactions".
> IOW, really make it act like LL/SC - an LL doesn't need to save state for recovery, and a SC failure
> doesn't "abort", it doesn't go to some other code sequence, it just writes a value to a register that
> the user can test for "did this sequence of writes complete fully or not", and then just repeat.
> And because forward progress is guaranteed, SW doesn't need to have fallback
> code, so it really just becomes that "repeat on failure" loop.
> Just like LL/SC.
> I'll take something simple and reliable (and reliably fast) over the HTM mess every time.
> In the kernel, we already use a very special version of "N=2" by just putting the lock and the data
> structure it protects next to each other, so that we can use a double-wide atomic access do do them
> both in one go. But it requires some very special data structures to do that, and requires that
> the lock has been split out to be a per-data thing. Which is not realistic for most things.
> And sure, a compiler could still recognize simple patterns and turn them into LLx/SCx
> sequences. But more likely, it would all be in libraries and language runtimes.
> Linus

Just so I understand, Linus, are you saying that
(a) You NEED LL4/SC4 for correctness? That is, if you can trust LL4/SC4 to do what's promised, various constructions that today are very messy/finicky/slow can be replaced with something much simpler

(b) are you saying that if something that could speculate across say 4 atomics that would give you what you wanted? In other words, it's not a question of replacing messy/finicky code, it's just a question of performance, but if speculation were to extend well from one atomic to two nearby atomics to four nearby atomics, at each stage there'd be a nice performance boost?
< Previous Post in ThreadNext Post in Thread >
TopicPosted ByDate
Hardware Transactional Memory, the end?rwessel2022/08/20 06:50 PM
  Hardware Transactional Memory, the end?Kara2022/08/20 11:04 PM
    Hardware Transactional Memory, the end?dmcq2022/08/21 11:36 AM
      Hardware Transactional Memory, the end?rwessel2022/08/21 12:17 PM
        Hardware Transactional Memory, the end?---2022/08/21 01:26 PM
          Hardware Transactional Memory, the end?Andrey2022/08/21 06:39 PM
            Hardware Transactional Memory, the end?---2022/08/21 09:27 PM
              Hardware Transactional Memory, the end?Andrey2022/08/23 06:29 AM
                Hardware Transactional Memory, the end?---2022/08/23 10:00 AM
                  Hardware Transactional Memory, the end?iz2022/08/23 01:20 PM
                    Hardware Transactional Memory, the end?anonymou52022/08/23 02:57 PM
                  Hardware Transactional Memory, the end?Andrey2022/08/23 06:01 PM
                    Hardware Transactional Memory, the end?Anon2022/08/23 06:28 PM
                      Hardware Transactional Memory, the end?Andrey2022/08/24 04:10 AM
                        Hardware Transactional Memory, the end?Anon2022/08/24 08:50 AM
                          Hardware Transactional Memory, the end?rwessel2022/08/24 09:35 AM
                          Hardware Transactional Memory, the end?Etienne2022/08/25 01:54 AM
                            Hardware Transactional Memory, the end?Anon2022/08/25 05:25 AM
                              Hardware Transactional Memory, the end?Etienne2022/08/25 06:24 AM
                                Hardware Transactional Memory, the end?rwessel2022/08/25 08:16 AM
                                  Hardware Transactional Memory, the end?Linus Torvalds2022/08/25 10:16 AM
                                    Hardware Transactional Memory, the end?rwessel2022/08/25 11:00 AM
                                  Hardware Transactional Memory, the end?Etienne2022/08/26 12:54 PM
                    Hardware Transactional Memory, the end?anon22022/08/23 08:50 PM
                      Hardware Transactional Memory, the end?Andrey2022/08/24 03:54 AM
                        Hardware Transactional Memory, the end?anon22022/08/24 04:54 AM
                      Hardware Transactional Memory, the end?Simon Farnsworth2022/08/24 03:58 AM
                        Hardware Transactional Memory, the end?Konrad Schwarz2022/08/28 06:12 AM
          Hardware Transactional Memory, the end?Linus Torvalds2022/08/21 06:58 PM
            Hardware Transactional Memory, the end?rwessel2022/08/21 08:02 PM
              Hardware Transactional Memory, the end?anon22022/08/21 09:31 PM
                Hardware Transactional Memory, the end?dmcq2022/08/22 07:14 AM
                  Hardware Transactional Memory, the end?anon22022/08/23 12:15 AM
                    Hardware Transactional Memory, the end?dmcq2022/08/24 03:50 AM
                      Hardware Transactional Memory, the end?Linus Torvalds2022/08/24 11:56 AM
                        Hardware Transactional Memory, the end?dmcq2022/08/25 04:39 PM
            Hardware Transactional Memory, the end?---2022/08/21 09:33 PM
              Hardware Transactional Memory, the end?Linus Torvalds2022/08/22 11:32 AM
                Hardware Transactional Memory, the end?Anon2022/08/22 12:35 PM
                  Hardware Transactional Memory, the end?rwessel2022/08/22 04:47 PM
                Hardware Transactional Memory, the end?gpd2022/08/26 03:07 AM
                  Hardware Transactional Memory, the end?Michael S2022/08/26 03:46 AM
                    Hardware Transactional Memory, the end?Linus Torvalds2022/08/26 11:01 AM
                      Hardware Transactional Memory, the end?rwessel2022/08/26 06:08 PM
                        Hardware Transactional Memory, the end?anonymou52022/08/26 07:52 PM
  Hardware Transactional Memory, the end?zArchJon2022/08/24 10:12 AM
Reply to this Topic
Body: No Text
How do you spell tangerine? 🍊