Hardware Transactional Memory, the end?

By: rwessel (rwessel.delete@this.yahoo.com), August 25, 2022 11:00 am
Room: Moderated Discussions
Linus Torvalds (torvalds.delete@this.linux-foundation.org) on August 25, 2022 10:16 am wrote:
> rwessel (rwessel.delete@this.yahoo.com) on August 25, 2022 8:16 am wrote:
> >
> > I'm not sure I understand this concern. The transaction appears (eventually) to all other cores
> > as a single atomic event. In practice another transaction (or a non-transactional access) can interfere,
> > and prevent completion, but they can't cause a deadlock.
>
> Sure, they can cause a deadlock (or more commonly it's really
> a livelock) if you do a simplistic implementation.
>
> In fact, you can very much get that livelock even with just a single cacheline with 'LL/SC', if
> your cache coherency isn't properly designed, so that the 'SC' always fails because other CPU's
> doing their own LL/SC loops always steal the cacheline back before the SC has succeeded.
>
> And no, that's not some theoretical worry. We literally saw a few of the (not great) early ARM
> server core designs with CPU starvation in LL/SC loops when we had atomics share cachelines with
> some very hot other data where lots of other CPU's would hammer on it with regular stores.
>
> Imagine having the NMI watchdog fire and the machine reboot because LL/SC doesn't make sufficient progress
> because others can saturate your cache coherency traffic in user space, and the chip was designed to
> be some flock-of-chickens thing with lots of cores but not the infrastructure to match?
>
> And that's with just a single cacheline needed to complete.
>
> When you have a transaction with multiple cachelines, you can have CPU1 want to load cachelines
> exclusively in order A->B, and another one want to load them in order B->A.
>
> Yes, both CPU's do that in order to expose one atomic sequence, so the ordering
> isn't visible from the end result, but in order to get that atomic sequence
> both CPU's want both cachelines to be exclusively in their caches.
>
> And if the cache protocol isn't well enough designed, they will either get into a deadlock where on
> CPU has one cacheline and wants the other, and the other CPU has that second cacheline but wants the
> first. And neither wants to release it because they need it for the transaction to complete.
>
> More commonly, you don't get a deadlock, but you get the above kind of livelock, where both CPU's
> hold on to the cacheline they have for a while, but then abort and retry - and re-create the
> same issue. Possibly forever, or possibly just for a long time until some other action ends up
> perturbing the timing enough that one CPU finally gets both, and they get to continue.
>
> And before you say it, yes, the solution is trivial. With only a small set of cachelines, just
> ordering them by some shared ordering (typically physical address, but it could be anything else)
> will fix it. And sorting things really is simple when "things" is just a couple of elements.
>
> Of course, even that "trivial" thing is them complicated by the fact that it affects almost the whole
> memory pipeline - the atomic sequence itself will be handled by the core in the L1 cache, but now that
> it requires ordering on the bus you need to really have the logic permeate all the way out...
>
> And considering that people have gotten cache coherency wrong even when N is just one, it's most
> certainly understandable that people would worry about this. And that memory pipeline is most definitely
> the thing most people worry most about even in the absence of any of these kinds of issues.
>
> Side note: you can most definitely get the same kinds of things with a single memory access:
> x86 allows atomic locked instructions that cross cache line - or even page - boundaries, and
> their CPU designers hate the complexity just that causes (Intel calls it "split locks").
>
> It gets even worse when one (or both) of those sides is not regular RAM and doesn't
> even have a cacheline, so the whole "keep cacheline around" approach to make
> it look atomic even if it wasn't really atomic internally doesn't work.
>
> Intel historically solve that with a single global "lock pin", and it's horrendous from a performance and scalability
> issue). And while I'm sure it is solvable other ways, Intel actually is trying to make locked cacheline crossers
> fault so that you can find software that does this (hopefully unintentionally) and just fix it.
>
> The good news is that people used to have so many problems with subtle cache consistency bugs
> that these days I think all CPU designers will basically use automated proofs of correctness
> for the cache protocol. Then you still rely on your cache consistency model that you proved
> to be correct actually matching what the hardware does, but hey, that's a different issue.


My point was that there's nothing inherent in the concept of a transaction that's guaranteed to succeed* eventually that requires the possibility of a dead (or live) lock. You can't make the guarantee for a single execution of the transaction, it may require a series of aborts and retries, during which some extreme coping mechanisms may need to be brought into play.

Bad implementations are a different issue.

*Where success is possible - taking a protection fault in the middle of the transaction will obviously prevent it from completing (but that's going to take out the process as well).
< Previous Post in ThreadNext Post in Thread >
TopicPosted ByDate
Hardware Transactional Memory, the end?rwessel2022/08/20 06:50 PM
  Hardware Transactional Memory, the end?Kara2022/08/20 11:04 PM
    Hardware Transactional Memory, the end?dmcq2022/08/21 11:36 AM
      Hardware Transactional Memory, the end?rwessel2022/08/21 12:17 PM
        Hardware Transactional Memory, the end?---2022/08/21 01:26 PM
          Hardware Transactional Memory, the end?Andrey2022/08/21 06:39 PM
            Hardware Transactional Memory, the end?---2022/08/21 09:27 PM
              Hardware Transactional Memory, the end?Andrey2022/08/23 06:29 AM
                Hardware Transactional Memory, the end?---2022/08/23 10:00 AM
                  Hardware Transactional Memory, the end?iz2022/08/23 01:20 PM
                    Hardware Transactional Memory, the end?anonymou52022/08/23 02:57 PM
                  Hardware Transactional Memory, the end?Andrey2022/08/23 06:01 PM
                    Hardware Transactional Memory, the end?Anon2022/08/23 06:28 PM
                      Hardware Transactional Memory, the end?Andrey2022/08/24 04:10 AM
                        Hardware Transactional Memory, the end?Anon2022/08/24 08:50 AM
                          Hardware Transactional Memory, the end?rwessel2022/08/24 09:35 AM
                          Hardware Transactional Memory, the end?Etienne2022/08/25 01:54 AM
                            Hardware Transactional Memory, the end?Anon2022/08/25 05:25 AM
                              Hardware Transactional Memory, the end?Etienne2022/08/25 06:24 AM
                                Hardware Transactional Memory, the end?rwessel2022/08/25 08:16 AM
                                  Hardware Transactional Memory, the end?Linus Torvalds2022/08/25 10:16 AM
                                    Hardware Transactional Memory, the end?rwessel2022/08/25 11:00 AM
                                  Hardware Transactional Memory, the end?Etienne2022/08/26 12:54 PM
                    Hardware Transactional Memory, the end?anon22022/08/23 08:50 PM
                      Hardware Transactional Memory, the end?Andrey2022/08/24 03:54 AM
                        Hardware Transactional Memory, the end?anon22022/08/24 04:54 AM
                      Hardware Transactional Memory, the end?Simon Farnsworth2022/08/24 03:58 AM
                        Hardware Transactional Memory, the end?Konrad Schwarz2022/08/28 06:12 AM
          Hardware Transactional Memory, the end?Linus Torvalds2022/08/21 06:58 PM
            Hardware Transactional Memory, the end?rwessel2022/08/21 08:02 PM
              Hardware Transactional Memory, the end?anon22022/08/21 09:31 PM
                Hardware Transactional Memory, the end?dmcq2022/08/22 07:14 AM
                  Hardware Transactional Memory, the end?anon22022/08/23 12:15 AM
                    Hardware Transactional Memory, the end?dmcq2022/08/24 03:50 AM
                      Hardware Transactional Memory, the end?Linus Torvalds2022/08/24 11:56 AM
                        Hardware Transactional Memory, the end?dmcq2022/08/25 04:39 PM
            Hardware Transactional Memory, the end?---2022/08/21 09:33 PM
              Hardware Transactional Memory, the end?Linus Torvalds2022/08/22 11:32 AM
                Hardware Transactional Memory, the end?Anon2022/08/22 12:35 PM
                  Hardware Transactional Memory, the end?rwessel2022/08/22 04:47 PM
                Hardware Transactional Memory, the end?gpd2022/08/26 03:07 AM
                  Hardware Transactional Memory, the end?Michael S2022/08/26 03:46 AM
                    Hardware Transactional Memory, the end?Linus Torvalds2022/08/26 11:01 AM
                      Hardware Transactional Memory, the end?rwessel2022/08/26 06:08 PM
                        Hardware Transactional Memory, the end?anonymou52022/08/26 07:52 PM
  Hardware Transactional Memory, the end?zArchJon2022/08/24 10:12 AM
Reply to this Topic
Name:
Email:
Topic:
Body: No Text
How do you spell tangerine? ūüćä