By: NoSpammer (no.delete@this.spam.com), April 10, 2021 12:02 am
Room: Moderated Discussions
Aspect of Anonimity (aoa.delete@this.hjiffdhkoouggkl.com) on April 6, 2021 2:16 pm wrote:
> NoSpammer (no.delete@this.spam.com) on April 5, 2021 9:38 pm wrote:
> > Have you ever actually studied the relevant theory?
> >
>
> No.
>
> > So how can hardware assume, a priori, that software cannot cause a deadly
> > embrace under any circumstances?
>
> The crux is that HTM that aborts CANNOT do this! Fallback to normal locking always
> happens on abort, which undisputably can lock up. Even if all you do is retry a
> failed transaction, then congrats, you got a glorified spinlock in userspace!
Well, my point was that you cannot expect to have a workable transaction scheme without aborts or partial rollbacks. However, theoretically, one could devise a scheme that would not always abort to the locking path but instead roll back more intelligently and retry.
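To make the abort-and-fallback path concrete, here is a minimal sketch of the usual HTM-with-lock-fallback idiom using Intel RTM intrinsics (compile with -mrtm). This is just the textbook pattern, not anything specific from the posts above; fallback_lock, lock_taken and MAX_RETRIES are names I made up for illustration.

/* Minimal sketch: HTM with a lock fallback.  fallback_lock, lock_taken
 * and MAX_RETRIES are illustrative names only. */
#include <immintrin.h>
#include <pthread.h>

static pthread_mutex_t fallback_lock = PTHREAD_MUTEX_INITIALIZER;
static int lock_taken;               /* read inside the transaction        */
#define MAX_RETRIES 3

void update(long *counter)
{
    for (int i = 0; i < MAX_RETRIES; i++) {
        if (_xbegin() == _XBEGIN_STARTED) {
            if (lock_taken)          /* subscribe to the fallback lock     */
                _xabort(0xff);       /* holder is active: abort            */
            (*counter)++;            /* the transactional work             */
            _xend();
            return;
        }
        /* aborted: a bare retry here is the "glorified spinlock" case */
    }
    pthread_mutex_lock(&fallback_lock);   /* give up on HTM, take the lock */
    lock_taken = 1;
    (*counter)++;
    lock_taken = 0;
    pthread_mutex_unlock(&fallback_lock);
}

The reason for reading lock_taken inside the transaction is that a fallback-path holder then conflicts with, and aborts, any in-flight transactions; that unconditional "abort to the lock" behaviour is exactly what a smarter rollback-and-retry scheme would try to improve on.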
> > And how would you resolve it?
>
> I suspected that it might be impossible, but:
>
> Use cache coherency to solve your memory coherency problem, and suspend tasks with
> potentially conflicting transactions. Detecting those is the difficult part.
I disagree. Detecting conflicting transactions is the easy part; resolving them in a close-to-optimal way in all possible circumstances is the difficult part.
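As a toy illustration of why detection is the mechanical part (my own sketch, not how any real HTM implements it): once each transaction's read and write sets are tracked at cache-line granularity, a conflict check is just set intersection, and everything hard lives in the resolution policy that decides who rolls back and who waits. All names below are invented.

/* Toy illustration only: addresses are assumed to already be cache-line
 * addresses; struct txn, sets_overlap and txns_conflict are made up. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SET_MAX 64

struct txn {
    uintptr_t read_set[SET_MAX];     /* lines read    */
    uintptr_t write_set[SET_MAX];    /* lines written */
    size_t nr, nw;
};

static bool sets_overlap(const uintptr_t *a, size_t na,
                         const uintptr_t *b, size_t nb)
{
    for (size_t i = 0; i < na; i++)
        for (size_t j = 0; j < nb; j++)
            if (a[i] == b[j])
                return true;
    return false;
}

/* write/write or read/write overlap => the transactions conflict */
bool txns_conflict(const struct txn *t1, const struct txn *t2)
{
    return sets_overlap(t1->write_set, t1->nw, t2->write_set, t2->nw) ||
           sets_overlap(t1->write_set, t1->nw, t2->read_set,  t2->nr) ||
           sets_overlap(t1->read_set,  t1->nr, t2->write_set, t2->nw);
}

/* The easy part ends here.  Which transaction aborts, how far it rolls
 * back, and how long the other one waits is the resolution policy under
 * discussion -- deliberately left unimplemented. */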
> Start with the whole address space of a task and attempt to narrow it down because you
> care about concurrency. Thus you need to restrict the domain of potentially used memory
> for each transaction and provide that information in software at its start.
> If you end up with a taylor-made fit for each transaction,
> then congrats, you have converged to normal locking!
Well, let's think it through with a simplistic top-level view. Suppose you limit the transaction footprint to something reasonable the software and CPU can handle, and let's say you can even prove you have an algorithm that can roll back intelligently to guarantee forward progress and correctness of execution (with respect to ordering, etc.). Now, let's say you have 2 threads with interlocking transactions. You need a protocol to resolve this, and the two CPU cores will have to exchange some information in order to roll back intelligently. This will take some time. Now suppose you have 3 threads on physically different CPUs that need some protocol to resolve interlocking transactions. Sounds like a fine mathematical exercise. I guess it can be done. Now throw in some IRQs/NMIs and maybe a couple of non-transactional threads trying to read some lines locked by some of the transactions. I'm sure I have forgotten something, like debugging features or exceptions. If needed, maybe add just one more layer of nested transactions for more fun.
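Even the 2-thread case shows why some protocol has to break the symmetry. A minimal sketch (RTM intrinsics again, built with gcc -mrtm -pthread; all names, counts and the retry limit are mine): thread 1 touches line_a then line_b, thread 2 touches line_b then line_a, so under naive immediate retry the two can keep aborting each other until backoff, arbitration or the lock fallback resolves it.

/* Two interlocking transactions; purely illustrative. */
#include <immintrin.h>
#include <pthread.h>
#include <stdio.h>

static long line_a __attribute__((aligned(64)));
static long line_b __attribute__((aligned(64)));
static pthread_mutex_t fallback = PTHREAD_MUTEX_INITIALIZER;
static int lock_held;

static void *worker(void *arg)
{
    int a_first = (arg != NULL);
    long aborts = 0;

    for (int iter = 0; iter < 100000; iter++) {
        int done = 0;
        for (int retry = 0; retry < 10 && !done; retry++) {
            if (_xbegin() == _XBEGIN_STARTED) {
                if (lock_held)
                    _xabort(1);            /* fallback holder is active   */
                if (a_first) { line_a++; line_b++; }
                else         { line_b++; line_a++; }
                _xend();
                done = 1;
            } else {
                aborts++;                  /* usually: conflict with peer */
            }
        }
        if (!done) {                       /* somebody must break the tie */
            pthread_mutex_lock(&fallback);
            lock_held = 1;
            line_a++; line_b++;
            lock_held = 0;
            pthread_mutex_unlock(&fallback);
        }
    }
    printf("aborts seen: %ld\n", aborts);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, (void *)1);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}

Already here the resolution choices (how many retries, who backs off, whether the fallback lock serializes everybody) are pure policy, and every extra ingredient in the list above multiplies the cases that policy has to cover.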
Can you see the combinatorial explosion of the state space? Can you easily envision a protocol that resolves all possible situations correctly in a close-to-optimal way? Can you define a hardware test protocol for the implementation? Can you put timing estimates on the resolution of transactions? What would the waiting time for these to resolve be vs. locking? I guess some advanced application of queuing theory could get you that far; with my post-grad math it would certainly take me quite some time to get anywhere near correct analytical answers to these.
> It's only a subset of what's possible to do with locking primitives and maybe the sweet spot
> for you is some kind of memory segments and maybe you need lots of speculative execution for
> acceptable performance and maybe you want to also track pages besides cachlines to deal with
> overflow, regardless the outlook for superior performance combined with ease of use is grim.
Well, in theory much is possible. In theory you can argue that you can even avoid a deadly embrace caused by rookie programmers or by random compiled-code memory access ordering, by intelligently rolling back and retrying transactions. That might even work alright for small transactions that interact rarely. Personally, I'd love that for some of my code. But in practice, when it's not targeted at specifically designed transaction patterns, it seems it's not so difficult to get into the territory Linus is rambling about.