Lock recognition?

Article: Haswell Transactional Memory Alternatives
By: Paul A. Clayton (paaronclayton.delete@this.gmail.com), September 10, 2012 2:17 pm
Room: Moderated Discussions
Linus Torvalds (torvalds.delete@this.linux-foundation.org) on September 8, 2012 1:58 pm wrote:
> [ was off traveling, sorry for late answer ]

In this case, better late than never (I think).

> Paul A. Clayton (paaronclayton.delete@this.gmail.com) on
> August 28, 2012 10:28 am wrote:
[snip]
>> ll/sc is such a limited form of transactional memory that
>> recognizing most possible uses would seem not to be too difficult.
>
> Umm. It's not just the "load-op-store" sequence under a
> lock.
>
> It's the lock itself!
>
> Doing a spinlock around a load-op-store sequence is *not* something
> that is equivalent to doing the load-op-store as a ll/sc sequence.
> Yes, both are "atomic". No, they are not the same thing at all
> despite that.

Okay, I was not thinking of explicitly named locks (for which the extent of what is guarded might not be clear). Even in that case, one could conceive that whole program analysis could handle this issue (with tools of the development system--possibly just the compiler--annotating assumptions so that the analysis would not need to be repeated at each compilation), though that is probably expecting far too much for common programming (especially in the case of using ll/sc since the opportunities for use are extremely limited both in terms of ISA support and the single item limitation of ll/sc). On the other hand, some popular programming languages might handle enough of the details of required locking that the need for extra analysis would be substantially reduced for such languages.

[snip]
> So ll/sc is pretty much useless as anything else than a replacement
> for the CISC kind of "atomic increment/cmpxchg/whatever"
> instruction. It has absolutely nothing to do with transactional
> memory that can elide locks.
>
> Don't confuse the two. They really have nothing in common, and
> "atomic" really means many different things.

Presumably then those CISC type of atomic instructions can be generated by a compiler, so ll/sc should have similar portable uses. This is why I disagreed to the statement that ll/sc would not be used "even if ll/sc could work in some particular case". ll/sc is highly portable within an ISA family and its use could be abstracted by a programmer to support other ISAs (even ones that only have test-and-set). (I would not be surprised if it was theoretically possible for a compiler/language to provide only a compare-and-swap primitive and generate ISA-appropriate code that could use ll/sc when such is available/appropriate.)

The limit of a single entry for both read and write set for ll/sc is extremely constraining, but it is a form of transactional memory.

> Having a compiler recognize a lock, and turning it into a lock-elision
> sequence (that has the *semantics* of a lock) is trivial. In fact, the compiler
> doesn't even need to do it, you can just do the lock-eliding part as a function
> call or inline asm. Exactly because unlike ll/sc, it actually has the right
> semantics.

Could this then require separately compiled code that uses locks either use the same compiler and options (so that all critical/locked sections are in transactions) or limit use to HLE (i.e., no locked section could be converted to a lock-free transaction because locked sections compiled with different options or a different compiler might not use transactional memory for the critical/locked section)?

This could make Intel's HLE even more attractive--not only is it compatible across different hardware but it also avoids the problem of mixed usage (as well as fitting more easily into existing code which uses locks and avoiding the need to generate code for the transaction failure case). (RTM would function correctly if the lock store is included in the transaction, but that would remove most of the benefit of RTM [it might still be useful to provide a partial hardware check for unlocked accesses to shared data; as long as the unlocked access only occurs in parallel with a transaction an otherwise inconsistent use would be prevented and detected].)

I do wonder how easily a compiler could recognize and properly handle a lock. This seems like a problem vaguely similar to compile-time automated memory management (hard in the general case but perhaps more tractable in the common cases especially relative to automated memory management since persistent locks are usually not used--though guaranteeing that an improperly written program will lockup rather than run incorrectly may be difficult).
< Previous Post in ThreadNext Post in Thread >
TopicPosted ByDate
Article: Haswell TM AlternativesDavid Kanter08/21/12 10:17 PM
  Article: Haswell TM AlternativesHåkan Winbom08/22/12 12:52 AM
    Article: Haswell TM AlternativesDavid Kanter08/22/12 02:06 AM
  Article: Haswell TM Alternativesanon08/22/12 09:46 AM
    Article: Haswell TM AlternativesLinus Torvalds08/22/12 10:16 AM
      Article: Haswell TM AlternativesDoug S08/24/12 09:34 AM
    AMD's ASF even more limitedPaul A. Clayton08/22/12 10:20 AM
      AMD's ASF even more limitedLinus Torvalds08/22/12 10:41 AM
        Compiler use of ll/sc?Paul A. Clayton08/28/12 10:28 AM
          Compiler use of ll/sc?Linus Torvalds09/08/12 01:58 PM
            Lock recognition?Paul A. Clayton09/10/12 02:17 PM
              Sorry, I was confusedPaul A. Clayton09/13/12 11:56 AM
  Filter to detect store conflictsPaul A. Clayton08/22/12 10:19 AM
  Article: Haswell TM Alternativesbakaneko08/22/12 03:02 PM
    Article: Haswell TM AlternativesDavid Kanter08/22/12 03:45 PM
      Article: Haswell TM Alternativesbakaneko08/22/12 10:56 PM
  Cache line granularity?Paul A. Clayton08/28/12 10:28 AM
    Cache line granularity?David Kanter08/31/12 09:13 AM
      A looser definition might have advantagesPaul A. Clayton09/01/12 07:29 AM
    Cache line granularity?rwessel08/31/12 08:54 PM
      Alpha load locked granularityPaul A. Clayton09/01/12 07:29 AM
        Alpha load locked granularityanon09/02/12 06:23 PM
          Alpha pages groupsPaul A. Clayton09/03/12 05:16 AM
  An alternative implementationMaynard Handley11/20/12 10:52 PM
    An alternative implementationbakaneko11/21/12 06:52 AM
      Guarding unread values?Paul A. Clayton11/21/12 09:39 AM
        Guarding unread values?bakaneko11/21/12 12:25 PM
    TM granularity and versioningPaul A. Clayton11/21/12 09:27 AM
      TM granularity and versioningMaynard Handley11/21/12 11:52 AM
        Indeed, TM (and coherence) has devilish details (NT)Paul A. Clayton11/21/12 11:56 AM
Reply to this Topic
Name:
Email:
Topic:
Body: No Text
How do you spell blue?