Lock recognition?

Article: Haswell Transactional Memory Alternatives
By: Paul A. Clayton (paaronclayton.delete@this.gmail.com), September 10, 2012 1:17 pm
Room: Moderated Discussions
Linus Torvalds (torvalds.delete@this.linux-foundation.org) on September 8, 2012 1:58 pm wrote:
> [ was off traveling, sorry for late answer ]

In this case, better late than never (I think).

> Paul A. Clayton (paaronclayton.delete@this.gmail.com) on
> August 28, 2012 10:28 am wrote:
>> ll/sc is such a limited form of transactional memory that
>> recognizing most possible uses would seem not to be too difficult.
> Umm. It's not just the "load-op-store" sequence under a
> lock.
> It's the lock itself!
> Doing a spinlock around a load-op-store sequence is *not* something
> that is equivalent to doing the load-op-store as a ll/sc sequence.
> Yes, both are "atomic". No, they are not the same thing at all
> despite that.

Okay, I was not thinking of explicitly named locks (for which the extent of what is guarded might not be clear). Even in that case, one could conceive that whole program analysis could handle this issue (with tools of the development system--possibly just the compiler--annotating assumptions so that the analysis would not need to be repeated at each compilation), though that is probably expecting far too much for common programming (especially in the case of using ll/sc since the opportunities for use are extremely limited both in terms of ISA support and the single item limitation of ll/sc). On the other hand, some popular programming languages might handle enough of the details of required locking that the need for extra analysis would be substantially reduced for such languages.

> So ll/sc is pretty much useless as anything else than a replacement
> for the CISC kind of "atomic increment/cmpxchg/whatever"
> instruction. It has absolutely nothing to do with transactional
> memory that can elide locks.
> Don't confuse the two. They really have nothing in common, and
> "atomic" really means many different things.

Presumably then those CISC type of atomic instructions can be generated by a compiler, so ll/sc should have similar portable uses. This is why I disagreed to the statement that ll/sc would not be used "even if ll/sc could work in some particular case". ll/sc is highly portable within an ISA family and its use could be abstracted by a programmer to support other ISAs (even ones that only have test-and-set). (I would not be surprised if it was theoretically possible for a compiler/language to provide only a compare-and-swap primitive and generate ISA-appropriate code that could use ll/sc when such is available/appropriate.)

The limit of a single entry for both read and write set for ll/sc is extremely constraining, but it is a form of transactional memory.

> Having a compiler recognize a lock, and turning it into a lock-elision
> sequence (that has the *semantics* of a lock) is trivial. In fact, the compiler
> doesn't even need to do it, you can just do the lock-eliding part as a function
> call or inline asm. Exactly because unlike ll/sc, it actually has the right
> semantics.

Could this then require separately compiled code that uses locks either use the same compiler and options (so that all critical/locked sections are in transactions) or limit use to HLE (i.e., no locked section could be converted to a lock-free transaction because locked sections compiled with different options or a different compiler might not use transactional memory for the critical/locked section)?

This could make Intel's HLE even more attractive--not only is it compatible across different hardware but it also avoids the problem of mixed usage (as well as fitting more easily into existing code which uses locks and avoiding the need to generate code for the transaction failure case). (RTM would function correctly if the lock store is included in the transaction, but that would remove most of the benefit of RTM [it might still be useful to provide a partial hardware check for unlocked accesses to shared data; as long as the unlocked access only occurs in parallel with a transaction an otherwise inconsistent use would be prevented and detected].)

I do wonder how easily a compiler could recognize and properly handle a lock. This seems like a problem vaguely similar to compile-time automated memory management (hard in the general case but perhaps more tractable in the common cases especially relative to automated memory management since persistent locks are usually not used--though guaranteeing that an improperly written program will lockup rather than run incorrectly may be difficult).
< Previous Post in ThreadNext Post in Thread >
TopicPosted ByDate
Article: Haswell TM AlternativesDavid Kanter2012/08/21 09:17 PM
  Article: Haswell TM AlternativesH√•kan Winbom2012/08/21 11:52 PM
    Article: Haswell TM AlternativesDavid Kanter2012/08/22 01:06 AM
  Article: Haswell TM Alternativesanon2012/08/22 08:46 AM
    Article: Haswell TM AlternativesLinus Torvalds2012/08/22 09:16 AM
      Article: Haswell TM AlternativesDoug S2012/08/24 08:34 AM
    AMD's ASF even more limitedPaul A. Clayton2012/08/22 09:20 AM
      AMD's ASF even more limitedLinus Torvalds2012/08/22 09:41 AM
        Compiler use of ll/sc?Paul A. Clayton2012/08/28 09:28 AM
          Compiler use of ll/sc?Linus Torvalds2012/09/08 12:58 PM
            Lock recognition?Paul A. Clayton2012/09/10 01:17 PM
              Sorry, I was confusedPaul A. Clayton2012/09/13 10:56 AM
  Filter to detect store conflictsPaul A. Clayton2012/08/22 09:19 AM
  Article: Haswell TM Alternativesbakaneko2012/08/22 02:02 PM
    Article: Haswell TM AlternativesDavid Kanter2012/08/22 02:45 PM
      Article: Haswell TM Alternativesbakaneko2012/08/22 09:56 PM
  Cache line granularity?Paul A. Clayton2012/08/28 09:28 AM
    Cache line granularity?David Kanter2012/08/31 08:13 AM
      A looser definition might have advantagesPaul A. Clayton2012/09/01 06:29 AM
    Cache line granularity?rwessel2012/08/31 07:54 PM
      Alpha load locked granularityPaul A. Clayton2012/09/01 06:29 AM
        Alpha load locked granularityanon2012/09/02 05:23 PM
          Alpha pages groupsPaul A. Clayton2012/09/03 04:16 AM
  An alternative implementationMaynard Handley2012/11/20 09:52 PM
    An alternative implementationbakaneko2012/11/21 05:52 AM
      Guarding unread values?Paul A. Clayton2012/11/21 08:39 AM
        Guarding unread values?bakaneko2012/11/21 11:25 AM
    TM granularity and versioningPaul A. Clayton2012/11/21 08:27 AM
      TM granularity and versioningMaynard Handley2012/11/21 10:52 AM
        Indeed, TM (and coherence) has devilish details (NT)Paul A. Clayton2012/11/21 10:56 AM
Reply to this Topic
Body: No Text
How do you spell tangerine? ūüćä