An alternative implementation

Article: Haswell Transactional Memory Alternatives
By: Maynard Handley, November 20, 2012 9:52 pm
Room: Moderated Discussions
"Intel’s TSX tracks the read-set and write-set at cache line (64B) granularity during a transaction. An RS conflict occurs if a cache line in the read-set is written by another thread."

Is it really necessary to do things this way? What I have in mind is something like this:
- to save memory, most code packs a variety of related variables together (i.e. nearby, in the same cache line)
- these sorts of schemes (tracking by cache line) mean that the optimistic case (the transaction goes through without having to roll back) holds only IF either no one else wants to touch this data at the same time OR someone else does touch data at the same time, but everything they touch lives in a different cache line.
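To make the second condition concrete, here is a minimal Python sketch of conflict detection at cache-line granularity. It is a toy model of the rule quoted from the article, not Intel's implementation; the addresses and the names `cache_line` and `line_conflict` are made up for illustration.

```python
LINE_BYTES = 64  # cache line size quoted from the article


def cache_line(addr: int) -> int:
    """Index of the 64-byte line that contains byte address `addr`."""
    return addr // LINE_BYTES


def line_conflict(read_set_addrs, other_write_addr) -> bool:
    """True if another thread's write lands in any line of our read-set."""
    tracked_lines = {cache_line(a) for a in read_set_addrs}
    return cache_line(other_write_addr) in tracked_lines


# Hypothetical layout: two unrelated fields packed 8 bytes apart.
counter_a = 0x1000
counter_b = 0x1008

# Different variables, same line: the transaction is rolled back anyway.
packed_fields_conflict = line_conflict([counter_a], counter_b)

# Same two accesses, but the second field moved one full line away.
padded_fields_conflict = line_conflict([counter_a], counter_a + LINE_BYTES)
```

Here `packed_fields_conflict` is `True` even though the two threads never touch the same variable, while `padded_fields_conflict` is `False`. That is exactly the "someone thought about layout" requirement described above.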

These strike me as unsatisfactory conditions. The first requires the lock not to be too busy; the second requires that some thought was applied (along with coaxing the compiler in some way) to segregate different blocks of data into different cache lines. Both of these are inimical to the primary goal here, which is that we want to be able to write highly threaded code with just a single BGL (big giant lock) which protects pretty much everything, and still have it run fast. (I.e. we require our programmers to know enough to know that they should protect shared data with the BGL or equivalent, but we don't require that they micromanage tons of small locks for efficiency's sake.)

So how could we do better? How about the following alternative implementation?
Rather than a single bit per cache line, we have something like 2 bits per 32-bit block in the cache line. These 2 bits identify which one of up to 3 simultaneous transactions has claimed the block. (00 is the usual "standard, no transaction here" identifier, giving us only 3 "real" identifiers.) This allows us to detect a collision if transaction 10 tries to write to a 32-bit block that is already "claimed" by transaction 11.

This sort of scheme, it seems to me, allows the finer granularity that we want: multiple threads can update data that is close together (in the same cache line) as long as they don't actually touch the same 32-bit block.
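A rough software model of the proposal, as a sketch only: one 2-bit tag per 32-bit block of a 64-byte line, with 00 meaning "unclaimed". The class name `TaggedLine` and its methods are hypothetical, and the model covers only the write-claim case described above. It says nothing about how reads, versioning, or coherence traffic would be handled.

```python
LINE_BYTES = 64
BLOCK_BYTES = 4                               # 32-bit tracking granularity
BLOCKS_PER_LINE = LINE_BYTES // BLOCK_BYTES   # 16 tags per line
NO_TXN = 0b00                                 # "no transaction here"
MAX_TXN = 0b11                                # 2 bits leave IDs 01, 10, 11


class TaggedLine:
    """Toy model of one cache line carrying a 2-bit owner tag per block."""

    def __init__(self):
        self.tags = [NO_TXN] * BLOCKS_PER_LINE

    def claim(self, txn_id: int, offset: int, size: int = BLOCK_BYTES) -> bool:
        """Claim bytes [offset, offset + size) for `txn_id`.

        Returns False, claiming nothing, if any covered block is already
        owned by a different transaction.
        """
        if not NO_TXN < txn_id <= MAX_TXN:
            raise ValueError("transaction id must be 0b01, 0b10 or 0b11")
        if size <= 0 or offset < 0 or offset + size > LINE_BYTES:
            raise ValueError("access must lie inside the line")
        first = offset // BLOCK_BYTES
        last = (offset + size - 1) // BLOCK_BYTES
        blocks = range(first, last + 1)
        if any(self.tags[b] not in (NO_TXN, txn_id) for b in blocks):
            return False
        for b in blocks:
            self.tags[b] = txn_id
        return True

    def release(self, txn_id: int) -> None:
        """Clear every tag held by `txn_id` (on commit or abort)."""
        self.tags = [NO_TXN if t == txn_id else t for t in self.tags]


line = TaggedLine()
first_claim = line.claim(0b10, offset=0)    # transaction 10 takes block 0
neighbour = line.claim(0b11, offset=8)      # transaction 11 takes block 2
collision = line.claim(0b11, offset=0)      # block 0 is already claimed
line.release(0b10)
after_release = line.claim(0b11, offset=0)  # free again once 10 is done
```

In this model `first_claim` and `neighbour` both succeed despite sharing a line, `collision` fails, and `after_release` succeeds. With per-line tracking the second claim would already have been a conflict.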

Obviously there is flexibility here to best balance capabilities vs transistors. For example you can use three bits rather than two and allow up to seven transactions, or you can make the tracking granularity larger (64 bits wide? maybe even 128 bits wide?) or smaller (16 or even 8 bits wide).
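The trade-off can be put into numbers with a little arithmetic. The sketch below counts only the tag bits per 64-byte line and ignores comparators, wiring, and anything else a real design would need, so treat it as a back-of-the-envelope estimate; `tracking_cost` is a made-up helper name.

```python
LINE_BITS = 64 * 8  # one 64-byte cache line


def tracking_cost(id_bits: int, block_bits: int) -> dict:
    """Tag storage per line for a given ID width and tracking granularity."""
    if LINE_BITS % block_bits != 0:
        raise ValueError("block size must divide the line size")
    blocks = LINE_BITS // block_bits
    tag_bits = id_bits * blocks
    return {
        "transactions": 2 ** id_bits - 1,   # the all-zero ID means "none"
        "tag_bits_per_line": tag_bits,
        "overhead": tag_bits / LINE_BITS,   # fraction of the data bits
    }


baseline = tracking_cost(id_bits=2, block_bits=32)     # the scheme above
more_ids = tracking_cost(id_bits=3, block_bits=32)     # seven transactions
coarser = tracking_cost(id_bits=2, block_bits=128)     # 128-bit blocks
byte_level = tracking_cost(id_bits=2, block_bits=8)    # 8-bit blocks
```

Under these assumptions the baseline costs 32 tag bits per line (6.25% of the data bits), three-bit IDs cost 48, 128-bit blocks cost only 8, and byte-level tracking costs 128 bits per line, a quarter of the data it protects.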