An alternative implementation

Article: Haswell Transactional Memory Alternatives
By: bakaneko (nyan.delete@this.hyan.wan), November 21, 2012 5:52 am
Room: Moderated Discussions
Maynard Handley ( on November 20, 2012 10:52 pm wrote:
> "Intel’s TSX tracks the read-set and write-set at cache line (64B) granularity during a transaction.
> An RS conflict occurs if a cache line in the read-set is written by another thread."
> Is it really necessary to do things this way? What I have in mind is something like this:
> - to save memory most code packs a variety of related variables together (ie nearby in the same cache line)
> - these sorts of schemes (track by the cache line) mean that the optimistic viewpoint (transaction goes through
> without having to rollback) IF either no-one else wants to touch this data at the same time OR someone else
> does touch the data at the same time, but everything they touch lives in a different cache block.
> These strike me as unsatisfactory conditions. The first requires the lock not to be too busy, the second
> requires that some thought was applied (along with coaxing the compiler in some way) to try to segregate
> different blocks of data into different cache lines. Both of these are inimical to the primary goal
> here, which is that we want to be able to write highly threaded code with just a single BGL (big giant
> lock) which protects pretty much everything, and still have it run fast. (Ie we require our programmers
> to know enough to know that they should protect shared data with the BGL or equivalent, but we don't
> require that they have to micromanage tons of small locks for efficiency's sake.)

Using a single BGL to protect all possibly shared data is
not a good idea. The problem is that with a multithreaded
program all important data moves into the shared region.

So you need to use something more fine-grained anyway
otherwise you end up with a program where most threads are
waiting on the lock to read/copy/write program data.

You can help this up a bit with locks which care about
readers/writers and COW, but it's still a bottleneck and
workarounds will make things more and more complicated and

If you want to have programs which can use multiple cores,
either the compiler, the language, or you need to take care
of it.

> So how could we do better? How about the following alternative implementation?
> Rather than a single bit per cache line, we have something like 2 bits per 32-bit block in the cache line.
> These 2 bits indicate one of 3 simultaneous transactions going on. (00 is the usual "standard bits, no
> transaction here" identifier, giving us only 3 "real" identifiers. This allows us to detect a collision
> if transaction 10 tries to write to a 32-bit block that is already "claimed" by transaction 11.
> This sort of scheme, it seems to me, allows the finer granularity that we want,
> of allowing multiple threads to update data that is close together (in the same
> cache line) as long as they don't actually use the exact same bytes.
> Obviously there is flexibility here to best balance capabilities vs transistors. For example you can
> use three bits rather than two and allow up to seven transactions, or you can make the tracking granularity
> larger (64 bits wide? maybe even 128 bits wide?) or smaller (16 or even 8 bits wide).

You are right, the granularity is annoying, but i would just
go with protecting byte ranges. There is no need to
compromise once you stop doing transactions as an implicit
consequence of cacheline management.

This way you don't need to add superfluous reads to protect
objects which span multiple cachelines, which is as annoying
as multiple unrelated objects in one cacheline.

But I'm a bit sceptical that it is needed. My opinion is
that using multithreading without a time increase during
development works only with good languages and compilers,
and then you are very far away from the metal (think state
machines/functional languages, lack of control over memory
layout). IOW, the compiler can take care of the granularity
and right instructions.
< Previous Post in ThreadNext Post in Thread >
TopicPosted ByDate
Article: Haswell TM AlternativesDavid Kanter2012/08/21 09:17 PM
  Article: Haswell TM AlternativesHåkan Winbom2012/08/21 11:52 PM
    Article: Haswell TM AlternativesDavid Kanter2012/08/22 01:06 AM
  Article: Haswell TM Alternativesanon2012/08/22 08:46 AM
    Article: Haswell TM AlternativesLinus Torvalds2012/08/22 09:16 AM
      Article: Haswell TM AlternativesDoug S2012/08/24 08:34 AM
    AMD's ASF even more limitedPaul A. Clayton2012/08/22 09:20 AM
      AMD's ASF even more limitedLinus Torvalds2012/08/22 09:41 AM
        Compiler use of ll/sc?Paul A. Clayton2012/08/28 09:28 AM
          Compiler use of ll/sc?Linus Torvalds2012/09/08 12:58 PM
            Lock recognition?Paul A. Clayton2012/09/10 01:17 PM
              Sorry, I was confusedPaul A. Clayton2012/09/13 10:56 AM
  Filter to detect store conflictsPaul A. Clayton2012/08/22 09:19 AM
  Article: Haswell TM Alternativesbakaneko2012/08/22 02:02 PM
    Article: Haswell TM AlternativesDavid Kanter2012/08/22 02:45 PM
      Article: Haswell TM Alternativesbakaneko2012/08/22 09:56 PM
  Cache line granularity?Paul A. Clayton2012/08/28 09:28 AM
    Cache line granularity?David Kanter2012/08/31 08:13 AM
      A looser definition might have advantagesPaul A. Clayton2012/09/01 06:29 AM
    Cache line granularity?rwessel2012/08/31 07:54 PM
      Alpha load locked granularityPaul A. Clayton2012/09/01 06:29 AM
        Alpha load locked granularityanon2012/09/02 05:23 PM
          Alpha pages groupsPaul A. Clayton2012/09/03 04:16 AM
  An alternative implementationMaynard Handley2012/11/20 09:52 PM
    An alternative implementationbakaneko2012/11/21 05:52 AM
      Guarding unread values?Paul A. Clayton2012/11/21 08:39 AM
        Guarding unread values?bakaneko2012/11/21 11:25 AM
    TM granularity and versioningPaul A. Clayton2012/11/21 08:27 AM
      TM granularity and versioningMaynard Handley2012/11/21 10:52 AM
        Indeed, TM (and coherence) has devilish details (NT)Paul A. Clayton2012/11/21 10:56 AM
Reply to this Topic
Body: No Text
How do you spell tangerine? 🍊