By: David Kanter (dkanter.delete@this.realworldtech.com), September 25, 2010 9:27 am
Room: Moderated Discussions
Linus Torvalds (torvalds@linux-foundation.org) on 9/24/10 wrote:
---------------------------
>David Kanter (dkanter@realworldtech.com) on 9/24/10 wrote:
>>
>>IIRC the "TM" in TMTA chips was the gated store buffer, which was limited to 32
>>stores. Is that what you are referring to?
>
>.. together with the alias hardware, yes.
>
>And whether you do it at a store buffer or in the L1
>cache is kind of a detail. There was a version that did
>it in the cache too. The cache version isn't necessarily
>any better, even if the L1 cache is much bigger: it ends
>up having other limitations, like the number of ways in
>the cache.
>
>So doing transactional memory in the cache may give you
>bigger transactions, but it can easily give you smaller
>ones too. Way thrashing isn't that uncommon - even with
>8-way associativity, you can have allocation patterns that
>cause lots of conflicts.
Absolutely. There are a lot of ways to handle that though.
You pinpointed the most important one (being able to resize your transactions dynamically).
But you could also spill into an L2 cache, which should fix a lot of those problems.
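To make the way-thrashing point concrete, here is a minimal sketch of how a transaction's footprint can overflow one set of an 8-way cache long before the cache is full. The geometry (64-byte lines, 64 sets, 8 ways) and the helper names are assumptions picked for illustration, not the parameters of any particular chip.

```python
# Hypothetical L1 geometry, for illustration only: 64 sets x 8 ways x 64 B = 32 KiB.
LINE_BYTES = 64
NUM_SETS = 64
WAYS = 8

def set_index(addr):
    """Set that a byte address maps to under simple modulo indexing."""
    return (addr // LINE_BYTES) % NUM_SETS

def max_lines_per_set(addrs):
    """Largest number of distinct cache lines that land in a single set."""
    lines_by_set = {}
    for addr in addrs:
        line = addr // LINE_BYTES
        lines_by_set.setdefault(set_index(addr), set()).add(line)
    return max(len(lines) for lines in lines_by_set.values())

def fits_in_l1(addrs):
    """True if no set needs more ways than the cache has."""
    return max_lines_per_set(addrs) <= WAYS

# Nine objects allocated 4 KiB apart all map to set 0 and overflow 8 ways.
strided = [i * LINE_BYTES * NUM_SETS for i in range(9)]
# Sixty-four consecutive lines spread across 64 different sets.
dense = [i * LINE_BYTES for i in range(64)]
```

Under these assumptions the strided transaction touches only 9 lines and still cannot be held in the L1, while the dense one touches 64 lines and fits, which is why a bigger cache does not automatically mean bigger transactions.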
>>Yes that's quite likely, although depends on TX size. The
>>larger your TX, the more important it will be.
>
>Yes. But if you use transactional memory as a way to
>elide locking (not just as a fancier "load locked and
>store conditional" to do atomic linked lists and hash
>tables), your transaction size really does need to
>be pretty big.
Agreed.
>Easily big enough that you really take a huge hit if
>you mispredict. Which, for statically compiled code, you're
>going to do all the time (or alternatively, you won't be
>using your fancy TM nearly as much as you could, because
>you realize that you cannot afford to take the risk on
>even slightly questionable code).
I think predicting address patterns is much the same as predicting branches. Static prediction might get it right 60% of the time on average, but that is still practically useless; you need dynamic information and estimates.
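As a rough sketch of what "dynamic information" could look like, borrowing the 2-bit saturating counter from branch prediction: keep one counter per lock site and only attempt elision while recent transactions there have been committing. The class and method names are hypothetical, and this is not a description of any shipping hardware or runtime.

```python
class ElisionPredictor:
    """Per-lock-site 2-bit saturating counter (hypothetical sketch).

    Counter values 2 and 3 mean "try to elide the lock";
    values 0 and 1 mean "just take the lock".
    """

    def __init__(self, initial=2):
        self.initial = initial
        self.counters = {}  # lock site identifier -> counter in 0..3

    def should_elide(self, site):
        """Predict whether a transaction at this site is worth attempting."""
        return self.counters.get(site, self.initial) >= 2

    def record(self, site, committed):
        """Update the counter with the outcome of an attempted transaction."""
        count = self.counters.get(site, self.initial)
        if committed:
            count = min(3, count + 1)
        else:
            count = max(0, count - 1)
        self.counters[site] = count
```

One caveat with this sketch: once a site drops below the threshold, nothing is attempted there and so nothing is recorded. A real scheme would need some way to probe again occasionally, otherwise a site never recovers from a burst of aborts.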
>So that's what it boils down to: transactions are "free"
>and a wonderful way to elide those horrible expensive locks.
>But only if you never make a mistake.
>
>They are expensive as hell even for very low rates of
>transaction failures. And you really cannot know statically
>(even if you don't end up reaching some transaction limit,
>you may easily end up just having heavy contention on the
>data structures in question).
>
>So I claim that anybody who does transactional memory
>without having a very good dynamic fallback is basically
>totally incompetent. And so far I haven't seen anything
>that convinces me that competence even exists in this area.
I agree 100%. If you look at Rock, they did not have a dynamic fallback, and pretty much anything could cause a TX abort. No surprise that the performance was not up to par.
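A back-of-the-envelope model shows why even low abort rates hurt. It assumes a committed transaction costs `tx_cycles`, the locked path costs `lock_cycles`, and an abort costs `abort_penalty` (wasted work plus rollback) before re-running under the real lock. All names here are made up for illustration, and any numbers fed in are hypothetical rather than measurements.

```python
def expected_cycles(p_abort, tx_cycles, lock_cycles, abort_penalty):
    """Average cost of one critical section when elision is always attempted.

    On commit we pay tx_cycles. On abort we pay abort_penalty and then
    run the section again under the real lock (lock_cycles).
    """
    return (1 - p_abort) * tx_cycles + p_abort * (abort_penalty + lock_cycles)

def break_even_abort_rate(tx_cycles, lock_cycles, abort_penalty):
    """Abort rate above which always-elide is slower than just taking the lock.

    Obtained by solving expected_cycles(p, ...) == lock_cycles for p.
    """
    saved = lock_cycles - tx_cycles
    return saved / (abort_penalty + saved)
```

With purely hypothetical inputs of 100 cycles for a committed transaction, 150 for the locked path and 1000 for an abort, `break_even_abort_rate(100, 150, 1000)` comes out just under 5%. Above that abort rate you would have been better off taking the lock every time.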
David