By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), September 24, 2010 6:59 am
Room: Moderated Discussions
David Kanter (dkanter@realworldtech.com) on 9/24/10 wrote:
>
>IIRC the "TM" in TMTA chips was the gated store buffer, which was limited to 32
>stores. Is that what you are referring to?
.. together with the alias hardware, yes.
And whether you do it at a store buffer or in the L1
cache is kind of a detail. There was a version that did
it in the cache too. The cache version isn't necessarily
any better, even if the L1 cache is much bigger: it ends
up having other limitations, like the number of ways in
the cache.
So doing transactional memory in the cache may give you
bigger transactions, but it can easily give you smaller
ones too. Way thrashing isn't that uncommon - even with
8-way associativity, you can have allocation patterns that
cause lots of conflicts.
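To make the way-thrashing point concrete, here is a minimal sketch. The cache geometry (32 KiB, 64-byte lines, 8 ways) and the helper names are assumptions for illustration, not the parameters of any Transmeta part. It only checks whether the lines a transaction touches could all be resident at once, ignoring everything else that lives in the cache.

```python
# Assumed geometry (hypothetical): 32 KiB L1, 64-byte lines, 8 ways.
LINE_SIZE = 64
WAYS = 8
CACHE_SIZE = 32 * 1024
NUM_SETS = CACHE_SIZE // (LINE_SIZE * WAYS)  # 64 sets


def set_index(addr):
    """Cache set that a byte address maps to."""
    return (addr // LINE_SIZE) % NUM_SETS


def fits_in_cache(addresses):
    """True if every line the transaction touches can stay resident at once.

    A set holds at most WAYS distinct lines, so the transaction
    overflows as soon as any one set needs more than that.
    """
    lines_per_set = {}
    for addr in addresses:
        line = addr // LINE_SIZE
        lines_per_set.setdefault(set_index(addr), set()).add(line)
    return all(len(lines) <= WAYS for lines in lines_per_set.values())


# Nine objects allocated 4 KiB apart all land in the same set.
page_strided = [i * 4096 for i in range(9)]
# 512 consecutive lines fill the whole cache, 8 per set.
sequential = [i * LINE_SIZE for i in range(512)]
```

Under these assumptions `fits_in_cache(sequential)` holds for a full 32 KiB of data, while `fits_in_cache(page_strided)` fails at just nine lines, because a 4 KiB stride maps every one of them to set 0.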
>Yes that's quite likely, although depends on TX size. The
>larger your TX, the more important it will be.
Yes. But if you use transactional memory as a way to
elide locking (not just as a fancier "load locked and
store conditional" to do atomic linked lists and hash
tables), your transaction size really does need to
be pretty big.
Easily big enough that you really take a huge hit if
you mispredict. Which, for statically compiled code, you're
going to do all the time (or alternatively, you won't be
using your fancy TM nearly as much as you could, because
you realize that you cannot afford to take the risk on
even slightly questionable code).
So that's what it boils down to: transactions are "free"
and a wonderful way to elide those horrible expensive locks.
But only if you never make a mistake.
They are expensive as hell even for very low rates of
transaction failures. And you really cannot know the failure
rate statically (even if you don't end up reaching some
transaction limit, you may easily end up just having heavy
contention on the data structures in question).
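A rough way to see why low failure rates already hurt is a back-of-the-envelope cost model. Everything here is an assumption: the function names, the idea that a failed transaction wastes its whole body before retrying under the lock, and the cycle counts used in the example.

```python
def elided_cost(p_fail, tx_cycles, abort_penalty, lock_cycles):
    """Expected cycles per critical section when trying elision first.

    Success: pay only for the transaction body.
    Failure: pay for the wasted body, the abort, then the locked retry.
    """
    success = tx_cycles
    failure = tx_cycles + abort_penalty + lock_cycles + tx_cycles
    return (1 - p_fail) * success + p_fail * failure


def break_even_failure_rate(tx_cycles, abort_penalty, lock_cycles):
    """Failure rate above which always taking the lock is cheaper.

    Locked cost is lock_cycles + tx_cycles; the elided cost simplifies to
    tx_cycles + p * (tx_cycles + abort_penalty + lock_cycles).
    Setting the two equal and solving for p gives the expression below.
    """
    return lock_cycles / (tx_cycles + abort_penalty + lock_cycles)
```

With hypothetical numbers of 100 cycles for the lock and 200 for an abort, `break_even_failure_rate(2000, 200, 100)` is about 4.3%, and it drops below 0.5% for a 20000-cycle transaction. In this model, the bigger the transaction, the smaller the failure rate you can afford.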
So I claim that anybody who does transactional memory
without having a very good dynamic fallback is basically
totally incompetent. And so far I haven't seen anything
that convinces me that competence even exists in this area.
Linus
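The post does not spell out what a "dynamic fallback" would look like. One possible reading, sketched here purely as an assumption, is a per-lock policy that measures the abort rate at run time and stops eliding for a while when it gets too high. The class name, thresholds, and window sizes are all hypothetical.

```python
class ElisionPolicy:
    """Per-lock decision: try a transaction, or just take the lock?"""

    def __init__(self, max_abort_rate=0.05, window=100, backoff=1000):
        self.max_abort_rate = max_abort_rate
        self.window = window      # attempts per measurement window
        self.backoff = backoff    # acquisitions that skip elision
        self.attempts = 0
        self.aborts = 0
        self.skip_remaining = 0

    def should_elide(self):
        """False while the lock is in its back-off period."""
        if self.skip_remaining > 0:
            self.skip_remaining -= 1
            return False
        return True

    def record(self, aborted):
        """Record one transactional attempt and re-evaluate per window."""
        self.attempts += 1
        self.aborts += int(aborted)
        if self.attempts >= self.window:
            if self.aborts / self.attempts > self.max_abort_rate:
                self.skip_remaining = self.backoff
            self.attempts = 0
            self.aborts = 0
```

A caller would ask `should_elide()` before each critical section, fall back to the real lock when it returns `False`, and report every transactional attempt through `record()`. The point of the sketch is only that the decision is made from observed behavior rather than at compile time.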