By: David Kanter (dkanter.delete@this.realworldtech.com), August 22, 2012 2:45 pm
Room: Moderated Discussions
bakaneko (nyan.delete@this.hyan.wan) on August 22, 2012 3:02 pm wrote:
> David Kanter (dkanter.delete@this.realworldtech.com) on August 21, 2012 10:17 pm wrote:
> > We previously theorized that Intel’s TSX extensions in Haswell use the caches
> > to provide transactional memory semantics. This article describes an alternative
> > approach based on minimal changes to the CPU core (specifically in the ROB and
> > MOB), contrasts the advantages of the two techniques and discusses the expected
> > implementation in Haswell.
> >
> > http://www.realworldtech.com/haswell-tm-alt/
> >
> > I also muse a bit about when these two techniques (cache-based and MOB-based TM)
> > will get implemented on the roadmap and how they can work together in a very
> > complementary fashion.
> >
> > As always comments and discussion welcome.
> >
> > David
>
> Mhm, I don't get what is so important here. The question
> where to keep the old and new values (L2, L1D, MOB, other buffers)
> comes from the microarchitecture. So the old values into L2 of the
> core with the transaction and the new values into the
> L1D/MOB/local buffer, depending on the amount of expected data
> beyond what can be kept back in the MOB.
It's a microarchitectural detail, but it has significant implications. The L1 is heavily limited by associativity, whereas the MOB is more or less fully associative. Programmers really don't want to worry about associativity; most of them implicitly assume that any sort of cache is fully associative.
> I don't understand how the MOB would have to do more in such
> a scenario, and I don't see how important pushing everything
> into the MOB is in the first place. That's at least my naive
> technical opinion forgetting all the little details.
I think you're talking about a hybrid TM that uses both caches and MOB. I was primarily discussing how a MOB-only system would work.
> But there are other problems: I can't measure transactions.
You can measure TX failure using the fallback path.
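To make that concrete, here is a minimal sketch of the counting pattern in Python. It does not use real TSX instructions: `try_tx` is a hypothetical stand-in for "attempt the body as a hardware transaction and report whether it committed", and `TxStats`, `run_with_fallback` and `flaky_tx` are names made up for this illustration. The only point is that every trip through the fallback path is an observable event you can count.

```python
import threading


class TxStats:
    """Counters for one thread; assumed to be kept per thread, so unsynchronized."""

    def __init__(self):
        self.attempts = 0
        self.fallbacks = 0

    def abort_rate(self) -> float:
        return self.fallbacks / self.attempts if self.attempts else 0.0


def run_with_fallback(try_tx, body, lock, stats):
    """Attempt body transactionally once; if that fails, count it and run under the lock."""
    stats.attempts += 1
    if try_tx(body):
        return  # committed, nothing more to record
    stats.fallbacks += 1  # the fallback path is where failure becomes measurable
    with lock:
        body()


# Hypothetical stand-in for the hardware: every third attempt "aborts".
calls = {"n": 0}


def flaky_tx(body):
    calls["n"] += 1
    if calls["n"] % 3 == 0:
        return False  # pretend the transaction aborted; body did not run
    body()
    return True


counter = {"value": 0}


def increment():
    counter["value"] += 1


stats = TxStats()
lock = threading.Lock()
for _ in range(9):
    run_with_fallback(flaky_tx, increment, lock, stats)
```

After the loop, `stats.fallbacks / stats.attempts` is the observed failure rate for this workload. On real hardware the same counters would sit around the actual transaction begin and the lock-based slow path.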
> How will changes in the microarchitecture change the
> behaviour of programs which use them outside the
> transaction size?
Suppose you had a TX that required simultaneously accessing 5 variables that map to the same set. In a scheme that just used a 4-way L1, it would always fail. OTOH, the same TX would be able to succeed on a MOB-based implementation.
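A toy capacity model shows the difference. This is only a sketch under assumed parameters: 64-byte lines, 64 sets and a 64-entry fully associative buffer are illustrative numbers I picked, not figures for any real part, and the model tracks distinct cache lines, whereas a real MOB tracks individual loads and stores. Only the 4 ways come from the example above.

```python
LINE_BYTES = 64   # assumed cache line size
NUM_SETS = 64     # assumed number of L1 sets
L1_WAYS = 4       # the 4-way L1 from the example above
MOB_ENTRIES = 64  # assumed capacity of a fully associative buffer


def set_index(addr: int) -> int:
    """Return the L1 set that the line containing addr maps to."""
    return (addr // LINE_BYTES) % NUM_SETS


def fits_in_l1(addrs) -> bool:
    """True if no single set needs more ways than the L1 has."""
    lines_per_set = {}
    for line in {a // LINE_BYTES for a in addrs}:
        idx = line % NUM_SETS
        lines_per_set[idx] = lines_per_set.get(idx, 0) + 1
    return all(n <= L1_WAYS for n in lines_per_set.values())


def fits_in_mob(addrs) -> bool:
    """True if the distinct lines fit in the buffer, wherever they live in memory."""
    return len({a // LINE_BYTES for a in addrs}) <= MOB_ENTRIES


# Five variables spaced NUM_SETS * LINE_BYTES bytes apart all land in the same set.
stride = NUM_SETS * LINE_BYTES
conflicting = [0x10000 + i * stride for i in range(5)]

l1_ok = fits_in_l1(conflicting)    # False: 5 lines compete for 4 ways
mob_ok = fits_in_mob(conflicting)  # True: 5 lines against 64 entries
```

The footprint is tiny (five lines), yet the set-associative model rejects it purely because of where the addresses fall, which is exactly the kind of thing a programmer cannot easily see or control.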
> And how long-lived are transactions in sight of more
> cooperative mechanisms? Threads which work on the same
> memory always cooperate, so models which support better
> cooperation are necessary anyway. (Not that I know any.)
I think ~70 memory accesses is about right for things like nice data structures.
David