By: Paul A. Clayton (paaronclayton.delete@this.gmail.com), September 1, 2012 6:29 am
Room: Moderated Discussions
David Kanter (dkanter.delete@this.realworldtech.com) on August 31, 2012 9:13 am wrote:
> Paul A. Clayton (paaronclayton.delete@this.gmail.com) on
> August 28, 2012 10:28 am wrote:
>> I wonder if defining the granularity for RTM to be cache
>> line size is appropriate.
>
> There's really no other way to make things work efficiently.
> You already have tags for each line, so why not just add a
> bit or two?
Since a cache line can be smaller than a cache sector (the indexed and tagged chunk), the use of cache line granularity is more flexible than I initially thought (I did discover this fact before my earlier posting, however). Even so, it might be attractive to monitor some memory locations at a finer granularity.
Architecturally speaking, it also seems a little risky to define one microarchitectural feature as equal to another (even if all reasonable implementations one can think of guarantee this equality). If the risk is small enough, the greater simplicity could justify the decision, but MONITOR/MWAIT already provides a size (as a range), which seems conceptually closer to the TM monitoring size than the cache line size does.
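To make the MONITOR/MWAIT comparison concrete, here is a minimal sketch of how that range is reported: CPUID leaf 05H gives the smallest monitor-line size in EAX[15:0] and the largest in EBX[15:0], both in bytes. The register values below are illustrative rather than read from hardware, and `granule_permitted` is a purely hypothetical rule for how a TM definition might reuse such a range. It is not anything RTM specifies.

```python
from typing import NamedTuple


class MonitorRange(NamedTuple):
    smallest: int  # smallest monitor-line size, in bytes
    largest: int   # largest monitor-line size, in bytes


def decode_monitor_range(eax: int, ebx: int) -> MonitorRange:
    # CPUID leaf 05H: EAX[15:0] = smallest monitor-line size,
    # EBX[15:0] = largest monitor-line size (both in bytes).
    return MonitorRange(eax & 0xFFFF, ebx & 0xFFFF)


def granule_permitted(granule: int, rng: MonitorRange) -> bool:
    # Hypothetical rule, not part of any ISA: a TM implementation may track
    # conflicts at any power-of-two granule inside the advertised range.
    power_of_two = granule > 0 and (granule & (granule - 1)) == 0
    return power_of_two and rng.smallest <= granule <= rng.largest


# Illustrative register values, not read from real hardware.
rng = decode_monitor_range(eax=0x0010, ebx=0x0080)
print(rng, granule_permitted(32, rng), granule_permitted(256, rng))
```

Software written against such a range would have to assume the largest size when laying out data to avoid false conflicts, which is the same discipline MONITOR/MWAIT already asks for.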
By architecturally defining a range of granularity (similar to MONITOR/MWAIT), implementers would have greater freedom, and for TM there would not seem to be any significant software issues. When two transactions conflict at cache line granularity but not at some finer granularity, it might be slightly easier to allow them to be simultaneous than to technically order them (even if a "yoctosecond" apart). (Admittedly, supporting ordering would be much more broadly useful, especially since conflicts at cache line granularity that are not also conflicts at a finer granularity would tend to be relatively rare.)
However, I do not see any disadvantages to using a range, and I assume Intel had a reason for supporting a range for MONITOR/MWAIT (which has similar coherence issues).
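As a rough illustration of the case discussed above (two transactions that conflict at cache line granularity but not at a finer one), here is a toy conflict check parameterized by granule size. The function names, the access representation, and the 64-byte and 16-byte granules are all assumptions chosen for the example. It models only set overlap, not how any real implementation tracks read and write sets.

```python
from typing import Iterable, Set, Tuple

Access = Tuple[int, int]  # (byte address, size in bytes)


def touched_granules(accesses: Iterable[Access], granule: int) -> Set[int]:
    # Map each access to the indices of the granules it overlaps.
    touched = set()
    for addr, size in accesses:
        touched.update(range(addr // granule, (addr + size - 1) // granule + 1))
    return touched


def conflict(reads_a, writes_a, reads_b, writes_b, granule: int) -> bool:
    # Two transactions conflict if either one writes a granule that the
    # other one reads or writes.
    ra, wa = touched_granules(reads_a, granule), touched_granules(writes_a, granule)
    rb, wb = touched_granules(reads_b, granule), touched_granules(writes_b, granule)
    return bool(wa & (rb | wb)) or bool(wb & ra)


# Transaction A writes 8 bytes at 0x1000; B writes 8 bytes at 0x1020.
# Both addresses fall in the same 64-byte line but in different 16-byte granules.
a_writes = [(0x1000, 8)]
b_writes = [(0x1020, 8)]
line_conflict = conflict([], a_writes, [], b_writes, granule=64)
fine_conflict = conflict([], a_writes, [], b_writes, granule=16)
print(line_conflict, fine_conflict)
```

With these inputs the pair conflicts at 64 bytes and not at 16, so a finer-grained implementation could let both commit while a line-grained one would have to abort or order them.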
Topic | Posted By | Date |
---|---|---|
Article: Haswell TM Alternatives | David Kanter | 2012/08/21 09:17 PM |
Article: Haswell TM Alternatives | Håkan Winbom | 2012/08/21 11:52 PM |
Article: Haswell TM Alternatives | David Kanter | 2012/08/22 01:06 AM |
Article: Haswell TM Alternatives | anon | 2012/08/22 08:46 AM |
Article: Haswell TM Alternatives | Linus Torvalds | 2012/08/22 09:16 AM |
Article: Haswell TM Alternatives | Doug S | 2012/08/24 08:34 AM |
AMD's ASF even more limited | Paul A. Clayton | 2012/08/22 09:20 AM |
AMD's ASF even more limited | Linus Torvalds | 2012/08/22 09:41 AM |
Compiler use of ll/sc? | Paul A. Clayton | 2012/08/28 09:28 AM |
Compiler use of ll/sc? | Linus Torvalds | 2012/09/08 12:58 PM |
Lock recognition? | Paul A. Clayton | 2012/09/10 01:17 PM |
Sorry, I was confused | Paul A. Clayton | 2012/09/13 10:56 AM |
Filter to detect store conflicts | Paul A. Clayton | 2012/08/22 09:19 AM |
Article: Haswell TM Alternatives | bakaneko | 2012/08/22 02:02 PM |
Article: Haswell TM Alternatives | David Kanter | 2012/08/22 02:45 PM |
Article: Haswell TM Alternatives | bakaneko | 2012/08/22 09:56 PM |
Cache line granularity? | Paul A. Clayton | 2012/08/28 09:28 AM |
Cache line granularity? | David Kanter | 2012/08/31 08:13 AM |
A looser definition might have advantages | Paul A. Clayton | 2012/09/01 06:29 AM |
Cache line granularity? | rwessel | 2012/08/31 07:54 PM |
Alpha load locked granularity | Paul A. Clayton | 2012/09/01 06:29 AM |
Alpha load locked granularity | anon | 2012/09/02 05:23 PM |
Alpha pages groups | Paul A. Clayton | 2012/09/03 04:16 AM |
An alternative implementation | Maynard Handley | 2012/11/20 09:52 PM |
An alternative implementation | bakaneko | 2012/11/21 05:52 AM |
Guarding unread values? | Paul A. Clayton | 2012/11/21 08:39 AM |
Guarding unread values? | bakaneko | 2012/11/21 11:25 AM |
TM granularity and versioning | Paul A. Clayton | 2012/11/21 08:27 AM |
TM granularity and versioning | Maynard Handley | 2012/11/21 10:52 AM |
Indeed, TM (and coherence) has devilish details (NT) | Paul A. Clayton | 2012/11/21 10:56 AM |