By: Paul A. Clayton (paaronclayton.delete@this.gmail.com), August 28, 2012 9:28 am
Room: Moderated Discussions
I wonder whether defining the conflict-detection granularity for RTM as the cache line size is appropriate.
Such a definition would seem to significantly constrain implementations that monitor at a finer granularity. An implementation could still monitor at a finer granularity (mainly to avoid some data communication latency) as long as a single order of atomic actions could be guaranteed, but architecturally allowing, though not requiring, finer-grained monitoring might modestly simplify hardware that exploits the absence of conflicts at access granularity.
There does not appear to be much, if any, semantic difference between fine-grained and cache-line-granular monitoring, since for any collection of transactions that do not conflict at access granularity, any ordering of those transactions will be acceptable. (Because hardware could be clever in avoiding transaction failures, it seems that portable software, i.e., software that must work with both extremely clever and simple implementations having the same cache line size, could not even use transaction failures to observe the density of "conflicts".)
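A toy model may make the granularity point concrete. The C sketch below is not RTM and makes no claim about how any processor tracks its read and write sets; `access_t`, `block_of`, and `accesses_conflict` are hypothetical names. It treats two accesses as conflicting when at least one is a store and both touch the same power-of-two-sized monitoring block, so the same pair of accesses can conflict at 64-byte granularity yet not at byte or 8-byte granularity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one memory access made by a transaction. */
typedef struct {
    uintptr_t addr;   /* first byte touched */
    size_t    size;   /* number of bytes touched (must be >= 1) */
    bool      write;  /* true for a store, false for a load */
} access_t;

/* Round an address down to the start of its monitoring block. */
static uintptr_t block_of(uintptr_t addr, size_t granularity)
{
    return addr & ~(uintptr_t)(granularity - 1);
}

/* Two accesses conflict if at least one is a store and they touch a
   common monitoring block of the given power-of-two granularity. */
static bool accesses_conflict(access_t a, access_t b, size_t granularity)
{
    assert(granularity != 0 && (granularity & (granularity - 1)) == 0);
    assert(a.size != 0 && b.size != 0);
    if (!a.write && !b.write)
        return false;                     /* load/load never conflicts */
    uintptr_t a_lo = block_of(a.addr, granularity);
    uintptr_t a_hi = block_of(a.addr + a.size - 1, granularity);
    uintptr_t b_lo = block_of(b.addr, granularity);
    uintptr_t b_hi = block_of(b.addr + b.size - 1, granularity);
    return a_lo <= b_hi && b_lo <= a_hi;  /* block ranges overlap */
}
```

For example, two transactions that each store 8 bytes, one at 0x1000 and one at 0x1008, do not conflict at 1- or 8-byte granularity but do conflict at 64-byte granularity. Either serial order of those two transactions would have been valid, which is the sense in which the 64-byte conflict is a false one.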
I wonder whether using the monitor-line size (introduced for MONITOR/MWAIT), or something like it, would be better. The monitor-line size is specified as a minimum and a maximum, and such a range would seem to allow greater freedom of implementation. Alternatively, the architectural minimum size might be defined as the size of the specific access (which is the minimum required for transactional semantics) and the maximum size as the cache line size.
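For reference, the existing monitor-line range is reported by CPUID leaf 05H: bits 15:0 of EAX give the smallest monitor-line size and bits 15:0 of EBX the largest, both in bytes. The sketch below reads it with the `__get_cpuid` helper from GCC/Clang's `<cpuid.h>`; `monitor_line_range`, `decode_leaf5`, and `query_monitor_line_range` are hypothetical names. This range describes MONITOR/MWAIT only; an analogous range for transactional conflict detection is merely the suggestion above, not something the architecture reports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang wrapper around the CPUID instruction */
#endif

typedef struct {
    uint16_t smallest;  /* CPUID.05H:EAX[15:0], in bytes */
    uint16_t largest;   /* CPUID.05H:EBX[15:0], in bytes */
} monitor_line_range;

/* Extract the monitor-line range from raw CPUID leaf 05H register values,
   ignoring the reserved upper 16 bits of each register. */
static monitor_line_range decode_leaf5(uint32_t eax, uint32_t ebx)
{
    monitor_line_range r = { (uint16_t)(eax & 0xFFFFu), (uint16_t)(ebx & 0xFFFFu) };
    return r;
}

/* Query the running CPU; returns false if MONITOR/MWAIT or leaf 05H is
   unavailable, or if this is not an x86 build. */
static bool query_monitor_line_range(monitor_line_range *out)
{
    assert(out != NULL);
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    /* CPUID.01H:ECX bit 3 reports MONITOR/MWAIT support. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !(ecx & (1u << 3)))
        return false;
    /* __get_cpuid returns 0 if leaf 05H exceeds the highest basic leaf. */
    if (!__get_cpuid(5, &eax, &ebx, &ecx, &edx))
        return false;
    *out = decode_leaf5(eax, ebx);
    return true;
#else
    return false;   /* not x86, so there is no CPUID instruction */
#endif
}
```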
Using the cache line size does have the minor annoyance of making performance portability more difficult. (The Alpha architecture's portability suggestion of not placing two atomic operands within the same 8 KiB page seems a bit excessive.) Fixing the maximum "monitor-line" size at 64 bytes might not be unreasonable as a way to simplify portable software.
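If the maximum conflict granularity were fixed at 64 bytes, portable software could simply give each independently updated datum its own 64-byte block. A minimal C11 sketch follows; `ASSUMED_MAX_MONITOR_LINE`, `padded_counter`, `counters`, and `counter_add` are hypothetical names, and 64 bytes is the assumed bound from the suggestion above rather than an architectural guarantee.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed portable upper bound on conflict-detection granularity
   (hypothetical; no current architecture promises this for RTM). */
#define ASSUMED_MAX_MONITOR_LINE 64

/* Aligning the member pads each counter to a full assumed monitor line,
   so updates to different counters never share a monitored block. */
typedef struct {
    alignas(ASSUMED_MAX_MONITOR_LINE) uint64_t value;
} padded_counter;

_Static_assert(sizeof(padded_counter) == ASSUMED_MAX_MONITOR_LINE,
               "each counter must occupy exactly one assumed monitor line");

#define NUM_COUNTERS 8
static padded_counter counters[NUM_COUNTERS];

/* Plain increment for illustration; real code would perform this inside a
   transaction (e.g. between RTM's _xbegin and _xend) or atomically. */
static void counter_add(size_t i, uint64_t delta)
{
    assert(i < NUM_COUNTERS);
    counters[i].value += delta;
}
```

The cost is memory: eight 8-byte counters occupy 512 bytes instead of 64. A much larger assumed bound, such as the 8 KiB page in the Alpha suggestion, would multiply that cost, which is part of why a fixed, modest maximum looks attractive for portability.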
By the way, it was interesting to read that Intel (for x86, at least) defines a cache line as being a subset of a cache sector. (I am used to the reverse use of the terms.)
Topic | Posted By | Date |
---|---|---|
Article: Haswell TM Alternatives | David Kanter | 2012/08/21 09:17 PM |
Article: Haswell TM Alternatives | Håkan Winbom | 2012/08/21 11:52 PM |
Article: Haswell TM Alternatives | David Kanter | 2012/08/22 01:06 AM |
Article: Haswell TM Alternatives | anon | 2012/08/22 08:46 AM |
Article: Haswell TM Alternatives | Linus Torvalds | 2012/08/22 09:16 AM |
Article: Haswell TM Alternatives | Doug S | 2012/08/24 08:34 AM |
AMD's ASF even more limited | Paul A. Clayton | 2012/08/22 09:20 AM |
AMD's ASF even more limited | Linus Torvalds | 2012/08/22 09:41 AM |
Compiler use of ll/sc? | Paul A. Clayton | 2012/08/28 09:28 AM |
Compiler use of ll/sc? | Linus Torvalds | 2012/09/08 12:58 PM |
Lock recognition? | Paul A. Clayton | 2012/09/10 01:17 PM |
Sorry, I was confused | Paul A. Clayton | 2012/09/13 10:56 AM |
Filter to detect store conflicts | Paul A. Clayton | 2012/08/22 09:19 AM |
Article: Haswell TM Alternatives | bakaneko | 2012/08/22 02:02 PM |
Article: Haswell TM Alternatives | David Kanter | 2012/08/22 02:45 PM |
Article: Haswell TM Alternatives | bakaneko | 2012/08/22 09:56 PM |
Cache line granularity? | Paul A. Clayton | 2012/08/28 09:28 AM |
Cache line granularity? | David Kanter | 2012/08/31 08:13 AM |
A looser definition might have advantages | Paul A. Clayton | 2012/09/01 06:29 AM |
Cache line granularity? | rwessel | 2012/08/31 07:54 PM |
Alpha load locked granularity | Paul A. Clayton | 2012/09/01 06:29 AM |
Alpha load locked granularity | anon | 2012/09/02 05:23 PM |
Alpha pages groups | Paul A. Clayton | 2012/09/03 04:16 AM |
An alternative implementation | Maynard Handley | 2012/11/20 09:52 PM |
An alternative implementation | bakaneko | 2012/11/21 05:52 AM |
Guarding unread values? | Paul A. Clayton | 2012/11/21 08:39 AM |
Guarding unread values? | bakaneko | 2012/11/21 11:25 AM |
TM granularity and versioning | Paul A. Clayton | 2012/11/21 08:27 AM |
TM granularity and versioning | Maynard Handley | 2012/11/21 10:52 AM |
Indeed, TM (and coherence) has devilish details (NT) | Paul A. Clayton | 2012/11/21 10:56 AM |