More than any recent instruction set extension, such as SSE or AVX, Intel’s transactional memory (TM) is a huge change to the x86 programming model. The Transactional Synchronization Extensions (TSX) describe two software interfaces for hardware transactional memory in Haswell.
The first is Hardware Lock Elision (HLE), which uses the F2H/F3H instruction prefixes to speculatively execute a critical section and enhance performance, while preserving backwards compatibility with non-TSX processors. HLE is based on a concept called speculative lock elision, first described by Ravi Rajwar and Jim Goodman at the University of Wisconsin (Ravi Rajwar is now at Intel). Processors without HLE simply ignore the prefixes and handle the critical section normally.
The second is Restricted Transactional Memory (RTM), which is a new programming interface. Using specific instructions, developers can mark the start and end of a transaction, which will either commit atomically or fall back to an alternative code path. Software written using RTM is not backwards compatible and requires a TSX processor, such as Haswell.
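The commit-or-fall-back contract of RTM can be sketched as a small software model. This is only an illustration of the control flow, not of the hardware: the names `TransactionAborted`, `run_transaction`, `body` and `fallback` are hypothetical, and the private snapshot stands in for whatever buffering the processor actually uses.

```python
class TransactionAborted(Exception):
    """Hypothetical signal that the modeled transaction cannot commit."""


def run_transaction(memory, body, fallback):
    """Run body transactionally; on abort, run fallback instead.

    memory   -- dict standing in for shared memory
    body     -- callable(mem) forming the transactional code path
    fallback -- callable(mem) forming the alternative code path
    """
    snapshot = dict(memory)          # speculative work happens on a private copy
    try:
        body(snapshot)
    except TransactionAborted:
        fallback(memory)             # nothing from the failed attempt is visible
        return "fallback"
    memory.clear()
    memory.update(snapshot)          # commit: all updates appear together
    return "committed"


def increment(mem):
    mem["counter"] += 1


def failing_increment(mem):
    mem["counter"] += 100            # partial update that must never be seen
    raise TransactionAborted()


shared = {"counter": 0}
first = run_transaction(shared, increment, increment)
second = run_transaction(shared, failing_increment, increment)
```

In this model the second call discards the speculative `+= 100` and applies only the fallback's increment, which mirrors the requirement that an aborted transaction leaves no trace in memory.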
In our previous article on Haswell’s transactional memory, we explored how Intel could hypothetically extend the MESIF coherency protocol to enable TSX in the L1 and/or L2 caches. This implementation would enable extremely large transactions. The L1 data cache in Sandy Bridge holds 512 cache lines, while the L2 holds 4K cache lines. For well behaved transactions, this could translate into hundreds or thousands of memory accesses. However, the limited associativity (8 way for both L1 and L2) means that some transactions could fail with far fewer memory accesses.
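The associativity limit can be made concrete with a little arithmetic. The sketch below uses the figures quoted above (512 and 4K lines, 8 ways) together with a 64B line size, and assumes the set index is simply the line number modulo the number of sets; real hardware may index differently, and the helper names are made up for illustration.

```python
LINE_BYTES = 64      # cache line size
WAYS = 8             # associativity of both L1D and L2, as quoted above
L1_LINES = 512
L2_LINES = 4096


def num_sets(total_lines, ways=WAYS):
    """Number of sets in a cache with the given line count and associativity."""
    return total_lines // ways


def fits_in_cache(addrs, total_lines, ways=WAYS, line_bytes=LINE_BYTES):
    """True if every distinct line touched could be resident at the same time.

    Assumes a simple modulo mapping from line number to set index.
    """
    sets = num_sets(total_lines, ways)
    per_set = {}
    for line in {addr // line_bytes for addr in addrs}:
        index = line % sets
        per_set[index] = per_set.get(index, 0) + 1
    return all(count <= ways for count in per_set.values())


# 512 consecutive lines spread evenly: 8 lines in each of the 64 L1 sets.
sequential = [i * LINE_BYTES for i in range(512)]

# 9 lines spaced 4096 bytes apart all land in the same L1 set.
strided = [i * 4096 for i in range(9)]

print(num_sets(L1_LINES), num_sets(L2_LINES))
print(fits_in_cache(sequential, L1_LINES))   # True: exactly fills the L1
print(fits_in_cache(strided, L1_LINES))      # False: 9 lines in an 8-way set
print(fits_in_cache(strided, L2_LINES))      # True: the L2 has more sets
```

Under these assumptions, a well-behaved transaction can touch all 512 lines of the L1, while a badly strided one overflows a single set after only nine accesses.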
Haswell TM Requirements
Extending the coherency protocol and modifying the caches for transactional memory is one possible implementation, but not the only option. Before discussing alternative approaches to transactional memory, it is important to understand the requirements for Intel’s TSX. Table 1 is a summary of the behavior necessary for transactional memory, using buffered state. Some TM systems avoid buffering by creating an undo log, but that approach is generally more complex and will not be seen in initial implementations.
The biggest challenge of any hardware TM is dealing with writes. A TM system must track stores and assemble a write-set (WS) for the transaction. The actual WS data must be buffered until the end of the transaction. In the case of a successful commit, all the stores in the WS become globally visible in an atomic fashion, typically by writing the buffered data to a cache. Alternatively, if the transaction aborts, then the buffered stores must be discarded, without modifying memory.
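As a rough behavioral model of the buffering described above, the sketch below keeps transactional stores in a private write-set and only copies them to shared memory on commit. The class and method names are hypothetical, and the single `update` call merely stands in for the atomic commit that real hardware must provide. The model also assumes that a transaction reads back its own buffered stores.

```python
class BufferedTransaction:
    """Toy model of a write-set held in a buffer until commit or abort."""

    def __init__(self, memory):
        self.memory = memory     # dict standing in for globally visible memory
        self.write_set = {}      # buffered stores: address -> value

    def store(self, addr, value):
        # Stores go to the buffer only; memory is untouched for now.
        self.write_set[addr] = value

    def load(self, addr):
        # Assumption: a transaction observes its own earlier stores.
        if addr in self.write_set:
            return self.write_set[addr]
        return self.memory.get(addr, 0)

    def commit(self):
        # All buffered stores become visible in one step.
        self.memory.update(self.write_set)
        self.write_set.clear()

    def abort(self):
        # Discard the buffered stores without modifying memory.
        self.write_set.clear()


memory = {0x40: 1}
tx = BufferedTransaction(memory)
tx.store(0x40, 2)
visible_before_abort = memory[0x40]   # still 1: the store is only buffered
tx.abort()

tx.store(0x80, 7)
tx.commit()                           # memory now also holds 7 at 0x80
```

The design choice being modeled is that abort is cheap (drop the buffer), while commit has to publish every buffered store at once.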
Loads are somewhat simpler, because they cannot alter memory, only the architectural registers. The TM must still track loads, creating a read-set (RS). A successful transaction simply writes all the loads in the RS to the register file. Aborting a transaction involves undoing the changes to the register file, which is far easier than undoing changes to the memory system.
The crux of TM is detecting conflicts in a transaction. Intel’s TSX tracks the read-set and write-set at cache line (64B) granularity during a transaction. An RS conflict occurs if a cache line in the RS is written by another thread. A WS conflict occurs if a cache line in the WS is read or written by another thread. When a conflict occurs, the transaction aborts and must roll back any loads and stores in the transaction, along with dependent instructions.
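The two conflict rules can be written down directly. The following is a minimal sketch, not a description of Haswell's actual mechanism: it tracks the read-set and write-set as sets of 64B line numbers, and the hypothetical `observe_remote` method stands in for whatever the hardware uses to see accesses from other threads.

```python
LINE_BYTES = 64


def line_of(addr):
    """Map a byte address to its cache line number."""
    return addr // LINE_BYTES


class TrackedTransaction:
    """Toy model of read-set / write-set conflict detection."""

    def __init__(self):
        self.read_set = set()
        self.write_set = set()
        self.aborted = False

    def load(self, addr):
        self.read_set.add(line_of(addr))

    def store(self, addr):
        self.write_set.add(line_of(addr))

    def observe_remote(self, addr, is_write):
        """Apply the conflict rules to an access made by another thread."""
        line = line_of(addr)
        rs_conflict = is_write and line in self.read_set   # remote write hits RS
        ws_conflict = line in self.write_set               # any remote access hits WS
        if rs_conflict or ws_conflict:
            self.aborted = True
        return self.aborted


tx = TrackedTransaction()
tx.load(0x100)                                   # line 4 joins the read-set
tx.observe_remote(0x108, is_write=False)         # remote read of line 4: no conflict
tx.observe_remote(0x13F, is_write=True)          # remote write to line 4: abort
```

Because tracking is per line rather than per byte, the remote write to 0x13F conflicts with the load from 0x100 even though the two addresses never overlap; both fall in the same 64B line.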