Analysis of Haswell’s Transactional Memory

TSX Analysis

The success of TSX depends heavily on how well the Haswell implementation works. Fortunately, Intel’s transactional memory in Haswell appears to be straightforward, logical and relatively simple; as long as transactions can span both the L1D and L2 caches, performance should be good. With the hindsight afforded by several earlier implementations (Azul, Sun’s Rock, Transmeta), TSX seems to have avoided many of their problems. Previous hardware TM systems were plagued by associativity conflicts, which Intel probably addressed by using the L2 cache for transactional data. TSX aborts return a status code that indicates the proximate cause, which helps in diagnosing hardware and software bugs, and debugger support is an integral part of the specification. Building Hardware Lock Elision on top of the TM system is a clever move: it lets existing code reap performance gains with minimal effort, and it enables developers to try out TSX before committing to writing code for Restricted Transactional Memory.
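Part of HLE’s appeal is that its hint prefixes are encoded so that older processors simply ignore them, which is how unmodified lock-based code can benefit. A minimal sketch of an elidable spinlock, assuming GCC/Clang `__atomic` builtins (the `elided_lock`/`elided_unlock` names and the spinlock itself are illustrative, not part of Intel’s specification; the HLE flags are only compiled in when the compiler targets HLE and defines `__HLE__`):

```c
/* Sketch of a spinlock whose acquire/release can be elided via HLE.
   On an HLE-capable build, the XACQUIRE/XRELEASE hints let the core skip
   the lock write and run the critical section as a transaction; on any
   other build, it is an ordinary spinlock. */
static int lock_word = 0; /* 0 = free, 1 = held */

static void elided_lock(void)
{
#if defined(__HLE__)
    /* XACQUIRE-prefixed exchange: the write to lock_word is elided. */
    while (__atomic_exchange_n(&lock_word, 1,
                               __ATOMIC_ACQUIRE | __ATOMIC_HLE_ACQUIRE))
        while (lock_word)
            ; /* wait until the lock looks free, then retry */
#else
    /* Plain spinlock for compilers/processors without HLE support. */
    while (__atomic_exchange_n(&lock_word, 1, __ATOMIC_ACQUIRE))
        ;
#endif
}

static void elided_unlock(void)
{
#if defined(__HLE__)
    /* XRELEASE-prefixed store commits the elided critical section. */
    __atomic_store_n(&lock_word, 0, __ATOMIC_RELEASE | __ATOMIC_HLE_RELEASE);
#else
    __atomic_store_n(&lock_word, 0, __ATOMIC_RELEASE);
#endif
}
```

Because the pattern lives entirely inside the lock implementation, a lock library can adopt it internally and every caller benefits without recompilation of application code.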

The fact that RTM offers no forward-progress guarantee and requires a non-transactional fallback path is a mixed bag. On the one hand, it creates an extra burden for programmers, which is unattractive. On the other, it neatly sidesteps the problems of poorly written transactional code and of compatibility across implementations. Someone will eventually write horrible transactional code that would defeat any feasible system. It is unreasonable to force programmers to deal with subtle details like cache associativity conflicts, yet the underlying hardware has finite resources. Without some form of virtualized transactional memory, there will always be transactions that can never succeed in hardware, and software must be prepared to handle them.
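The canonical way software handles this is to attempt the transaction a bounded number of times, inspect the abort status, and then take a real lock. A hedged sketch of that pattern (the retry count, the `increment` example and the spinlock are illustrative; `_xbegin`/`_xend`/`_xabort` are the RTM intrinsics from `<immintrin.h>`, compiled only when RTM support is available):

```c
#if defined(__RTM__)
#include <immintrin.h> /* _xbegin, _xend, _xabort, _XBEGIN_STARTED */
#endif

/* Illustrative shared state: a counter guarded by a spinlock fallback. */
static int  fallback_lock = 0; /* 0 = free, 1 = held */
static long counter       = 0;

static void spin_lock(void)
{
    while (__atomic_exchange_n(&fallback_lock, 1, __ATOMIC_ACQUIRE))
        ;
}

static void spin_unlock(void)
{
    __atomic_store_n(&fallback_lock, 0, __ATOMIC_RELEASE);
}

static void increment(void)
{
#if defined(__RTM__)
    for (int attempt = 0; attempt < 3; attempt++) {
        unsigned status = _xbegin();
        if (status == _XBEGIN_STARTED) {
            /* Reading the lock adds it to the transaction's read set;
               abort if a fallback-path thread holds it, so the two
               paths stay mutually exclusive. */
            if (fallback_lock)
                _xabort(0xff);
            counter++;
            _xend(); /* commit */
            return;
        }
        /* The status code names the proximate cause of the abort. */
        if (!(status & _XABORT_RETRY))
            break; /* e.g. a capacity overflow: retrying is futile */
    }
#endif
    /* Mandatory non-transactional fallback path. */
    spin_lock();
    counter++;
    spin_unlock();
}
```

On hardware (or a build) without RTM, only the fallback path executes, which is precisely the forward-progress story the architecture delegates to software.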

Generally, Intel’s TSX should improve both the programmability and the scalability of concurrent workloads. Even with a modest number of threads, locks can easily limit the benefits of additional cores. That is rarely a problem for 2-4 core processors, but it becomes a much bigger factor going forward. Extremely popular applications such as MySQL have well-known locking bottlenecks that HLE or RTM could significantly alleviate.

There are immediate applications in a few areas, particularly in low-level software systems. First, as Azul and Sun have already demonstrated, lock elision and TM are powerful tools for scalable garbage collection in Java and other managed languages. Additionally, existing software TM systems can take advantage of Intel’s RTM to reduce their overhead. Research from the University of Toronto showed that software TM provided a mild performance gain for a Quake game server compared with lock-based versions, and Intel’s hardware RTM should do considerably better. It will be quite fascinating to see what software vendors can do with the building blocks Intel has provided.

Transactional Memory Evolution

Looking forward, there are a number of potential improvements for Intel’s transactional memory. The first and most obvious is moving to a multi-versioned TM that is more amenable to speculative multithreading. It was somewhat disappointing that Intel did not go down this path initially, although the added complexity makes that choice understandable. Perhaps IBM will demonstrate enough benefits with Blue Gene/Q to motivate more sophisticated TM systems. Other refinements might include partial aborts for nested transactions and programmer control over how conflicting transactions are resolved.

The other avenue for Intel is proliferating transactional memory throughout the x86 product line. Today that is mainly SoCs targeted at mobile devices and the upcoming many-core Knight’s Corner. It is hard to see an immediate application for the extremely power sensitive derivatives of Atom, but that likely depends on the adoption of TSX for mainstream products. If TSX can demonstrate a compelling benefit, perhaps for running Microsoft’s .Net languages or Android’s Dalvik, future mobile CPUs might very well ship with transactional memory support.

Transactional memory is a natural and obvious fit for future many-core products. Intel’s RTM is undeniably an easier programming model than many alternatives and can augment existing languages like C++ through libraries and extensions. Moreover, the potential to significantly improve scalability is incredibly attractive for a design with dozens of cores and hundreds of threads. For example, a presentation from Doug Carmean emphasized the challenge of handling contended locks in early versions of Larrabee. TSX can potentially remove lock contention as an issue for the majority of programs, except where a true dependency exists and serialization is required for correctness. Overall, it would be surprising if the successor to Knight’s Corner did not support some version of TSX.

Beyond Intel, the rest of the industry seems likely to adopt variations of transactional memory in the coming years. Oracle has publicly stated that future SPARC processors will include memory versioning (another term for TM). IBM is already shipping Blue Gene/Q, although it is unclear when (or if) POWER or zArchitecture processors will adopt TM. ARM’s position on TM is unclear, but may evolve once the ARMv8 extension has been finalized.

AMD is continuing to work on ASF, which has several substantive differences from TSX: privileged instructions, forward-progress guarantees, and an absence of register rollback on abort, to name a few. AMD and Intel face a choice: fragment the x86 ecosystem with incompatible extensions, or harmonize ASF and TSX. From an industry standpoint, the latter is clearly preferable, since software vendors may be unwilling to tolerate such incompatibilities. However, it could take years for Intel and AMD to align their implementations.

Intel’s adoption of transactional memory marks a new era for computer architecture. Haswell caps a nearly two-decade journey from the initial popularization of transactional memory in academia to mainstream adoption in a commercial microprocessor. For those of us who have worked on transactional memory, this is an exceptionally exciting moment: a time to look back on the contributions that brought the industry here, and to begin pondering the opportunities that may unfold in the coming years.

