By: Paul A. Clayton (paaronclayton.delete@this.gmail.com), December 4, 2014 3:34 pm
Room: Moderated Discussions
Konrad Schwarz (no.spam.delete@this.no.spam) on December 4, 2014 1:10 pm wrote:
[snip]
> Note that you will also need to return the current or the previous value when the transaction
> is successful.
Yes, at least if the operation was not "fire-and-forget". Yet that could still be lower latency and lower bandwidth than using ordinary coherence for such hot line updates.
> I'm no expert in VLSI design, but this seems like a large increase in complexity in the coherency
> protocol for a modest benefit. I find it more likely that the line in question is protected from
> eviction from the local cache for a short period of time. My understanding is that this is
> basically how modern x86 implements atomic operations on cacheable memory.
The "modest benefit" might matter for scalable systems. That is, if some other bottleneck is what currently limits this to a modest benefit, then once that other bottleneck is removed, such high-contention updates might become the next significant bottleneck.
> And honestly, I still think most programmers would be better served by using light-weight (in the
> uncontended case) synchronization objects provided by the OS (i.e., mutex/condition variables)
> rather than employing atomic operations directly.
From a relatively uninformed perspective (compared to others here), I think most programmers would be better served by not having an extremely detailed view of the ISA and microarchitecture. I suspect most of these matters can be abstracted by the compiler, libraries, runtime system, OS, etc. without a substantial abstraction penalty. Perhaps I am overestimating the quality of compilers (or the optimization levels typically used) and the rest of the toolchain. It seems to me that a programmer should generally seek to communicate intent rather than a specific implementation. Data structure and algorithm choices do communicate certain expectations because of their tradeoffs, and even dipping into assembly often communicates an expectation that a small section is performance critical, but it seems that expressing intent should be the rule and such low-level detail the exception.
I do think it would be useful to allow programmers to provide an implementation that is semantically equivalent to a higher-level expression. In some cases, such a low-level implementation might not be provably equivalent, or might seem (to the compiler) inferior, so an annotation might be needed to tell the compiler to use it anyway. However, I suspect that most programming languages have much higher-priority issues.
I also like the idea of the compiler/runtime annotating the source code itself, to communicate information both to a future compilation and to the programmer. With distributed version control systems, this might even be practical to implement. (I wonder if there would be a psychological benefit to the compiler inserting warnings and suggestions into the source code rather than into a standard error stream. That could allow only critical warnings to appear on standard error while still encouraging action on less critical warnings when working on the relevant section of code.)