By: rwessel (robertwessel.delete@this.yahoo.com), December 4, 2014 7:50 pm
Room: Moderated Discussions
Paul A. Clayton (paaronclayton.delete@this.gmail.com) on December 4, 2014 3:34 pm wrote:
> Konrad Schwarz (no.spam.delete@this.no.spam) on December 4, 2014 1:10 pm wrote:
> [snip]
>
> > Note that you will also need to return the current or the previous value when the transaction
> > is successful.
>
> Yes, at least if the operation was not "fire-and-forget". Yet that could still be lower
> latency and lower bandwidth than using ordinary coherence for such hot line updates.
>
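FWIW, the distinction shows up even at the source level. A minimal C11 sketch (the names and the shared counter are mine, purely for illustration):

    #include <stdatomic.h>

    atomic_long counter;

    /* "Fire-and-forget": the previous value is discarded, so in
       principle the increment could be pushed out to wherever the
       line lives without waiting for a reply. */
    void count_event(void)
    {
        atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
    }

    /* Here the previous value is the whole point (ticket allocation),
       so the operation can't complete until the old value comes back. */
    long take_ticket(void)
    {
        return atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
    }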
> > I'm no expert in VLSI design, but this seems like a large increase in complexity in the coherency
> > protocol for a modest benefit. I find it more likely that the line in question is protected from
> > eviction from the local cache for a short period of time. My understanding is that this is
> > basically how modern x86 implements atomic operations on cacheable memory.
>
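That matches my understanding - for a cacheable, aligned operand the LOCK prefix hasn't meant an actual bus lock since the P6: the core acquires the line exclusive and just defers snoop responses for the few cycles the operation takes. Roughly (a sketch; exact code generation varies by compiler and version):

    /* C source */
    #include <stdatomic.h>
    atomic_int x;
    int bump(void) { return atomic_fetch_add(&x, 1); }

    /* Typical x86-64 output: a single LOCKed read-modify-write.
       The core holds the line in Modified state and delays snoops
       until the xadd completes - no bus locking involved. */
    bump:
            mov     eax, 1
            lock xadd DWORD PTR x[rip], eax
            ret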
> The "modest benefit" might be a bottleneck for scalable systems. I.e., if another
> bottleneck is what limits such to a modest benefit, when that other bottleneck is
> removed such high contention updates might become the next significant bottleneck.
>
> > And honestly, I still think most programmers would be better served by using lightweight (in the
> > uncontended case) synchronization objects provided by the OS (i.e., mutex/condition variables)
> > rather than employing atomic operations directly.
>
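Agreed. And on Linux those primitives are futex-based, so the uncontended path is a couple of user-space atomics - you only pay for a system call under contention. The recommended pattern looks like this (a sketch, error handling omitted):

    #include <pthread.h>
    #include <stdbool.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
    static bool ready = false;

    /* Consumer: sleeps in the kernel while waiting - no spinning, and
       no memory-ordering subtleties for the caller to get wrong. */
    void wait_for_ready(void)
    {
        pthread_mutex_lock(&lock);
        while (!ready)                 /* guards against spurious wakeups */
            pthread_cond_wait(&cond, &lock);
        pthread_mutex_unlock(&lock);
    }

    /* Producer */
    void set_ready(void)
    {
        pthread_mutex_lock(&lock);
        ready = true;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
    }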
> From a relatively uninformed perspective (compared to others here), I think most programmers would
> be better served by not having an extremely detailed view of the ISA and microarchitecture. I suspect
> most of these matters can be abstracted by the compiler, libraries, runtime system, OS, etc. without
> a substantial abstraction penalty. Perhaps I am overestimating the quality of compilers (or the optimization
> levels typically used) et al.
See Proebsting’s Law (and *I* think he was being optimistic!). Despite all the hoopla, compilers mostly suck at optimization. About the only thing that does a worse job of optimization is the average programmer.
Sure, we've all seen impressive cases of optimization - heck, I've been impressed by many of the AVX vectorizations that ICC can do. But again and again those turn out to be terribly fragile, and just don't make much difference on the vast majority of code.
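To show what I mean by fragile, here's a toy example (whether a given compiler vectorizes either version depends on the compiler, its version, and the flags):

    /* Vectorizes readily with most compilers at -O2/-O3. */
    void scale(float *restrict dst, const float *restrict src, int n)
    {
        for (int i = 0; i < n; i++)
            dst[i] = src[i] * 2.0f;
    }

    /* Drop the restrict qualifiers and the compiler must assume dst and
       src may overlap; many compilers then emit a runtime aliasing check
       (bloating the code) or give up on vectorizing entirely. */
    void scale_noalias_unknown(float *dst, const float *src, int n)
    {
        for (int i = 0; i < n; i++)
            dst[i] = src[i] * 2.0f;
    }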
> It seems to me that generally a programmer should seek to communicate
> intent more than specific implementation. Data structure and algorithm choices do communicate certain
> expectations because of their tradeoffs; even dipping into assembly often communicates an expectation
> that a small section is performance critical, but it seems that
>
> I do think that allowing programmers to provide an implementation that is semantically equivalent to a higher-level
> expression would be useful. In some cases, such low-level implementations might not be provably equivalent or
> might seem (to the compiler) inferior, so an annotation might be needed to tell the compiler to use this implementation.
> However, I suspect that most programming languages have much higher priority issues.
Higher-level expression of programs is a good idea. Good enough that a tenfold reduction in performance is absolutely worth it in 99% of cases. And much of the computing world is built on far worse performance than that. Just consider the vast use of scripting languages in web servers.
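On the annotation idea: something along these lines, where the equivalent_to attribute is purely hypothetical - no compiler I know of accepts it - but it captures the intent of letting the programmer assert that a hand-tuned version implements the same function as a reference:

    /* High-level statement of intent. */
    int popcount_ref(unsigned x)
    {
        int n = 0;
        for (; x; x >>= 1)
            n += x & 1;
        return n;
    }

    /* Hand-tuned SWAR version (assumes 32-bit unsigned). The
       "equivalent_to" annotation is hypothetical, not real syntax:
       the compiler could verify the claim, or simply trust it and
       substitute this body wherever it sees the reference. */
    __attribute__((equivalent_to(popcount_ref)))
    int popcount_fast(unsigned x)
    {
        x = x - ((x >> 1) & 0x55555555u);
        x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
        x = (x + (x >> 4)) & 0x0F0F0F0Fu;
        return (int)((x * 0x01010101u) >> 24);
    }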