By: Aaron Spink (aaronspink.delete@this.notearthlink.net), August 26, 2014 4:41 am
Room: Moderated Discussions
Patrick Chase (patrickjchase.delete@this.gmail.com) on August 25, 2014 11:32 pm wrote:
> Randomized exponential backoff is the easy one that just about everybody uses.
> Note that exponential backoff alone (without a separate randomized starting
> delay for each thread) doesn't provably resolve livelock, though.
>
And unfortunately this can actually have a pretty significant impact on performance as well.
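For anyone who wants the scheme from the quote spelled out, here is a minimal C++ sketch of a test-and-set lock with randomized exponential backoff. The names (`BackoffLock`, `kMinSpins`, `kMaxSpins`) and the limits are illustrative placeholders, not anything from a real library, and yielding is just one possible way to wait.

```cpp
#include <algorithm>
#include <atomic>
#include <cstdint>
#include <random>
#include <thread>

// Assumed tuning values, chosen only for illustration.
constexpr std::uint32_t kMinSpins = 4;
constexpr std::uint32_t kMaxSpins = 1024;

// Hypothetical test-and-set lock with randomized exponential backoff.
class BackoffLock {
public:
    void lock() {
        // Per-thread generator, so each thread draws its own delays.
        thread_local std::minstd_rand rng{std::random_device{}()};
        std::uint32_t limit = kMinSpins;
        while (flag_.exchange(true, std::memory_order_acquire)) {
            // Wait a random number of yields in [0, limit] ...
            std::uniform_int_distribution<std::uint32_t> pick(0, limit);
            for (std::uint32_t i = pick(rng); i > 0; --i) {
                std::this_thread::yield();
            }
            // ... then double the window, up to a cap.
            limit = std::min(limit * 2, kMaxSpins);
        }
    }

    void unlock() { flag_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> flag_{false};
};
```

The random draw inside the window is what keeps contending threads from retrying in lockstep; the doubling alone does not. The extra latency on every failed attempt is also where the performance cost comes from.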
One of the reasons I'm currently not that big a fan of LL/SC is that it is basically used ONLY for analogs of CMPXCHG et al. But even though it is only used as an analog for those basic primitives, its design severely limits the optimization possibilities, because the sequence between the LL and the SC is basically unbounded.
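As a rough illustration of the "analog of CMPXCHG" usage pattern, here is a C++ sketch of an atomic max built on a compare-and-swap retry loop. `atomic_max` is a hypothetical helper name. On LL/SC machines a loop like this is typically lowered to an LL/SC pair, which is why `compare_exchange_weak` is allowed to fail spuriously.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: raise `target` to at least `candidate`.
// Returns the last value observed before any update took place.
inline std::uint64_t atomic_max(std::atomic<std::uint64_t>& target,
                                std::uint64_t candidate) {
    std::uint64_t seen = target.load(std::memory_order_relaxed);
    // On failure (including spurious failure) compare_exchange_weak
    // reloads `seen`, so the loop re-checks against the fresh value.
    while (seen < candidate &&
           !target.compare_exchange_weak(seen, candidate,
                                         std::memory_order_acq_rel,
                                         std::memory_order_relaxed)) {
    }
    return seen;
}
```

Nothing in the loop bounds the number of retries under contention, and the hardware only sees a load, some arbitrary instructions, and a conditional store rather than one named operation.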
For an ISA definition, I would much rather have the hard primitives like CMPXCHG, so that you have the potential for much more optimization. For something like CMPXCHG, you can actually export the operation into the coherence infrastructure and aren't necessarily required to do it at the core of invocation. This can allow for some much more efficient coherency flows in the presence of high contention. Being able to handle the interlock for hot lines outside the core can have a significant impact on performance. It's a bit of flexibility that will be nice to have as hardware contexts continue to climb ever higher.
Many others have similar thoughts. For instance, while RISC-V has LL/SC (as LR/SC), it also has separate primitives for things like Fetch_and_ADD (its AMO instructions), with one of the main ideas being that you can export the operation outside of the core.
Basically I look at LL/SC these days as a rather poor and broken implementation of transactional memory, with many of the downsides and none of the advantages.