By: nksingh (none.delete@this.none.non), August 26, 2014 11:54 pm
Room: Moderated Discussions
anon (anon.delete@this.anon.com) on August 26, 2014 11:02 pm wrote:
>
> I'm not sure if I understand you correctly. I think I failed to explain myself properly: the CAS
> problem I heard of is not due to the instruction itself failing to make forward progress, or the
> entire system failing to make forward progress, but due to the software constructs that use it,
> failing to be able to achieve a livelock-free sequence of instructions on a per-thread basis.
>
> Let's take the simple case of an atomic increment. A CAS architecture will have to do something like:
>
> retry:
> LD r1 := [mem]
> ADD r2 := r1 + 1
> CAS [mem], r1, r2    ; store r2 only if [mem] still equals r1
> CMP failure case
> BEQ retry
>
> Now unless the ISA or implementation also defines very specific sequences leading up to the CAS instruction
> as being "special", with respect to forward progress of the thread, then it's possible that other CPUs
> will always modify the location between the LD and the CAS, and thus this thread always fails.
>
> LL/SC can solve that problem in hardware. A CAS architecture could too, but it would
> either need some fancy detection/prediction and treatment of LD that comes ahead of
> the CAS, or it needs an LD variant which is to be used before the CAS, with particular
> limited instructions between (in which case you're basically back to LL/SC).
>
On x86/x64 there's the PREFETCHW instruction, which you can issue ahead of the first load to signal write intent for that cache line. AMD has had it for a long time, and I think Intel started honoring the hint recently as well (it's now in their programmer's manual).
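
To make that concrete, here is a rough C sketch of the increment loop with a write-intent prefetch ahead of the first load. It assumes GCC or Clang (`__builtin_prefetch` is a compiler builtin, not standard C), and `cas_increment` is just a name made up for illustration. Whether the builtin actually becomes PREFETCHW depends on the compiler and target flags, and the prefetch is only a hint: the idea is to have the line already writable by the time the CAS runs, which should narrow the window between the load and the CAS, but it does not guarantee per-thread forward progress.

```c
#include <stdatomic.h>

/* Atomic increment built from a CAS loop, with a write-intent prefetch
 * before the initial load. Returns the value seen before the increment.
 * cas_increment is a hypothetical name used only for this sketch. */
static unsigned long cas_increment(_Atomic unsigned long *mem)
{
    /* Second argument 1 = "prepare for write". Assumption: on x86 targets
     * that enable it, the compiler may emit PREFETCHW here; otherwise it
     * falls back to an ordinary prefetch or nothing at all. */
    __builtin_prefetch((const void *)mem, 1, 3);

    unsigned long old = atomic_load_explicit(mem, memory_order_relaxed);

    /* On failure the current value is written back into `old`,
     * so each retry compares against a fresh snapshot. */
    while (!atomic_compare_exchange_weak(mem, &old, old + 1)) {
        /* retry */
    }
    return old;
}
```

Calling `cas_increment(&counter)` on an `_Atomic unsigned long counter` bumps it by one and hands back the previous value. The loop can still lose the race indefinitely if other CPUs keep writing the location, which is exactly the per-thread livelock concern raised above.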