By: Aaron Spink (aaronspink.delete@this.notearthlink.net), August 26, 2014 5:27 pm
Room: Moderated Discussions
anon (anon.delete@this.anon.com) on August 26, 2014 6:06 am wrote:
> I had the idea from somewhere that LL/SC in POWER CPUs had similar kinds of hardware guarantees
> when used in very specific, limited sequences. That is, the hardware can take and hold the line
> to avoid livelocks, will avoid state transitions, etc. I don't have a reference off the top of
> my head (or the powerpc ISA manual handy to see what it says), so I could be wrong.
>
Basically all LL/SC architectures have put severe limits on the usage of LL/SC to get them to actually work in practice. Lots of things like no stores between the LL and the SC, a limited number of instructions, a limited number of loads, etc.
> In fact, in other architectures (e.g., SPARC), CAS I think has been a problem in the past with livelocks,
> because of the common need to load the source data before the CAS. The advantage there of LL/SC
> is that the LL could signal the core to load-exclusive and prepare for SC, etc. whereas LD/CAS may
> be more difficult to optimize that first load and squash the livelocks in hardware.
>
This is likely the double-value load/swap case, i.e., compare on X and swap Y. Most CAS architectures have simply added a double-load CAS which handles this: load X and Y from the same 32- or 64-byte aligned space and swap one of them.
None of the primitives handle multi-line CAS-like operations well, neither CAS nor LL/SC. Any time an inter-context operation needs multiple coherency transactions, you run into a lot of issues.
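To make the "compare on X, swap Y" case concrete, here is a minimal C11 sketch. It is not a hardware double-load CAS: it assumes X and Y are both 32 bits and packs them into one naturally aligned 64-bit word, so an ordinary single-word CAS covers both. The names pack_xy, get_x, get_y and swap_y_if_x are made up for illustration. It also shows the plain load ahead of the CAS that the quoted text is talking about.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout (illustrative only): X in the high 32 bits, Y in the low
   32 bits, so both sit in one aligned 64-bit word and hence one cache line. */
static inline uint64_t pack_xy(uint32_t x, uint32_t y)
{
    return ((uint64_t)x << 32) | y;
}
static inline uint32_t get_x(uint64_t w) { return (uint32_t)(w >> 32); }
static inline uint32_t get_y(uint64_t w) { return (uint32_t)w; }

/* Replace Y with new_y only while X still equals expected_x.
   Returns true if this call performed the swap. */
static bool swap_y_if_x(_Atomic uint64_t *pair, uint32_t expected_x,
                        uint32_t new_y)
{
    assert(pair != NULL);

    /* The plain load before the CAS. It does not ask for exclusive
       ownership of the line, which is the step that is hard to optimize. */
    uint64_t seen = atomic_load_explicit(pair, memory_order_relaxed);

    while (get_x(seen) == expected_x) {
        uint64_t desired = pack_xy(expected_x, new_y);
        /* The CAS compares the whole word (X and Y) and, on success,
           writes the same X with the new Y. */
        if (atomic_compare_exchange_weak_explicit(pair, &seen, desired,
                                                  memory_order_acq_rel,
                                                  memory_order_relaxed)) {
            return true;
        }
        /* On failure, `seen` holds the current contents; the loop
           condition re-checks X before retrying. */
    }
    return false; /* X no longer matches, so Y is left alone. */
}
```

Note the retry loop is exactly where contention hurts: every failed CAS is another coherency transaction on the same line. Once X and Y live on different lines, this trick no longer applies, which is the multi-line problem above.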