By: Gabriele Svelto (gabriele.svelto.delete@this.gmail.com), July 21, 2015 12:46 pm
Room: Moderated Discussions
anon (anon.delete@this.anon.com) on July 21, 2015 8:17 am wrote:
> That would still be interesting to know of. I do recall a long ago conversation with an IBM guy who worked
> on POWER implementation talking about implementation of one of their barriers (sync, or perhaps eieio)
> being improved such that it no longer had to go to fabric, somewhere around POWER4 to POWER5 transition.
POWER4 performs stores to the coherency point in strict program order, so an lwsync is effectively a no-op when it precedes a store (all stores are already store-releases). The only thing it has to do is prevent later loads from being reordered ahead of a preceding load, which turns that load into a load-acquire. Maybe that is what you were thinking of? It came to mind because lwsync was introduced with the POWER4, possibly to leverage this implementation detail, which makes it a vastly cheaper instruction than the heavyweight sync/isync, and it definitely doesn't need to do anything outside of the core.
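To make the store-release / load-acquire pairing concrete, here is a minimal C++11 sketch of the usual message-passing pattern. The names (payload, ready, run_message_passing) are made up for illustration. The comments about lwsync reflect the commonly cited compiler mappings for POWER, not anything POWER4-specific, and the exact instruction sequence depends on the compiler and target.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names for illustration only.
static int payload = 0;                 // plain, non-atomic data
static std::atomic<bool> ready{false};  // flag that publishes the data

// Producer: write the data, then publish it with a store-release.
// On POWER, compilers typically emit an lwsync before this store.
void producer() {
    payload = 42;
    ready.store(true, std::memory_order_release);
}

// Consumer: spin on a load-acquire, then read the data.
// On POWER, the acquire is typically implemented as the load followed by
// lwsync (or a compare/branch/isync sequence), which keeps the later
// read of payload from being satisfied ahead of the flag load.
int consumer() {
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;
}

// Runs one producer/consumer round and returns the value the consumer saw.
int run_message_passing() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;
    std::thread c([&] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    // Release/acquire guarantees the consumer observes the published value.
    assert(seen == 42);
    return seen;
}
```

If stores already reach the coherency point in program order, the barrier on the producer side has nothing left to enforce, which is the "effectively a no-op" case described above. The consumer side is where the barrier still has to constrain load ordering.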