By: dmcq (dmcq.delete@this.fano.co.uk), July 13, 2015 7:27 am
Room: Moderated Discussions
anon (anon.delete@this.anon.com) on July 12, 2015 3:42 am wrote:
> Simon Farnsworth (simon.delete@this.farnz.org.uk) on July 10, 2015 1:07 pm wrote:
> > dmcq (dmcq.delete@this.fano.co.uk) on July 9, 2015 4:24 pm wrote:
>
> > On the other hand, the hardware doesn't actually do "memory barrier" operations directly; it passes
> > messages around in the cache coherency protocol (MOESI or similar).
>
> This is not true. Depending on the interconnect and cache coherency rules,
> a significant amount of the reordering, if not most of it, is done inside the CPU core.
>
> Significantly, loads pass other memory operations due to out-of-order execution or non-blocking loads
> (load/load reordering can be hidden from the ISA by speculation in complex cores, but apparently not load/store).
> And stores like to pass other stores before reaching the cache coherency layer (in case your newer
> store does not have cacheline exclusive), and to a lesser extent with blocked loads.
>
> So all of that happens inside the core. Memory ordering instructions
> have to prevent these reorderings within the core.
>
> > If you really want to make the
> > hardware's life simple (so that it can really scream), you'd surely push back and make software decide
> > exactly which cache coherency messages it wants to send and when it forces write back from cache
> > to RAM - indeed, in some senses, a Cell SPU forced the developers to do exactly that.
> >
> > The fact that we prefer to hide that simple approach underneath memory barriers and hardware
> > cache controls suggests that we'd prefer to constrain the hardware to make the programming
> > model easier to grasp - look at how hard developers found it to exploit the Cell SPUs.
I've nothing against coherence, only against the idea that every single memory operation should have acquire or release semantics, rather than having special operations for the points where the user requires coherence. We most definitely do need some sort of coherence; otherwise, for instance, transaction processing couldn't be supported. But transaction processing doesn't require every single operation within a transaction to have its own individual secure or commit.
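
To make the distinction concrete, here is a minimal C++11 sketch of what I mean. It is only an illustration under my own assumptions: the names Record, committed and run_transaction_demo are made up for this post, and it says nothing about how any particular hardware implements the ordering. The stores that make up the "transaction" are ordinary ones with no ordering required among themselves; only the single commit is a release, and only the single check on the other side is an acquire.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical record standing in for the body of a "transaction".
struct Record {
    int a = 0;
    int b = 0;
    int c = 0;
};

// The writer fills in three plain fields and publishes them with ONE
// release store. The reader waits with ONE acquire load and then reads
// the plain fields. Returns the sum the reader observed.
int run_transaction_demo() {
    Record rec;                           // plain, non-atomic data
    std::atomic<bool> committed{false};   // the only ordered operation
    int observed = 0;

    std::thread writer([&] {
        rec.a = 1;   // ordinary stores: free to be reordered
        rec.b = 2;   // among themselves by compiler or core
        rec.c = 3;
        // Single "commit": everything written above becomes visible
        // to any thread that acquires this flag and sees true.
        committed.store(true, std::memory_order_release);
    });

    std::thread reader([&] {
        // Single acquire: spin until the commit is visible.
        while (!committed.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        observed = rec.a + rec.b + rec.c;  // ordinary loads
    });

    writer.join();
    reader.join();
    assert(observed == 6);  // guaranteed by the release/acquire pair
    return observed;
}
```

If every one of those stores and loads had to carry acquire or release semantics, the result would be the same, but the hardware would lose the freedom to reorder the three field writes, which is exactly the cost I'm objecting to.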