By: dmcq (dmcq.delete@this.fano.co.uk), July 24, 2015 1:08 am
Room: Moderated Discussions
Konrad Schwarz (no.spam.delete@this.no.spam) on July 23, 2015 11:44 pm wrote:
> Konrad Schwarz (no.spam.delete@this.no.spam) on July 23, 2015 3:38 pm wrote:
> > Michael S (already5chosen.delete@this.yahoo.com) on July 23, 2015 7:31 am wrote:
> > > I don't think you are correct about it.
> >
> > Ok, so I looked at the ARM ACE specification, ARM IHI 0022D, which also specifies AXI4.
> > ACE is a coherency protocol defined by ARM. Chapter C8 is on barrier transactions.
> >
> > I can't say that I have fully digested the specification, but it does look like signaling for barriers
> > is from masters (e.g., processor cores) to downstream interconnects only (including a handshake). To
> > me, it looks like the set of memory accesses (transactions) affected by a (full) synchronization barrier
> > is defined implicitly as the memory accesses that happen to have been issued by processors ahead of
> > or after the barrier. Interconnects must not reorder these accesses across the barrier.
> >
> > In other words, any store queues in other processors would not be affected by a processor
> > issuing a full synchronization barrier, contrary to my previous assertion.
> >
> > The interconnect does delay completing some access transactions in certain cases.
>
> I also suspect that the mental model used above is somewhat incorrect: a cached line of data
> always has the same value in each cache it is in, by virtue of the cache coherency protocol.
>
> The worrisome data is the data that is not in the cache, either because it was evicted, or it
> comes from an uncacheable region of memory.
>
> The interconnects represent the "store queues" alluded to above; with the interconnect
> connected to each master in the shareability domain. A full synchronization means that the
> interconnect must order the transactions queued to every one of its slave ports.
I don't see how they are supposed to support a data dependency barrier unless processors always order their writes and put a barrier between each one. That is a very strong restriction: the software gives no signal that it is required, and it is not really needed, yet ARM and POWER say they support it.
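To make concrete which pattern is at issue, here is a minimal sketch of the usual pointer-publication case, assuming C++11 atomics. The names `Message`, `shared` and `publish_and_read` are made up for illustration and come from neither spec. The reader's second load takes its address from the first load, which is the data (address) dependency in question. C++ calls that ordering `memory_order_consume`, but compilers commonly promote it to acquire, so the sketch uses acquire directly. Note the writer in this sketch does order its two stores explicitly.

```cpp
#include <atomic>
#include <thread>

// Hypothetical payload type, for illustration only.
struct Message {
    int payload;
};

// Hypothetical helper: one thread publishes a message through a pointer,
// another thread reads through that pointer. Returns what the reader saw.
int publish_and_read() {
    Message msg{0};
    std::atomic<Message*> shared{nullptr};
    int seen = -1;

    std::thread writer([&] {
        msg.payload = 42;  // 1. write the data
        // 2. publish the pointer; release keeps store 1 ordered before it
        shared.store(&msg, std::memory_order_release);
    });

    std::thread reader([&] {
        Message* p = nullptr;
        // Spin until the pointer has been published.
        // (memory_order_consume would be the dependency-only variant.)
        while ((p = shared.load(std::memory_order_acquire)) == nullptr) {
            std::this_thread::yield();
        }
        // The address of this load depends on the value loaded above.
        seen = p->payload;
    });

    writer.join();
    reader.join();
    return seen;
}
```

Calling `publish_and_read()` from a test or from `main` returns the payload the reader observed. The ordering requested from the writer is visible in the source here (the release store); the question above is what the hardware has to do for the reader's dependent load when nothing on the reader side asks for it.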