By: Michael S (already5chosen.delete@this.yahoo.com), July 15, 2015 8:29 am
Room: Moderated Discussions
dmcq (dmcq.delete@this.fano.co.uk) on July 15, 2015 8:04 am wrote:
> NoSpammer (no.delete@this.spam.com) on July 15, 2015 7:05 am wrote:
> > anon (anon.delete@this.anon.com) on July 15, 2015 2:26 am wrote:
> > > Michael S (already5chosen.delete@this.yahoo.com) on July 15, 2015 1:19 am wrote:
> > > > anon (anon.delete@this.anon.com) on July 14, 2015 9:37 pm wrote:
> > > > > NoSpammer (no.delete@this.spam.com) on July 14, 2015 2:02 pm wrote:
> > > > > > Wilco (Wilco.Dijkstra.delete@this.ntlworld.com) on July 13, 2015 4:50 pm wrote:
> > > > > > > NoSpammer (no.delete@this.spam.com) on July 13, 2015 12:34 pm wrote:
> > > > > > > > With x86 a volatile variable is at least up to date monotonically with respect to what you've seen
> > > > > > > > so far. That's actually good enough and it's excellent for producer-consumer type of stuff.
> > > > > > >
> > > > > > > That's not even true on x86. Loads may be lifted before stores
> > > > > > > and so you may read an incorrect or "impossible" value.
> > > > > >
> > > > > > Let me rephrase it: suppose you have a volatile location
> > > > > > you observe, it's possible that you see all changes
> > > > > > prior to a volatile change, but not yet the volatile change, it's also possible that you see all changes
> > > > > > and the volatile changed, but it's not possible to see a volatile changed but then in later code not see
> > > > > > prior changes.
> > > > >
> > > > > At its most general, looking at system-wide state, this is untrue. Forwarding from the store
> > > > > queue means that a write can be seen (from one CPU) and then not be seen (from another).
> > > > >
> > > >
> > > > It seems that in the explanation above NoSpammer was talking about a situation in which all changes of interest
> > > > are done by one (producer) CPU and observed by another (consumer) CPU. For such a situation x86 ordering
> > > > guarantees are not just strong enough, but stronger than necessary.
> > >
> > > It sounded like he made a general statement about "the system". He said that the constraint is
> > > good for producer-consumer as a particular usage, but made a more general statement first.
> >
> > I think it's correct even in the general case when you observe (read) from one thread what the
> > other threads are writing. You will observe their writes in order, but of course some randomness
> > in which one will get ahead when. I was not claiming that the value is up to date system-wise,
> > only that it is at least up to date with respect to your previous reads (in the same thread).
> >
> > > Of course all this is making a number of assumptions to begin with, no non-temporal
> > > moves, no string instructions, clflush; memory is aligned, etc.
> >
> > These are not really issues if you WANT to write non-interlocked code. Also, you can get around them by
> > doing the right thing just before the sync point. You don't really want to use non-temporal instructions
> > and fancy flushes in these cases. As for string instructions (rep stos/rep movs), I've relied on the
> > observation that if you see the first bytes correctly and the last bytes correctly, you also probably
> > see everything in between. Running CRC checks on such messages for about 3 years has shown no errors,
> > so I'm confident this works (why would the CPU push out string writes in random order?).
>
> I doubt a CPU would want to push it out in random order, but consider the problem in general with weak consistency.
> If there are two memory controllers and the string occupies three lines then the second memory controller might
> be satisfying some requests by another CPU delaying the write of the middle section of the string.
>
As long as we are talking about ordering in WB/WT regions, the order of arrival at external memory does not matter. What matters is the *observed* order, which is governed by cache-line ownership. And the latter, in the case of x86 string instructions, can't deviate too far from contiguous, because x86 string instructions are interruptible/restartable and the only saved state is the direction, two pointers and a counter. With such minimal saved state, only a contiguous operation can be restarted correctly.
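
To make the restartability argument concrete, here is a minimal C sketch of the architectural state of a string move. It is only a model: the struct, its field names and both functions are made up for illustration. It says nothing about how real hardware chunks the copy internally or when the stores become visible to other CPUs. It shows only that with nothing but a direction, two pointers and a counter saved, what has been done at any interrupt point must be a contiguous run of bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the state saved across an interrupt:
   direction flag, two pointers (here: base + offset) and a counter. */
struct rep_movs_state {
    const unsigned char *src_base;
    unsigned char *dst_base;
    size_t si;      /* source offset, models RSI */
    size_t di;      /* destination offset, models RDI */
    size_t count;   /* remaining bytes, models RCX */
    int backward;   /* models DF: 0 = ascending, 1 = descending */
};

/* Copy at most max_steps bytes, then stop as if an interrupt arrived.
   Returns the number of bytes still to be copied. */
static size_t rep_movs_run(struct rep_movs_state *s, size_t max_steps)
{
    while (s->count > 0 && max_steps > 0) {
        s->dst_base[s->di] = s->src_base[s->si];
        if (s->backward) { s->si--; s->di--; } else { s->si++; s->di++; }
        s->count--;
        max_steps--;
    }
    return s->count;
}

/* Interrupt the copy every 5 bytes and resume from the saved state.
   After every interruption the destination holds a contiguous prefix
   of the source and nothing else. Returns 1 when the copy completes. */
int interrupted_copy_matches(void)
{
    const unsigned char msg[] = "producer message spanning several steps";
    unsigned char out[sizeof msg] = {0};
    struct rep_movs_state s = { msg, out, 0, 0, sizeof msg, 0 };
    size_t left;

    do {
        left = rep_movs_run(&s, 5);
        for (size_t i = 0; i < sizeof msg; i++)
            assert(out[i] == (i < s.di ? msg[i] : 0));
    } while (left > 0);

    return s.count == 0 && s.di == sizeof msg;
}
```

There is no room in that state for a list of "holes", so an implementation that skipped the middle of the string and came back to it later could not be restarted from it.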