By: dmcq (dmcq.delete@this.fano.co.uk), July 15, 2015 8:04 am
Room: Moderated Discussions
NoSpammer (no.delete@this.spam.com) on July 15, 2015 7:05 am wrote:
> anon (anon.delete@this.anon.com) on July 15, 2015 2:26 am wrote:
> > Michael S (already5chosen.delete@this.yahoo.com) on July 15, 2015 1:19 am wrote:
> > > anon (anon.delete@this.anon.com) on July 14, 2015 9:37 pm wrote:
> > > > NoSpammer (no.delete@this.spam.com) on July 14, 2015 2:02 pm wrote:
> > > > > Wilco (Wilco.Dijkstra.delete@this.ntlworld.com) on July 13, 2015 4:50 pm wrote:
> > > > > > NoSpammer (no.delete@this.spam.com) on July 13, 2015 12:34 pm wrote:
> > > > > > > With x86 a volatile variable is at least up to date monotonically with respect to what you've seen
> > > > > > > so far. That's actually good enough and it's excellent for producer-consumer type of stuff.
> > > > > >
> > > > > > That's not even true on x86. Loads may be lifted before stores
> > > > > > and so you may read an incorrect or "impossible" value.
> > > > >
> > > > > Let me rephrase it: suppose you have a volatile location
> > > > > you observe, it's possible that you see all changes
> > > > > prior to a volatile change, but not yet the volatile change, it's also possible that you see all changes
> > > > > and the volatile changed, but it's not possible to see a volatile changed but then in later code not see
> > > > > prior changes.
> > > >
> > > > At its most general, looking at system-wide state, this is untrue. Forwarding from the store
> > > > queue means that a write can be seen (from one CPU) and then not be seen (from another).
> > > >
> > >
> > > It seems that in the explanation above NoSpammer was talking about a situation in which all changes of interest
> > > are done by one (producer) CPU and observed by another (consumer) CPU. For such a situation x86 ordering
> > > guarantees are not just strong enough, but stronger than necessary.
> >
> > It sounded like he made a general statement about "the system". He said that the constraint is
> > good for producer-consumer as a particular usage, but made a more general statement first.
>
> I think it's correct even in the general case when you observe (read) from one thread what the
> other threads are writing. You will observe their writes in order, but of course some randomness
> in which one will get ahead when. I was not claiming that the value is up to date system-wise,
> only that it is at least up to date with respect to your previous reads (in the same thread).
>
> > Of course all this is making a number of assumptions to begin with, no non-temporal
> > moves, no string instructions, clflush; memory is aligned, etc.
>
> These are not really issues if you WANT to write non-interlocked code. Also you can get around by
> doing the right thing just before sync-point. You don't really want to use non-temporal instructions
> and fancy flushes in these cases. As for string instructions (rep stos/rep movs) I've relied on the
> observation that if you see the first bytes correctly and the last bytes correctly, you also probably
> see everything in between. Running CRC checks on such messages for about 3 years has shown no errors
> so I'm confident this works (why would the CPU push out string writes in random order?).
I doubt a CPU would want to push it out in random order, but consider the problem in general with weak consistency. If there are two memory controllers and the string occupies three cache lines, then the second memory controller might be busy satisfying requests from another CPU, delaying the write of the middle section of the string. The first and last lines could then become visible before the middle one.
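
For comparison, here is a minimal sketch of publishing a three-line message without relying on how string writes happen to drain. It assumes C++11 atomics and 64-byte cache lines; Mailbox, produce, consume_and_check and run_once are hypothetical names made up for illustration, not anything from the code discussed above. The consumer only looks at the payload after an acquire load of the flag has observed the producer's release store, so the language guarantees the middle bytes are visible whichever controller handled them.

```cpp
#include <atomic>
#include <cstddef>
#include <cstring>
#include <thread>

// Hypothetical message size: 192 bytes, i.e. three 64-byte cache lines
// (the 64-byte line size is an assumption).
constexpr std::size_t kMsgBytes = 192;

struct Mailbox {
    alignas(64) unsigned char payload[kMsgBytes];
    alignas(64) std::atomic<bool> ready{false};
};

// Producer: fill the payload (the compiler may or may not use rep stos here),
// then publish it with a release store to the flag.
void produce(Mailbox& box, unsigned char fill) {
    std::memset(box.payload, fill, kMsgBytes);
    box.ready.store(true, std::memory_order_release);
}

// Consumer: spin until the acquire load sees the flag, then check every byte,
// including the middle line, rather than only the first and last bytes.
bool consume_and_check(Mailbox& box, unsigned char expected) {
    while (!box.ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    for (std::size_t i = 0; i < kMsgBytes; ++i) {
        if (box.payload[i] != expected) {
            return false;
        }
    }
    return true;
}

// Runs one producer and one consumer thread against a fresh mailbox.
bool run_once(unsigned char fill) {
    Mailbox box;
    bool ok = false;
    std::thread consumer([&] { ok = consume_and_check(box, fill); });
    std::thread producer([&] { produce(box, fill); });
    producer.join();
    consumer.join();
    return ok;
}
```

Calling run_once(0xAB) from a test harness should return true on any conforming implementation. On x86 the release store and acquire load are expected to compile to ordinary moves, so the sketch costs nothing extra there while staying correct on weaker machines.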