By: rwessel (robertwessel.delete@this.yahoo.com), July 16, 2015 1:00 pm
Room: Moderated Discussions
Paul A. Clayton (paaronclayton.delete@this.gmail.com) on July 16, 2015 9:33 am wrote:
> Michael S (already5chosen.delete@this.yahoo.com) on July 15, 2015 8:29 am wrote:
> > dmcq (dmcq.delete@this.fano.co.uk) on July 15, 2015 8:04 am wrote:
> > > NoSpammer (no.delete@this.spam.com) on July 15, 2015 7:05 am wrote:
> [snip]
> > > > As for string instructions (rep stos/rep movs) I've relied on the
> > > > observation that if you see the first bytes correctly and the last bytes correctly, you also probably
> > > > see everything in between. Running CRC checks on such messages for about 3 years has shown no errors
> > > > so I'm confident this works (why would the CPU push out string writes in random order).
> > >
> > > I doubt a CPU would want to push it out in random order, but
> > > consider the problem in general with weak consistency.
> > > If there are two memory controllers and the string occupies
> > > three lines then the second memory controller might
> > > be satisfying some requests by another CPU delaying the write of the middle section of the string.
> > >
> >
> > As long as we are talking about ordering in WB/WT regions, order of arrival to external memory does
> > not matter. What matters is *observed* order governed by cache line ownership. And the latter, in
> > case of x86 string instructions, can't deviate too far from the continuous, because on x86 string
> > instructions are interruptible/restartable and the only saved states are direction, two pointers and
> > counter. With such minimal state saved only continuous operation can be correctly restarted.
>
> Does the state need to be architectural? For example, is there a guarantee that the count on interrupt
> indicates the stopping point? If microarchitectural storage could be used to track what has been
> done (and perhaps even offloading copying to a DMA engine on interrupt/thread switch), perhaps using
> -1 as a magic value to check for a partial operation in this microarchitectural storage (the overhead
> of such a check would be tiny for an actual maximum-sized operation and might be acceptable to allow
> faster progress in the operation in the common case of no interruption).
As defined, the registers after an interrupted x86 string instruction do contain what's needed to restart the instruction, and they hold the values you'd expect (addresses and count updated to reflect the number of items already moved).
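
To make that concrete, here is a rough sketch of the architectural behaviour in Python. It is only a toy model, not a description of how any real core implements fast strings: the names `StringRegs`, `rep_movsb`, `copy_with_interrupts` and the `budget` parameter (the number of iterations before a pretend interrupt arrives) are all made up for illustration. The point is just that direction, two pointers and a count are enough to resume.

```python
from dataclasses import dataclass


@dataclass
class StringRegs:
    """Hypothetical model of the register state REP MOVSB relies on."""
    rsi: int      # source pointer
    rdi: int      # destination pointer
    rcx: int      # remaining element count
    df: int = 0   # direction flag: 0 = ascending, 1 = descending


def rep_movsb(mem: bytearray, regs: StringRegs, budget: int) -> bool:
    """Run at most `budget` iterations, then pretend an interrupt arrived.

    Returns True once the instruction has completed (rcx == 0).
    The registers are updated after every element, so they always
    describe exactly how much work is left.
    """
    step = -1 if regs.df else 1
    while regs.rcx > 0 and budget > 0:
        mem[regs.rdi] = mem[regs.rsi]
        regs.rsi += step
        regs.rdi += step
        regs.rcx -= 1
        budget -= 1
    return regs.rcx == 0


def copy_with_interrupts(mem: bytearray, src: int, dst: int, n: int, budget: int):
    """Re-issue the same instruction after each 'interrupt' until it is done."""
    regs = StringRegs(rsi=src, rdi=dst, rcx=n)
    interrupts = 0
    while not rep_movsb(mem, regs, budget):
        interrupts += 1   # handler runs, then returns to the same instruction
    return regs, interrupts
```

Restarting needs nothing beyond the register values themselves: the handler returns to the same instruction, which simply carries on from the updated pointers and count.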