By: Brett (ggtgp.delete@this.yahoo.com), September 19, 2021 1:06 pm
Room: Moderated Discussions
Linus Torvalds (torvalds.delete@this.linux-foundation.org) on September 17, 2021 8:59 am wrote:
> Doug S (foo.delete@this.bar.bar) on September 16, 2021 10:57 pm wrote:
> >
> > It is easy to declare instructions for memcpy(). The devil is in the details of
> > the implementation. How long did it take Intel to get it even halfway right?
>
> I think the ARM instructions look very reasonable, and are probably
> not too bad to get right with just a few test cases.
>
> Famous last words.
>
> The memset case is most definitely the easier of the two, since it doesn't have any issues with two different
> pointer alignments. And splitting it up into three separate instructions ("initial align, loop over bulk,
> final partial case") makes a lot of sense. It really should be fairly hard to screw up too badly.
>
> memcpy is more complicated, but it really shouldn't be horrible either.
>
> The most complicated case I see does come from the "three instructions" thing: making
> sure it's doing everything properly if they are done individually. And "done individually"
> happens for the "restart for exceptions or interrupts" case, even if the instructions
> are right next to each other in the right order in the instruction stream.
>
> That's particularly true for the memcpy case, because it would be conceptually sensible to always
> do that first instruction (even for the "destination is already aligned" case) just to start the "have
> previous source buffer ready for shifting with the next one" for the mutually unaligned case.
>
> Maybe when you take a trap on the middle instruction, the saved state will point to the first
> instruction so that you always restart there (so that it looks like one atomic sequence)?
>
> IOW, the "three instruction" model really makes sense when you flow from one state to the
> next, but it also adds its own excitement for the "(re)start in the middle of the sequence"
> case. As per above, I think you can make that case an invalid situation, though.
Presentation and video of the new instructions:
https://connect.linaro.org/resources/lvc21f/lvc21f-113/
The exclamation mark means the register is updated (written back) by the instruction, and I think the F means Forward.
My guess is that the middle instruction does the vector-aligned bulk copy, but how the work is split between the three can change per CPU design, so you always need all three?
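To make the "initial align, loop over bulk, final partial case" split concrete, here is a rough C model of the memset flavour. It is only a sketch under my own assumptions: the struct, the function names and the 16-byte CHUNK are all made up for illustration, and the real instructions may divide the work differently on each CPU. The point is that each phase writes its progress back into the pointer and count, so whichever phase runs next (or runs again after an interrupt) just continues from the saved state.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the written-back registers. */
struct set_state {
    unsigned char *dst;   /* next byte to write */
    size_t         left;  /* bytes still to write */
};

enum { CHUNK = 16 };  /* assumed bulk store width, not from any spec */

/* Phase 1: write single bytes until dst is CHUNK-aligned or we run out. */
static void set_prologue(struct set_state *s, unsigned char v)
{
    while (s->left > 0 && ((uintptr_t)s->dst % CHUNK) != 0) {
        *s->dst++ = v;
        s->left--;
    }
}

/* Phase 2: write whole chunks while at least one full chunk remains. */
static void set_main(struct set_state *s, unsigned char v)
{
    while (s->left >= CHUNK) {
        memset(s->dst, v, CHUNK);
        s->dst  += CHUNK;
        s->left -= CHUNK;
    }
}

/* Phase 3: write whatever tail is left, byte by byte. */
static void set_epilogue(struct set_state *s, unsigned char v)
{
    while (s->left > 0) {
        *s->dst++ = v;
        s->left--;
    }
}

/* The three phases issued back to back, sharing the updated state. */
void *three_phase_memset(void *dst, int value, size_t n)
{
    struct set_state s = { dst, n };
    unsigned char v = (unsigned char)value;

    set_prologue(&s, v);
    set_main(&s, v);
    set_epilogue(&s, v);
    return dst;
}
```

In this model, skipping or repeating a phase still gives a correct result because all progress lives in the state. Whether the hardware allows that is exactly the "(re)start in the middle of the sequence" question quoted above.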
> And yes, I like "rep stos/movs" too, but I've also talked here about at least part of what makes
> that a "good, but not perfect" interface. It has a lot of good things going for it (that whole
> interruptibility is quite natural), but it does have some real complicating issues too from its
> historical semantics (uncached and overlapping range semantics are the two big ones, I feel).
>
> So the intel implementation has to jump through some hoops due
> to compatibility concerns, that the ARM model doesn't need to.
>
> But it's going to be some time before we see any implementation
> of the ARM thing, so I guess we'll have to wait and see.
>
> I'm obviously happy to see this, and it looks sane to me. But you're right, implementations
> aren't here yet, and maybe it won't look as rosy in a few years.
>
> Linus
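On the overlapping range point, a small C sketch of why it is awkward. This assumes the historical behaviour is a strictly ascending, one-byte-at-a-time copy. The function name is hypothetical and this is a model of the semantics, not of any real implementation.

```c
#include <stddef.h>

/* Hypothetical model: copy n bytes in ascending order, one at a time.
 * If dst overlaps src and sits above it, bytes written earlier are read
 * back later, so the start of the source gets replicated. */
void forward_byte_copy(unsigned char *dst, const unsigned char *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}
```

With a buffer holding "abcdef", forward_byte_copy(buf + 1, buf, 5) leaves "aaaaaa", while memmove(buf + 1, buf, 5) leaves "aabcde". A fast implementation that moves wide chunks has to detect this kind of overlap and still produce the byte-at-a-time result, which is one of the hoops a fresh interface can simply declare out of scope.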