By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), September 17, 2021 8:59 am
Room: Moderated Discussions
Doug S (foo.delete@this.bar.bar) on September 16, 2021 10:57 pm wrote:
>
> It is easy to declare instructions for memcpy(). The devil is in the details of
> the implementation. How long did it take Intel to get it even halfway right?
I think the ARM instructions look very reasonable, and are probably not too bad to get right with just a few test cases.
Famous last words.
The memset case is most definitely the easier of the two, since it doesn't have any issues with two different pointer alignments. And splitting it up into three separate instructions ("initial align, loop over bulk, final partial case") makes a lot of sense. It really should be fairly hard to screw up too badly.
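To make the three-way split concrete, here is a minimal C sketch of a memset broken into the "initial align, loop over bulk, final partial case" phases. It is an illustrative model, not ARM's instructions: the names `set_state`, `set_prologue`, `set_main`, `set_epilogue` and `sketch_memset` are hypothetical, and the struct stands in for whatever registers would carry the progress between phases. An 8-byte word size is assumed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical register state for an in-progress memset. */
struct set_state {
    unsigned char *dst;   /* next byte to write */
    size_t remaining;     /* bytes still to write */
    unsigned char value;  /* fill byte */
};

/* Phase 1, "initial align": byte stores until dst is 8-byte aligned. */
static void set_prologue(struct set_state *st) {
    while (st->remaining > 0 && ((uintptr_t)st->dst & 7) != 0) {
        *st->dst++ = st->value;
        st->remaining--;
    }
}

/* Phase 2, "loop over bulk": whole 8-byte stores to aligned addresses. */
static void set_main(struct set_state *st) {
    uint64_t pattern = 0x0101010101010101ULL * st->value; /* byte replicated 8x */
    while (st->remaining >= 8) {
        memcpy(st->dst, &pattern, sizeof pattern); /* typically one 8-byte store */
        st->dst += 8;
        st->remaining -= 8;
    }
}

/* Phase 3, "final partial case": the last 0..7 bytes. */
static void set_epilogue(struct set_state *st) {
    while (st->remaining > 0) {
        *st->dst++ = st->value;
        st->remaining--;
    }
}

void *sketch_memset(void *p, int c, size_t n) {
    struct set_state st = { p, n, (unsigned char)c };
    set_prologue(&st);
    set_main(&st);
    set_epilogue(&st);
    return p;
}
```

Because only one pointer is involved, every phase just walks `dst` forward; there is no second alignment to reconcile.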
memcpy is more complicated, but it really shouldn't be horrible either.
The most complicated case I see does come from the "three instructions" thing: making sure it's doing everything properly if they are done individually. And "done individually" happens for the "restart for exceptions or interrupts" case, even if the instructions are right next to each other in the right order in the instruction stream.
That's particularly true for the memcpy case, because it would be conceptually sensible to always do that first instruction (even for the "destination is already aligned" case) just to start the "have previous source buffer ready for shifting with the next one" for the mutually unaligned case.
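The "previous source buffer ready for shifting" idea can be sketched in C roughly as follows. This is a hypothetical model assuming 8-byte words and non-overlapping buffers, and the names `shift_copy`, `load_le64` and `store_le64` are made up for illustration. Once `dst` is aligned, `src` may still be off by `k` bytes. Each destination word is then built from bytes carried over from the previous aligned source word plus the low bytes of the next one. The byte-wise helpers keep the sketch independent of host byte order; a real implementation would use native aligned loads instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Assemble 8 bytes as a little-endian value, independent of host byte order. */
static uint64_t load_le64(const unsigned char *p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Inverse of load_le64. */
static void store_le64(unsigned char *p, uint64_t v) {
    for (int i = 0; i < 8; i++) {
        p[i] = (unsigned char)v;
        v >>= 8;
    }
}

/* Non-overlapping copy: align dst, then combine carried-over source bytes
   with each next aligned source word to build every destination word. */
void shift_copy(unsigned char *dst, const unsigned char *src, size_t n) {
    /* "Initial align": byte copies until dst is 8-byte aligned. */
    while (n > 0 && ((uintptr_t)dst & 7) != 0) {
        *dst++ = *src++;
        n--;
    }

    unsigned k = (unsigned)((uintptr_t)src & 7); /* mutual misalignment */
    if (k == 0) {
        /* Same alignment: plain word copies, no shifting needed. */
        for (; n >= 8; n -= 8, dst += 8, src += 8)
            store_le64(dst, load_le64(src));
    } else if (n >= 16 - k) {
        /* Prime the "previous source buffer": the 8-k bytes before the
           next aligned source word, packed into the low bytes. */
        uint64_t carry = 0;
        for (unsigned i = 0; i < 8 - k; i++)
            carry |= (uint64_t)src[i] << (8 * i);
        const unsigned char *s = src + (8 - k); /* 8-byte aligned */

        /* Each step reads one aligned word; stop before reading past src+n. */
        for (; n >= 16 - k; n -= 8, dst += 8, src += 8, s += 8) {
            uint64_t next = load_le64(s);
            store_le64(dst, carry | (next << (8 * (8 - k))));
            carry = next >> (8 * k); /* leftover 8-k bytes for the next step */
        }
    }

    /* "Final partial case": whatever is left, byte by byte. */
    while (n > 0) {
        *dst++ = *src++;
        n--;
    }
}
```

Note that `carry` is exactly the kind of state the first phase sets up and the middle phase depends on, which is why restarting in the middle is the awkward case.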
Maybe when you take a trap on the middle instruction, the saved state will point to the first instruction so that you always restart there (so that it looks like one atomic sequence)?
IOW, the "three instruction" model really makes sense when you flow from one state to the next, but it also adds its own excitement for the "(re)start in the middle of the sequence" case. As per above, I think you can make that case an invalid situation, though.
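One way to picture the "always restart at the first instruction" rule is the following toy C model. It is entirely hypothetical and is not how ARM specifies it. Architectural progress lives in `set_regs`, while `set_hidden` holds setup from the first phase that an interrupt throws away (here a replicated fill pattern, standing in for something like the memcpy shift buffer). When the middle phase is "interrupted", control goes back to the prologue. Re-running the prologue is harmless because `dst` is already aligned, and it rebuilds the hidden state. The `budget` parameter is an invented knob that simulates how often interrupts hit and must be at least 1 so that progress is guaranteed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Architectural state: saved and restored across an interrupt. */
struct set_regs {
    unsigned char *dst;
    size_t remaining;
    unsigned char value;
};

/* Hidden setup produced by the first phase; NOT preserved by an interrupt. */
struct set_hidden {
    bool valid;
    uint64_t pattern;
};

static void set_prologue(struct set_regs *r, struct set_hidden *h) {
    while (r->remaining > 0 && ((uintptr_t)r->dst & 7) != 0) {
        *r->dst++ = r->value;
        r->remaining--;
    }
    h->pattern = 0x0101010101010101ULL * r->value;
    h->valid = true;
}

/* Does at most `budget` 8-byte stores; returns false if "interrupted". */
static bool set_main(struct set_regs *r, struct set_hidden *h, size_t budget) {
    if (!h->valid)
        return false;          /* entering mid-sequence is not allowed */
    while (r->remaining >= 8) {
        if (budget-- == 0) {
            h->valid = false;  /* the interrupt discards hidden state */
            return false;
        }
        memcpy(r->dst, &h->pattern, 8);
        r->dst += 8;
        r->remaining -= 8;
    }
    return true;
}

static void set_epilogue(struct set_regs *r) {
    while (r->remaining > 0) {
        *r->dst++ = r->value;
        r->remaining--;
    }
}

/* Saved "PC" always points back at the prologue after an interrupt. */
static void run_set_sequence(struct set_regs *r, size_t budget) {
    struct set_hidden h = { false, 0 };
    do {
        set_prologue(r, &h);   /* restart point */
    } while (!set_main(r, &h, budget));
    set_epilogue(r);
}
```

The design choice this illustrates is that only the architectural registers need to be precise at an interrupt, as long as the first phase is safe to repeat.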
And yes, I like "rep stos/movs" too, but I've also talked here about at least part of what makes that a "good, but not perfect" interface. It has a lot of good things going for it (that whole interruptibility is quite natural), but it does have some real complicating issues too from its historical semantics (uncached and overlapping range semantics are the two big ones, I feel).
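Both properties mentioned above can be shown with a small C model of `rep movsb` with the direction flag clear. The struct and function names are hypothetical, and array indices stand in for addresses. All progress lives in the three registers, so the instruction can stop after any iteration and simply be re-executed, which is what makes it naturally interruptible. The architectural result is also defined as a forward, byte-at-a-time copy, so overlapping ranges have a visible, specified outcome that any faster implementation has to reproduce.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of x86 "rep movsb" with DF=0. */
struct movs_regs {
    size_t rdi; /* destination index */
    size_t rsi; /* source index */
    size_t rcx; /* bytes left */
};

/* Executes up to `max_iters` iterations; returns true once rcx reaches 0.
   Returning early models an interrupt: the registers record exact progress. */
static bool rep_movsb_step(unsigned char *mem, struct movs_regs *r,
                           size_t max_iters) {
    while (r->rcx != 0) {
        if (max_iters-- == 0)
            return false;
        mem[r->rdi++] = mem[r->rsi++]; /* one byte, increasing addresses */
        r->rcx--;
    }
    return true;
}
```

For example, copying 6 bytes from offset 0 to offset 2 of "abcdefgh" yields "abababab", whether it runs in one go or is "interrupted" after every byte. A memmove-style backward copy would give a different answer, which is one reason the overlap semantics constrain implementations.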
So the Intel implementation has to jump through some hoops, due to compatibility concerns, that the ARM model doesn't need to.
But it's going to be some time before we see any implementation of the ARM thing, so I guess we'll have to wait and see.
I'm obviously happy to see this, and it looks sane to me. But you're right, implementations aren't here yet, and maybe it won't look as rosy in a few years.
Linus