By: rwessel (rwessel.delete@this.yahoo.com), September 30, 2021 8:02 am
Room: Moderated Discussions
Mark Roulo (nothanks.delete@this.xxx.com) on September 30, 2021 7:37 am wrote:
> rwessel (rwessel.delete@this.yahoo.com) on September 29, 2021 11:22 pm wrote:
> ... snip ...
>
> > Certainly. But I still don't see the point of the separate setup and finalize instructions - detecting
> > those conditions is trivial (if the destination address has any low bits set, do "first", if you've fallen
> > out of the "middle" loop, and the length is not zero, do a "last"). Internalizing that stuff would probably
> > make it easier to sneak up on page boundaries as well, at least for simpler implementations.
>
> I am not a hardware guy. But ...
>
> Could it be that the three instructions are (in some ARM implementations) expected to be cracked into different
> micro-ops? That might be a good reason to have three separate ops --- it lets the cracking stay simple.
Probably it would help a bit, but you're likely going to have to do some of that cracking anyway, to handle the cases where the two operands have the same alignment (or don't), and for when you get near page boundaries. And in the case of the memmove() support instructions, you'd probably need different ops for the forwards and backwards cases. It probably doesn't matter too much if you assume all memcpy()s are long.
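
To make the "detecting those conditions is trivial" point from the quoted text concrete, here is a rough software model in C of a single copy operation that decides for itself when to do the "first", "middle" and "last" work. This is only a sketch: the 16-byte granule (CHUNK), the copy_model() name and the copy_phases counters are all made up for illustration and say nothing about how any real ARM implementation cracks these instructions. It also ignores source alignment, page boundaries and the backwards memmove() case mentioned above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { CHUNK = 16 };  /* assumed copy granule; purely illustrative */

/* Hypothetical counters recording how much work each phase did. */
struct copy_phases {
    size_t first_bytes;    /* bytes copied to align the destination */
    size_t middle_chunks;  /* whole CHUNK-sized blocks copied */
    size_t last_bytes;     /* leftover bytes after the middle loop */
};

/* Forward copy of len bytes; buffers are assumed not to overlap. */
static struct copy_phases copy_model(void *dst, const void *src, size_t len)
{
    struct copy_phases p = {0, 0, 0};
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* "first": the destination address has low bits set. */
    while (len > 0 && ((uintptr_t)d & (CHUNK - 1)) != 0) {
        *d++ = *s++;
        len--;
        p.first_bytes++;
    }

    /* "middle": destination is aligned, copy whole chunks. */
    while (len >= CHUNK) {
        memcpy(d, s, CHUNK);
        d += CHUNK;
        s += CHUNK;
        len -= CHUNK;
        p.middle_chunks++;
    }

    /* "last": fell out of the middle loop and the length is not zero. */
    while (len > 0) {
        *d++ = *s++;
        len--;
        p.last_bytes++;
    }

    return p;
}
```

For a destination that sits 3 bytes past a 16-byte boundary and a length of 40, this model does 13 bytes of "first", one 16-byte "middle" chunk and 11 bytes of "last". A very short copy may never reach the middle or last phases at all.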