By: Mark Roulo (nothanks.delete@this.xxx.com), September 30, 2021 7:37 am
Room: Moderated Discussions
rwessel (rwessel.delete@this.yahoo.com) on September 29, 2021 11:22 pm wrote:
... snip ...
> Certainly. But I still don't see the point of the separate setup and finalize instructions - detecting
> those conditions is trivial (if the destination address has any low bits set, do "first", if you've fallen
> out of the "middle" loop, and the length is not zero, do a "last"). Internalizing that stuff would probably
> make it easier to sneak up on page boundaries as well, at least for simpler implementations.
I am not a hardware guy. But ...
Could it be that the three instructions are (in some ARM implementations) expected to be cracked into different micro-ops? That might be a good reason to have three separate ops --- it lets the cracking stay simple.
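For reference, the detection rwessel describes in the quoted text can be modeled in software roughly as below. This is only a sketch of the "first" / "middle" / "last" conditions as stated in the quote, not the semantics of the actual ARM instructions or of any real implementation. The function name `copy_single_op` and the 16-byte `CHUNK` size are assumptions made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical transfer/alignment size; not taken from any ARM document. */
#define CHUNK 16u

/* Software model of one "internalized" copy operation (hypothetical name).
 * All three phases are detected from the current state, as in the quote. */
static void copy_single_op(unsigned char *dst, const unsigned char *src, size_t len)
{
    assert(len == 0 || (dst != NULL && src != NULL));

    /* "first": the destination address has low bits set, so copy
     * single bytes until it is CHUNK-aligned (or the length runs out). */
    while (len != 0 && ((uintptr_t)dst & (CHUNK - 1)) != 0) {
        *dst++ = *src++;
        len--;
    }

    /* "middle": copy whole chunks to the now-aligned destination. */
    while (len >= CHUNK) {
        memcpy(dst, src, CHUNK);
        dst += CHUNK;
        src += CHUNK;
        len -= CHUNK;
    }

    /* "last": we fell out of the middle loop and the length is not zero. */
    while (len != 0) {
        *dst++ = *src++;
        len--;
    }
}
```

In this model each phase is just a loop guarded by a cheap test on the destination address or the remaining length. Whether folding all three into a single instruction makes the cracking into micro-ops harder is a hardware question the sketch does not answer.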