By: Carson (carson.delete@this.example.edu), September 30, 2021 10:28 pm
Room: Moderated Discussions
Everyone seems to be assuming that one move is a three-instruction sequence based primarily on the P/M/E mnemonic endings, and assuming they're some sort of "preamble/middle/end" trio. Is this confirmed by some other source?
Is it possible that it's one instruction per operation, with the instruction depending on the operation you want to perform? E.g. plus (standard memcpy, a.k.a. rep movs), minus (rep movs with DF=1), and either (CPU chooses direction to avoid overlap, a.k.a. memmove)?
While the load-alignment shifter can be set up in the shadow of the L1 cache access, checking for overlap is on the critical path to computing the first load address, and overlap is something a compiler is likely to know, so passing that hint to the CPU if available makes a lot of sense.
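To make the plus/minus/either reading concrete, here is a rough C sketch of what I have in mind. All names (copy_plus, copy_minus, copy_either, copy_dir) are made up for illustration and come from no ISA document; the point is only that the "either" form needs an overlap comparison before the first load, while the other two do not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration only; not from any ISA manual. */
typedef enum { COPY_PLUS, COPY_MINUS, COPY_EITHER } copy_dir;

/* "Plus": ascending copy, like rep movsb with DF=0. */
void copy_plus(unsigned char *dst, const unsigned char *src, size_t len) {
    for (size_t i = 0; i < len; i++)
        dst[i] = src[i];
}

/* "Minus": descending copy, like rep movsb with DF=1, except that it
 * takes start pointers rather than end pointers. */
void copy_minus(unsigned char *dst, const unsigned char *src, size_t len) {
    while (len > 0) {
        len--;
        dst[len] = src[len];
    }
}

/* "Either": memmove semantics. The overlap check has to finish before the
 * first load address is known. Returns the direction actually used. */
copy_dir copy_either(unsigned char *dst, const unsigned char *src, size_t len) {
    assert(len == 0 || (dst != NULL && src != NULL));
    /* An ascending copy clobbers unread source bytes only when dst lies in
     * [src, src + len). Unsigned wraparound makes this a single compare. */
    if ((uintptr_t)dst - (uintptr_t)src < len) {
        copy_minus(dst, src, len);
        return COPY_MINUS;
    }
    copy_plus(dst, src, len);
    return COPY_PLUS;
}
```

A compiler that can already prove the buffers don't overlap would call the equivalent of copy_plus directly and skip the comparison, which is the hint I'm describing.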
There are two things that militate against my interpretation:
- The memset() instructions also have three variants, which makes no sense if the instructions differ in overlap handling.
- For the "either" operation to be restartable, I'd deviate from the x86 convention and use memmove-like (dst_start, src_start, len) parameters even for the reverse copy. But that would mean that the backward copy would leave the registers as (dst_start, src_start, 0), in which case there'd be no need for a ! suffix on the dst and src operands for that form.
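The second point is easier to see in code. This is only a sketch of the restartable backward copy I'm imagining, with a hypothetical copy_regs struct standing in for the three registers and a budget parameter standing in for an interrupt arriving mid-copy; none of it comes from documentation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical register state: memmove-like (dst_start, src_start, len). */
typedef struct {
    unsigned char *dst;        /* start of destination, never modified */
    const unsigned char *src;  /* start of source, never modified */
    size_t len;                /* bytes still to copy */
} copy_regs;

/* Copy at most `budget` bytes from the high end downward, then stop as if
 * interrupted. Returns 1 once the whole copy is done, 0 otherwise.
 * Calling it again with the same registers resumes where it left off. */
int copy_backward_step(copy_regs *r, size_t budget) {
    assert(r != NULL);
    while (r->len > 0 && budget > 0) {
        r->len--;                         /* only len carries the progress */
        r->dst[r->len] = r->src[r->len];
        budget--;
    }
    return r->len == 0;
}
```

Only len changes between steps, so on completion the state is (dst_start, src_start, 0). If the ! marks an operand that gets written back (my assumption), that is why it would be unnecessary on dst and src for this form.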
I'm not sure of anything, just wondering where the boundary between documentation and extrapolation is at the moment.