By: Jan Wassenberg (jan.wassenberg.delete@this.gmail.com), May 23, 2022 5:38 am
Room: Moderated Discussions
Michael S (already5chosen.delete@this.yahoo.com) on May 23, 2022 2:08 am wrote:
> I.e. it relies on compiler for final instruction selection, right?
Right. Load and LoadU are inline functions that call the aligned (_mm_load_si128 / _mm256_load_si256) and unaligned (_mm_loadu_si128 / _mm256_loadu_si256) load intrinsics, respectively. The compiler is indeed free to generate something other than movdqa and movdqu. For example, it can fold a Load into the memory operand of an SSE4 (non-VEX) instruction, which must be aligned, whereas LoadU will require a separate load instruction.
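To make that concrete, here is a minimal sketch. The Load/LoadU below are hypothetical stand-ins I wrote for illustration, not the library's actual code, and AddAligned/AddUnaligned are made-up names. What the compiler emits depends on the compiler, its version and the flags, so treat the instruction sequences mentioned in the comments as possibilities rather than guarantees.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (x86 only)

#include <cassert>
#include <cstdint>

// Hypothetical wrapper: the caller promises 16-byte alignment.
static inline __m128i Load(const int32_t* aligned) {
  assert(reinterpret_cast<uintptr_t>(aligned) % 16 == 0);
  return _mm_load_si128(reinterpret_cast<const __m128i*>(aligned));
}

// Hypothetical wrapper: no alignment requirement.
static inline __m128i LoadU(const int32_t* p) {
  return _mm_loadu_si128(reinterpret_cast<const __m128i*>(p));
}

// When targeting non-VEX SSE, the compiler may fold the aligned load into
// the add, e.g. a single `paddd xmm0, [mem]` (assumption: typical codegen).
__m128i AddAligned(__m128i v, const int32_t* aligned) {
  return _mm_add_epi32(v, Load(aligned));
}

// Here folding is not allowed for non-VEX SSE, so a separate `movdqu`
// followed by `paddd xmm0, xmm1` is the likely result.
__m128i AddUnaligned(__m128i v, const int32_t* p) {
  return _mm_add_epi32(v, LoadU(p));
}
```

Both functions return the same values for the same data; only the alignment contract and the possible instruction selection differ. Comparing the disassembly of the two at -O2 is the easiest way to see what your compiler actually does.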