By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), February 25, 2021 11:35 am
Room: Moderated Discussions
Andrey (andrey.semashev.delete@this.gmail.com) on February 25, 2021 3:06 am wrote:
>
> memcpy can afford language like "undefined behavior", which is where most of its liberties come from. I don't
> think an actual hardware instruction can afford that. The ISA must specify the instruction behavior in
> every use case, even though in some cases that behavior might be pathological or a hardware exception.
For things like access size or memory ordering issues, implementation-defined behavior is not the exception, it's the norm. x86 actually has fewer of them than most, but even x86 already has them as part of the architecture.
Look at instructions like "clear cacheline" (or invalidate it) that exist in many many different architectures. It's not "undefined", but the instruction does different things on different microarchitectures with different line sizes.
That's the kind of microarchitectural effect that I'd suggest a memcpy instruction does: the actual access size is implementation-defined. Not undefined.
It's a hell of a lot better than "clear cacheline", which is an actual dangerous instruction to use, because not only does it have different access patterns, it has actual different semantics on different microarchitectures.
A "memcpy" instruction only has access pattern differences, not actual semantic differences. You can see the access size in the overlap case, but that's no different from using a load/store sequence - the only issue is that now the load/store size is just a microarchitectural feature (and typically it would be the cache line size, but it could easily be just the "cache access size" which may be smaller).
Seriously, if you think a memcpy instruction is questionable, you should immediately stop using every single machine you have. Those "invalidate cacheline" instruction issues have caused actual real and present problems, because they are much worse than what I suggested.
Google "A tale of an impossible bug".
Linus