By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), February 23, 2021 1:30 pm
Room: Moderated Discussions
rwessel (rwessel.delete@this.yahoo.com) on February 23, 2021 12:21 pm wrote:
>
> Is that really a problem? Just check for aliasing at startup, and as each operand
> crosses page boundaries.
It's been a problem. Look up the historical performance profiles of various "rep movs" ERMS implementations, and they have rules like "if the source and destination addresses overlap within 64 bytes modulo 4096, it slows down".
I forget the exact details, but it's been an actual issue.
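To make the kind of condition being described concrete, here is a rough C sketch of a check a caller might run before picking a copy strategy. It is only an illustration: the exact rule differs between microarchitectures (and the details are not remembered above), so the 64-byte window, the interpretation "page offsets within 64 bytes of each other", and the helper name `may_alias_4k` are all assumptions, not anything taken from a vendor manual.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u  /* assumed 4 KiB pages */
#define ALIAS_WINDOW    64u    /* assumed window, per the "64 bytes" above */

/*
 * Hypothetical helper: returns true when the page offsets of dst and src
 * lie within ALIAS_WINDOW bytes of each other, modulo 4096.
 * This models one possible reading of the slowdown condition only.
 */
static bool may_alias_4k(const void *dst, const void *src)
{
    uintptr_t d = (uintptr_t)dst % PAGE_SIZE_BYTES;
    uintptr_t s = (uintptr_t)src % PAGE_SIZE_BYTES;

    /* Unsigned wraparound keeps the difference correct modulo 4096. */
    uintptr_t diff = (d - s) % PAGE_SIZE_BYTES;

    return diff < ALIAS_WINDOW || diff > PAGE_SIZE_BYTES - ALIAS_WINDOW;
}
```

The point is less the check itself than the fact that software ends up needing to know about it at all: two buffers that are megabytes apart can still trip it if their page offsets happen to line up.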
> > Same goes for MMIO memory. If you do memcpy() on MMIO memory, you get whatever random
> > end results. But for movsb it's actually architecturally defined, and usually not
> > what you want (ie the definition is the "go slow, one byte at a time").
>
> I don't know how you get around that. If you do a byte access to MMIO memory, the hardware *has* to do
> byte accesses.
That's my point.
The way you get around it is that you define the instruction not as a "byte copy loop", but as a "memcpy". Which basically requires a new instruction with subtly different semantics.
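The difference between the two definitions can be modeled in C, as a sketch rather than a description of any real hardware. `byte_copy_loop` and `bulk_copy` are hypothetical names: the first stands in for the "one byte at a time, in order" definition, the second for a memcpy-style contract where access width and order are left to the implementation.

```c
#include <stddef.h>
#include <string.h>

/*
 * Models the "byte copy loop" definition: every byte is its own access,
 * performed in ascending order. volatile keeps the compiler from merging
 * or reordering the accesses, which is what MMIO would require.
 */
static void byte_copy_loop(volatile unsigned char *dst,
                           const volatile unsigned char *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/*
 * Models the "memcpy" definition: only the final contents of dst are
 * specified. Overlap is not allowed, and the implementation may use any
 * access size and order it likes.
 */
static void bulk_copy(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}
```

With the first definition, an overlapping copy such as `byte_copy_loop(buf + 1, buf, 5)` on "abcdef" has a defined result ("aaaaaa"), because each store is visible to the next load. With the second, the same call is simply not permitted, and that freedom is what lets the implementation go fast.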
That's what I said. Read my suggestion again.
"rep movs" is almost perfect. But it has real and present problems. And MMIO is one of them. DF is another. Physical aliasing is a third.
None of these are common, but they happen, and they cause problems.
Linus