By: rpg (a.delete@this.b.com), October 2, 2021 2:51 am
Room: Moderated Discussions
Linus Torvalds (torvalds.delete@this.linux-foundation.org) on October 1, 2021 11:01 am wrote:
> Michael S (already5chosen.delete@this.yahoo.com) on October 1, 2021 5:04 am wrote:
> >
> > Like, first instruction brings destination to [coarse] aligned boundary etc...
>
> It could be even simpler.
>
> The first instruction might not do anything about the actual copy at all.
>
> It might just do pure bookkeeping functionality, like "check overlapping ranges" or "check
> if it's large enough and mutually aligned so that you can do cacheline level optimizations".
> Things like setting flags to say how to copy (kind of like how x86 uses the DF flag).
>
> That would make the first instruction fairly uninteresting, and the second instruction
> would be the one that does all the repeating work (with the third instruction doing
> what? Maybe the final tail, maybe just some internal state cleanup?)
>
> But if the restart happens on the second instruction, I don't know where the first instruction would squirrel
> away any state information it has determined, though. It would have to be in some architected register state,
> so that nested memory copies work (ie taking a page fault, doing another memory copy in the kernel or VMM).
>
> So I personally think it would be best to always cause restarts to restart at the first instruction,
> exactly so that you could have magic micro-architectural hidden state. If you always restart
> at the first instruction, you could literally have hidden "previous read" buffers for the
> mutually unaligned case, hidden "do it with cache transfers" flags, or direction flags etc,
> and never expose your random microarchitectural choices anywhere else.
>
> And so it would allow you to migrate cleanly between different microarchitectures
> (either BIG.little or just VM migration) without any odd special cases.
>
> VM migration is an interesting case, and having it happen in the middle of a big memory copy is not
> at all some kind of exceptionally unusual situation. So any model that does something special in
> the first instruction - and then exposes restarts on the second one - sounds a bit iffy to me.
>
> IOW, restart at the first instruction really seems like the technically correct solution.
>
> This is something the x86 "rep movs" got right. No odd partial instruction restart cases.
>
> Of course, "rep movs" has other problems, so..
>
> Linus
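To make the restart point above concrete, here is a rough C model of the "rep movs"-style behaviour, where all progress lives in architected state so an interrupted copy can simply be re-issued. This is only a sketch: `copy_regs`, `copy_step`, `restart_demo` and the `budget` "fault" mechanism are hypothetical names invented for illustration, not anything from a real ISA.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: all copy progress lives in three "architected
 * registers", loosely analogous to RSI/RDI/RCX for x86 "rep movs". */
struct copy_regs {
    const unsigned char *src;
    unsigned char *dst;
    size_t count;
};

/* Copies at most `budget` bytes, then pretends a fault occurred.
 * Returns 1 when the copy is complete, 0 if it was interrupted.
 * No hidden state survives the call; everything is in *r. */
static int copy_step(struct copy_regs *r, size_t budget)
{
    while (r->count > 0) {
        if (budget == 0)
            return 0;          /* simulated fault: registers hold progress */
        *r->dst++ = *r->src++;
        r->count--;
        budget--;
    }
    return 1;
}

/* Re-issues the same "instruction" until it completes.
 * Returns the number of restarts, or -1 if the copy was wrong. */
static int restart_demo(void)
{
    unsigned char src[10] = "abcdefghi";
    unsigned char dst[10] = {0};
    struct copy_regs r = { src, dst, sizeof src };
    int restarts = 0;

    while (!copy_step(&r, 3))
        restarts++;

    return memcmp(src, dst, sizeof src) == 0 ? restarts : -1;
}
```

Because the handler only has to save and restore `copy_regs`, a nested copy inside the "fault" (or a migration to a different implementation of `copy_step`) would not disturb the outer one.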
Why handle memcpy with microcoded instructions/cracked uOPs?
Wouldn't a simple DMA unit be able to handle this? I.e., if you can set up DMA from various IO controllers to RAM, then maybe this is all you need. (At least the non-overlapping case should be fine.)
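Roughly what I have in mind, as a software model in C. Everything here is an assumption for illustration: the `dma_desc` layout, `dma_submit` and `ranges_overlap` are made-up names, the "engine" is just a `memcpy` stand-in, and a real DMA block would define its own descriptor format and completion signalling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor: source, destination, length, completion flag. */
struct dma_desc {
    const void *src;
    void *dst;
    size_t len;
    int done;
};

/* True if [src, src+len) and [dst, dst+len) share at least one byte. */
static int ranges_overlap(const void *src, const void *dst, size_t len)
{
    uintptr_t s = (uintptr_t)src, d = (uintptr_t)dst;
    return s < d + len && d < s + len;
}

/* Stand-in for the engine. Only the non-overlapping case is accepted;
 * overlapping requests are rejected with -1 and left to the CPU. */
static int dma_submit(struct dma_desc *desc)
{
    if (ranges_overlap(desc->src, desc->dst, desc->len))
        return -1;
    memcpy(desc->dst, desc->src, desc->len);  /* models the transfer */
    desc->done = 1;                           /* models the completion flag */
    return 0;
}

/* Returns 0 if a disjoint copy succeeds and an overlapping one is refused. */
static int dma_demo(void)
{
    char buf[16] = "hello, world";
    char out[16] = {0};
    struct dma_desc ok  = { buf, out, sizeof buf, 0 };
    struct dma_desc bad = { buf, buf + 4, 8, 0 };

    if (dma_submit(&ok) != 0 || !ok.done || memcmp(buf, out, sizeof buf) != 0)
        return -1;
    return dma_submit(&bad) == -1 ? 0 : -1;
}
```

The model leaves out the parts that make this hard in practice, such as address translation, page faults during the transfer, and cache coherence.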
AFAICS, it will simplify the CPU implementation a bit as well. So, why