By: Michael S (already5chosen.delete@this.yahoo.com), February 23, 2021 9:52 am
Room: Moderated Discussions
vvid (no.delete@this.thanks.com) on February 23, 2021 6:41 am wrote:
> Anon (no.delete@this.spam.com) on February 23, 2021 3:05 am wrote:
> > anon2 (anon.delete@this.anon.com) on February 22, 2021 7:17 pm wrote:
> > > I don't think memcpy, memset instructions are bad per se, though I still don't understand
> > > the fascination with them, unless their proponents are going to move on to do-daxpy, route-ip-packet,
> > > gzip-memory, etc instructions when/if one day Intel's rep ; mov finally doesn't suck. But
> > > I digress, the point was not a totally open-ended "ISA does not matter".
> >
> > memcpy is quite common, easy to implement in hardware and very inefficient to implement in software.
>
> So easy, that it took Intel literally 4 decades to achieve an acceptable* performance of REP MOVSB.
> * in some situations
Well, in many common situations, rep movs has been quite good since at least Nehalem. And up to and including the i386 it was quite good in a different, but even larger, proportion of common situations (all small copies with lengths unknown at compile time).
So the unsatisfactory state of affairs, where rep movsb was good in too few situations, lasted less than 20 years. Maybe less than 15, I am not too sure.
Also, IMHO, even for an ideal "rep movsb" implementation it is acceptable to be a few clocks slower than the wider "rep movsX" variants when the length of the copy is a small multiple of sizeof(X).
Besides, x86 'rep movsb' is too well defined in situations where the source and destination buffers overlap.
It would be easier to achieve top performance if overlaps were either undefined (in a bounded manner: never read outside of src[], never write outside of dst[], but otherwise the content of dst[] can be any mix of zeros and original src and dst bytes) or defined to do nothing.
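To illustrate what "too defined" means here, below is a minimal C sketch of the architectural behaviour of 'rep movsb' with DF=0, as I understand it: bytes are copied one at a time in ascending address order, so an overlapping copy with dst = src + 1 has to smear the first byte across the whole destination. The names rep_movsb_model and overlap_demo are mine, purely for illustration; this is a software model, not how any real microcode does it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of `rep movsb` with DF=0: one byte at a time,
 * ascending addresses. No `restrict`, so src and dst may overlap. */
void rep_movsb_model(unsigned char *dst, const unsigned char *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Overlapping copy with dst = src + 1. Each byte read was written by the
 * previous iteration, so the first byte propagates through the buffer.
 * Returns 1 if the buffer ends up as "AAAAAAA". */
int overlap_demo(void)
{
    unsigned char buf[8] = "ABCDEFG";   /* 7 letters plus terminating NUL */
    rep_movsb_model(buf + 1, buf, 6);
    assert(buf[7] == '\0');             /* nothing written outside dst[0..5] */
    return memcmp(buf, "AAAAAAA", 7) == 0;
}
```

A fast implementation that moves 16 or 32 bytes per step has to detect this case and fall back to something that still produces the byte-by-byte result. With the bounded-undefined rule proposed above, the hardware could skip that check.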