By: Brett (ggtgp.delete@this.yahoo.com), September 16, 2021 11:32 pm
Room: Moderated Discussions
Doug S (foo.delete@this.bar.bar) on September 16, 2021 10:57 pm wrote:
> anonymou5 (no.delete@this.spam.com) on September 16, 2021 3:25 pm wrote:
> > https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-a-profile-architecture-developments-2021
> >
> > memcpy, non-maskable interrupts, PMU updates, hinted conditional branches, and other goodies
>
> It is easy to declare instructions for memcpy().
It has taken four f**king decades to convince a RISC firm to add memcpy.
I have been mocking these companies for almost that long for not adding it.
> The devil is in the details of
> the implementation. How long did it take Intel to get it even halfway right?
Intel’s is a byte copy that can overlap, like a text editor adding one character and using memcpy to zip the sentence back together.
All those software microcode checks kill you when all you want is to copy 4 aligned ints.
Microcode is just software: instructions that have dependencies and take cycles to run, like all other instructions.
Look at the code for an unaligned overlapping memcpy on RISC: it’s long, obnoxious, and slow. The x86 memcpy is that same code in a ROM, called by the instruction.
Intel will finally be forced to add a real memcpy for aligned data.
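To make the contrast concrete, here is a rough C sketch of the two cases. It is not Intel's microcode or any libc's code; the names copy_general and copy_4_aligned_ints are made up for illustration. The general routine only shows the overlap-direction check and falls back to a byte loop; a real one would pile alignment head and tail handling on top of that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical general-purpose copy: the regions may overlap and may be
 * unaligned, so it has to decide on a direction before moving any data. */
void *copy_general(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (d == s || n == 0)
        return dst;                 /* nothing to do */

    if ((uintptr_t)d < (uintptr_t)s) {
        /* Destination is below the source: copy forward. */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    } else {
        /* Destination is above the source: copy backward so that
         * overlapping source bytes are read before being overwritten. */
        for (size_t i = n; i > 0; i--)
            d[i - 1] = s[i - 1];
    }
    return dst;
}

/* Hypothetical fixed-size copy: four aligned, non-overlapping ints.
 * No direction check, no loop, no alignment fix-up. */
void copy_4_aligned_ints(int32_t *restrict dst, const int32_t *restrict src)
{
    assert(dst != NULL && src != NULL);
    dst[0] = src[0];
    dst[1] = src[1];
    dst[2] = src[2];
    dst[3] = src[3];
}
```

The text editor case is copy_general(buf + 1, buf, len): the tail of the sentence shifts right by one byte, which takes the backward path. The four-int case needs none of those checks, and that is the whole complaint.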
> I'll be curious to see who attempts to tackle this in an actual design, and how well such implementations
> hold up when Linus inevitably subjects it to some experimentation and looks for its glass jaw scenarios.
All you have to do is find out the real details on the three variants; that will tell you how fast they are. Intel’s version is brain dead because of its documented special cases.