12:30 "[ISA] do not matter very much"

By: rwessel (rwessel.delete@this.yahoo.com), February 25, 2021 4:51 am
Room: Moderated Discussions
Etienne Lorrain (etienne_lorrain.delete@this.yahoo.fr) on February 25, 2021 1:02 am wrote:
> rwessel (rwessel.delete@this.yahoo.com) on February 24, 2021 8:45 am wrote:
> > Etienne Lorrain (etienne_lorrain.delete@this.yahoo.fr) on February 24, 2021 6:24 am wrote:
> > > Wilco (wilco.dijkstra.delete@this.ntlworld.com) on February 24, 2021 4:37 am wrote:
> > > > Anon (no.delete@this.spam.com) on February 23, 2021 6:26 am wrote:
> > > > > Wilco (wilco.dijkstra.delete@this.ntlworld.com) on February 23, 2021 3:48 am wrote:
> > > > > > You forgot the sarcasm tag :-)
> > > > >
> > > > > Poor implementations don't prove an efficient implementation isn't possible.
> > > >
> > > > Given even a bad software implementation (the SSE2 one) thrashes rep movsb on modern
> > > > cores, it proves that it is not at all trivial to do a good hardware memcpy like
> > > > you suggested. It's not like Intel/AMD haven't been trying for years.
> > > >
> > > > > > bench-memcpy-random in GLIBC shows just how "efficient" rep movsb (__memcpy_erms) is on my 3700X:
> > > > >
> > > > > Your benchmark shows how hard it is to find the optimal software implementation of memcpy, there
> > > > > are 7 variations and a surprising fastest one (__memcpy_sse2_unaligned, what happens to AVX?), this
> > > > > show a somewhat lazy AMD that didn't even put their microcode to emit the best uop sequence.
> > > >
> > > > There isn't a single optimal implementation of memcpy for all possible use-cases. Software allows you
> > > > to select whichever one works best, and you can tweak it further, remove bottlenecks etc. However with
> > > > hardware you are stuck with the one in your CPU. In order for hardware memcpy to work out, it has to
> > > > be as fast as the best software implementation. So far nobody has proven this is feasible.
> > > >
> > > > Wilco
> > >
> > > To me, it looks a bit strange to talk about either microcode or hardware for memcpy (with an OoO core):
> > > - microcode is not the exact code which is inserted into the instruction execution windows,
> > > if you have a rep movsb with initial ecx=7, you have to fill the execution instruction window
> > > with 7 reads of the source address, 7 writes of the destination address (or an optimisation
> > > if reading multiple of bytes), and a clear of ecx if still alive. The problem is probably how
> > > many execution windows instructions you can insert in one cycle executing microcode.
> > > - hardware memcpy would mean some kind of DMA (into caches) and pausing the execution window?
> > >
> > > What is probably needed is specialised "execution window instructions"
> > > which can read up to a cache line, another
> > > to mask / insert from another cache line, and a third to write
> > > up to a cache line. Then the "rep movsb" microcode
> > > inserts (maybe a lot of) such "execution window instructions" into the "instructions in flight".
> > > Maybe that is what you meant, then please ignore that message...
> >
> >
> > Why? Send one micro-op to the LSU, and let it figure it out.
>
> That memcpy micro-op will have an unlimited amount of dependencies: imagine you have such
> micro-ops in flight (I should not use assembly notation for micro-op, but it is easier):
> mov [%esi + 8], #10
> mov [%esi + 12], %eax
> mov [%esi + 16], %ebx
> memcpy %edi, %esi, 32
> ...
> mov %eax, [%edi + 16]
>
> Then you might have a lot of those "memcpy" micro-op in your 100 instructions in flight...
>
> And for what I understand, micro-ops should have a pre-defined execution
> time because you allocate their activity cycles ahead of time.
>
> But I am not a micro-op specialist so may be completely wrong...


Well, it certainly can't be an issue of timing. The LSU already has to handle loads and stores that might hit in L1 or might have to go to RAM attached to a different socket. And the LSUs already have to figure out memory ordering and sequential dependencies between unaligned 512-bit accesses and differently (un)aligned accesses of other sizes, possibly crossing page boundaries.
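
To make the dependency question concrete, here is a minimal sketch of the byte-range overlap test that any such ordering check boils down to, applied to the quoted micro-op sequence. The register values and the 4-byte operand size are assumptions picked for illustration, and this says nothing about how a real LSU implements the check.

```python
def overlaps(a_start, a_len, b_start, b_len):
    """Return True if the two byte ranges share at least one byte."""
    return a_start < b_start + b_len and b_start < a_start + a_len

# Hypothetical register values, chosen only for illustration.
ESI = 0x1000
EDI = 0x2000
WORD = 4  # assumed operand size of each mov in the quoted example

# Older stores in program order, as (address, length) pairs.
older_stores = [(ESI + 8, WORD), (ESI + 12, WORD), (ESI + 16, WORD)]

# "memcpy %edi, %esi, 32" reads [ESI, ESI+32) and writes [EDI, EDI+32).
copy_src = (ESI, 32)
copy_dst = (EDI, 32)

# Older stores that the read side of the copy has to be ordered after.
copy_depends_on = [s for s in older_stores if overlaps(*s, *copy_src)]

# The younger load "mov %eax, [%edi + 16]" hits the write side of the copy.
load = (EDI + 16, WORD)
load_depends_on_copy = overlaps(*load, *copy_dst)
```

With these assumed values all three stores fall inside the copy's source range and the later load falls inside its destination range, so the copy is just one more (wide) entry in the same kind of comparison.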

None of that would appear to be a problem until the rep/movsb crosses a page boundary (that may, of course, happen separately for the two operands). Worst case, a rep/movsb that will cross a page boundary ends up stalling subsequent memory accesses until it starts on the last source and destination pages it's going to hit. A better implementation could look ahead some number of pages (while it's copying) and make those dependency determinations that many pages before the copy completes.
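
As a rough illustration of where those page crossings fall, the sketch below splits a forward copy into pieces that never cross a page boundary in either operand. The 4 KiB page size, the function names, and the forward-only direction are all assumptions; it is a model of the bookkeeping, not of any actual hardware.

```python
PAGE_SIZE = 4096  # assumed page size

def bytes_to_page_end(addr):
    """Number of bytes from addr up to the end of its page."""
    return PAGE_SIZE - (addr % PAGE_SIZE)

def split_at_page_boundaries(src, dst, count):
    """Split a forward copy into (src, dst, length) chunks, none of which
    crosses a page boundary in the source or in the destination."""
    chunks = []
    while count > 0:
        n = min(count, bytes_to_page_end(src), bytes_to_page_end(dst))
        chunks.append((src, dst, n))
        src += n
        dst += n
        count -= n
    return chunks

# Source and destination reach their page ends at different offsets.
example = split_at_page_boundaries(0x1FF0, 0x3FF8, 64)
```

In the example the destination crosses first and the source crosses eight bytes later, which is the "separately for the two operands" case. A look-ahead scheme would amount to walking this list a few chunks ahead of the data actually being moved.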

As to the micro-op/instruction itself: it's not that big a dispatching stretch over existing instructions. The rep/movsb micro-op itself would have three register inputs, plus the direction flag, and three outputs. Ordinary loads already have two register inputs and an output, stores have three inputs, and cmpxchg16b has four inputs, with two registers plus the zero flag out.
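
For reference, the three-in/three-out shape looks like this when the architectural behaviour of rep movsb is written as a plain function. This is only a semantic model using the 64-bit register names; `memory` is an assumed flat bytearray, and faults, address-size wraparound, and interruptibility are not modelled.

```python
def rep_movsb(memory, rsi, rdi, rcx, df):
    """Model of rep movsb: copy rcx bytes from memory[rsi] to memory[rdi].

    Inputs: source pointer, destination pointer, count, direction flag.
    Outputs: the updated source pointer, destination pointer, and count.
    """
    step = -1 if df else 1  # DF=1 walks downward, DF=0 walks upward
    while rcx != 0:
        memory[rdi] = memory[rsi]
        rsi += step
        rdi += step
        rcx -= 1
    return rsi, rdi, rcx

# Forward copy of 8 bytes from offset 0 to offset 8.
mem_fwd = bytearray(b"abcdefgh" + bytes(8))
fwd_result = rep_movsb(mem_fwd, 0, 8, 8, False)

# Backward copy of 4 bytes, starting at the last byte of each range.
mem_bwd = bytearray(b"abcdefgh" + bytes(8))
bwd_result = rep_movsb(mem_bwd, 4, 12, 4, True)
```

Everything the instruction needs and produces fits in the three registers plus the flag; the memory side effects are what the LSU would have to track.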

I'm not saying this is trivial; it just doesn't seem like all that much of a stretch over what the LSU already has to do.