memcpy - instruction cracking vs DMA

By: rwessel (rwessel.delete@this.yahoo.com), October 3, 2021 4:06 am
Room: Moderated Discussions
Michael S (already5chosen.delete@this.yahoo.com) on October 3, 2021 1:51 am wrote:
> Linus Torvalds (torvalds.delete@this.linux-foundation.org) on October 2, 2021 10:43 am wrote:
> > rpg (a.delete@this.b.com) on October 2, 2021 2:51 am wrote:
> > >
> > > Why handle memcpy with microcoded instructions/cracked uOPs?
> > >
> > > Wouldn't a simple DMA unit be able to handle this?
> >
> > DMA units are stupid.
> >
> > Seriously. Stop perpetuating that myth from the 80s.
> >
> > Back in the days long long gone, DMA units made sense because
> >
> > (a) CPUs were often slower than DRAM
> >
> > (b) caches weren't a thing
> >
> > and neither of those have been true for decades by now outside of some very very embedded stuff where the
> > CPU isn't even remotely the main concern of the hardware (ie there are places where people have a very weak
> > CPU that just handles some bookkeeping functionality, and
> > the real heavy lifting is done by specialized hardware
> > - very much including DMA engines built into those things. Think networking or media processors).
> >
> > Also, stop thinking that memory copies are about moving big amounts of data. That's very seldom
> > actually true outside of some broken memory throughput benchmarks. The most common thing by
> > far is moving small stuff that is a few tens of bytes in size, often isn't cacheline aligned,
> > and is quite often somewhere in the cache hierarchy (but not necessarily L1 caches).
> >
> > The reason you want memset/memmove/memcpy instructions is because
> >
> > (a) the CPU memory unit already has buffers with byte masking and shifting built in
> >
> > (b) you should never expose the micro-architectural details of what exactly is the
> > buffer size for said masking and shifting, and how many buffers you have etc etc.
> >
> > (c) you should absolutely not have to bring in the data to the register
> > file, because you may be able to keep the data further away
> >
> > so anybody who says "just use vector instructions" is also wrong.
> >
> > No, the answer is not some DMA unit, because you'd just be screwing up caches with
> > those, or duplicating your existing hardware. The latency of talking to an outside
> > unit is higher than the cost of just doing the operation in 90% of all cases.
> >
> > And no, the answer is not vector units, because you'll just waste an incredible amount of effort and energy
> > on trying to deal with the impedance issues of the visible instruction set and architectural state, and the
> > low-level details of your memory unit, and you'll not be able to do clever things inside the caches.
> >
> > Linus
>
> I agree with everything except the last paragraph.
> The last paragraph is a demonstration of blind hatred.
> VU *is* the best technical answer to fixed-width memory copy/set in the range from
> 9B to ~500B. Maybe up to ~1000B. It just needs a few proper instructions (like load/store
> register pair with 1B granularity of source/destination) and ubiquity.


Only if the vector unit doesn't need to be powered up, and the cost of saving its state can be amortized over other work. In a lot of kernel code, the VU isn't available, period.
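
To make the "no VU" case concrete, here is a minimal sketch in C of a small copy that stays entirely in general-purpose registers, roughly what code has to fall back on when vector state is off limits. The helper name `copy_8_to_16` and the 8-to-16-byte range are illustrative assumptions, not something taken from the thread or from any particular kernel. It uses two 8-byte moves that may overlap in the middle, so no byte loop or length-dependent branching is needed inside that range.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy n bytes, where 8 <= n <= 16, without
 * touching vector registers. The fixed-size memcpy calls are the
 * portable way to express unaligned 8-byte loads and stores;
 * compilers typically lower each one to a single move.
 *
 * Both loads happen before either store, so the result is still
 * correct when the source and destination ranges overlap.
 */
static void copy_8_to_16(void *dst, const void *src, size_t n)
{
    uint64_t head, tail;

    memcpy(&head, src, 8);                                 /* first 8 bytes */
    memcpy(&tail, (const unsigned char *)src + n - 8, 8);  /* last 8 bytes  */

    memcpy(dst, &head, 8);
    memcpy((unsigned char *)dst + n - 8, &tail, 8);
}
```

A real memcpy would need a ladder of such cases for each size class, which is exactly the sort of micro-architecture-dependent tuning a dedicated copy instruction is meant to hide.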