Armv8.8-A and Armv9.3-A - moves

By: rwessel (rwessel.delete@this.yahoo.com), September 30, 2021 5:20 pm
Room: Moderated Discussions
Doug S (foo.delete@this.bar.bar) on September 30, 2021 2:56 pm wrote:
> rwessel (rwessel.delete@this.yahoo.com) on September 30, 2021 10:39 am wrote:
> > Doug S (foo.delete@this.bar.bar) on September 30, 2021 9:48 am wrote:
> > > rwessel (rwessel.delete@this.yahoo.com) on September 29, 2021 11:22 pm wrote:
> > > > Certainly. But I still don't see the point of the separate setup and finalize instructions - detecting
> > > > those conditions is trivial (if the destination address has any low bits set, do "first", if you've fallen
> > > > out of the "middle" loop, and the length is not zero, do
> > > > a "last"). Internalizing that stuff would probably
> > > > make it easier to sneak up on page boundaries as well, at least for simpler implementations.
> > >
> > >
> > > Sure detecting that stuff is trivial, but they would effectively be three separate
> > > operations internally as you outline. So why not make that explicit and reduce
> > > the amount of state you have to carry when the operation is interrupted?
> >
> > If it leads to good memcpy() performance, with simple implementations, I'm all for whatever they've done.
> >
> > That being said, the separation is at least a bit artificial, and that presents at least a
> > few potential problem areas. First, the three instruction scheme makes optimizing fairly
> > short memcpy()s difficult - you'll have to execute all three instructions no matter what.
> >
> > At least for simple implementations, requiring the middle and final instruction operate on aligned
> > words (at least for the destination) poses some challenges around page boundaries. If nothing
> > else, having to store a full aligned word in every cycle will require that crossing a page boundary
> > be able to handle three page faults. If the state requirements were looser, the instruction could
> > step more delicately over a page boundary, eliminating the need to handle the third page fault.
> > A truly high end implementation may care about that less than a "medium" one.
> >
> > Also separating the start, middle and end instructions requires that they either architect the
> > state those store or use, or you'll have trouble migrating running code to cores that might
> > have different implementations (say from a big to a little core, to another core in a cluster,
> > or a VM migration). And if you architect them, you run the risk of fixing things like the effective
> > word size, which may impact future implementations (either by limiting their word sizes, or
> > by requiring their "middle" instruction to handle partially aligned operands).
>
>
> If an implementation wants to minimize issues with big core / little core interaction, e.g. have
> a larger width in the big core, then it both cores will use the larger alignment. Doesn't cost
> much for the little cores to use a slightly more restrictive alignment than would otherwise be
> necessary for their narrower engine. That way in progress instructions interrupted on a core
> can be continued on a core of a different size without any special casing required.
>
> The alignment that a "middle" instruction expects to result from the completed execution of a "start"
> instruction is something the end user doesn't need to know or care about. It doesn't matter to me
> if I am doing a memory copy on a core that wants a 64 bit alignment to do copies in 64 bit hunks or
> 256 bit alignment to do copies in 256 bit hunks. Or wants a 256 bit alignment to do copies in 64 bit
> hunks (i.e. a little core on a machine with big cores that operate on 256 bit hunks) The three instructions
> handle that just as automatically as a single instruction would. Same for VM migration, the "start"
> instruction will create whatever alignment that the "middle" instruction expects.
>
> You could (theoretically) run the same ARMv9 binary on a watch SoC that maybe has only a 32
> bit wide memory bus or a supercomputer that has a 2048 bit wide memory bus, and the start/middle/finish
> instructions will do what is required on that hardware. The three instructions can handle the
> implementation details just as well as a single do it all instruction could.


Then the smaller implementation needs to be able to handle longer start and end moves than would be natural. Sure you could, but it seems a bit painful - begin (and end) would then basically be the merged instructions anyway. Across a cluster remains an issue, unless any little CPUs could have their minimum alignment adjusted by the OS.
< Previous Post in ThreadNext Post in Thread >
TopicPosted ByDate
Armv8.8-A and Armv9.3-Aanonymou52021/09/16 03:25 PM
  Armv8.8-A and Armv9.3-ADoug S2021/09/16 10:57 PM
    Armv8.8-A and Armv9.3-ABrett2021/09/16 11:32 PM
      Armv8.8-A and Armv9.3-Aanon2021/09/16 11:55 PM
        Armv8.8-A and Armv9.3-Anone2021/09/17 12:51 AM
      Armv8.8-A and Armv9.3-AJörn Engel2021/09/17 05:42 AM
        Armv8.8-A and Armv9.3-AMichael S2021/09/17 07:48 AM
          Armv8.8-A and Armv9.3-AJörn Engel2021/09/18 02:01 PM
            Armv8.8-A and Armv9.3-AMichael S2021/09/18 03:58 PM
              microbenchmark resultsMichael S2021/09/19 04:46 PM
                microbenchmark source codeMichael S2021/09/19 04:58 PM
                  microbenchmark source code-.-2021/09/20 04:49 PM
                    microbenchmark source codeMichael S2021/09/21 10:17 AM
                      microbenchmark source code-.-2021/09/21 04:33 PM
                        microbenchmark source codeMichael S2021/09/21 06:05 PM
                microbenchmark resultsAnon2021/09/19 05:32 PM
                  microbenchmark resultsJörn Engel2021/09/19 08:46 PM
                    microbenchmark resultsdmcq2021/09/20 02:19 AM
                      microbenchmark resultsMichael S2021/09/20 05:12 AM
                      microbenchmark results-.-2021/09/20 04:44 PM
                        microbenchmark resultsMichael S2021/09/21 10:23 AM
                          microbenchmark results-.-2021/09/21 04:35 PM
                            microbenchmark resultsAndrey2021/09/21 05:25 PM
                              I agree (NT)Michael S2021/09/21 06:07 PM
                              microbenchmark results-.-2021/09/22 05:56 PM
                                microbenchmark resultsMichael S2021/09/23 06:11 AM
                                  microbenchmark resultsdmcq2021/09/23 07:53 AM
                                  microbenchmark resultsAndrey2021/09/23 10:20 AM
                                microbenchmark resultsAndrey2021/09/23 10:11 AM
                                  microbenchmark results-.-2021/09/23 08:01 PM
                                    microbenchmark resultsSimon Farnsworth2021/09/24 02:47 AM
                                      microbenchmark results-.-2021/09/24 06:00 PM
                                    microbenchmark resultsAndrey2021/09/24 08:29 AM
                                      microbenchmark resultsdmcq2021/09/24 01:05 PM
                                        microbenchmark resultsDoug S2021/09/24 02:12 PM
                                          microbenchmark results---2021/09/24 07:06 PM
                                            microbenchmark resultsDoug S2021/09/24 11:46 PM
                                              microbenchmark results---2021/09/25 09:56 AM
                                                microbenchmark resultsJukka Larja2021/09/26 02:01 AM
                                                microbenchmark resultsDoug S2021/09/26 09:41 AM
                                                  microbenchmark resultsdmcq2021/09/26 01:37 PM
                                                    microbenchmark resultsDoug S2021/09/27 10:32 AM
                                                      microbenchmark resultsdmcq2021/09/28 07:56 AM
                                              microbenchmark resultsDummond D. Slow2021/09/25 12:49 PM
                                                microbenchmark resultsBrett2021/09/25 03:31 PM
                                              microbenchmark resultsdmcq2021/09/25 12:51 PM
                                                microbenchmark resultsDoug S2021/09/26 09:45 AM
                                            microbenchmark resultsRichard S2021/09/25 01:51 AM
                                              microbenchmark resultsDummond D. Slow2021/09/25 12:52 PM
                                                microbenchmark results---2021/09/25 03:04 PM
                                      SVE alignment with non power-of-2 widths-.-2021/09/24 06:10 PM
                                        SVE alignment with non power-of-2 widthsAndrey2021/09/25 04:46 AM
                                          SVE alignment with non power-of-2 widths-.-2021/09/25 05:35 PM
                                          SVE alignment with non power-of-2 widthsKevin G2021/09/27 09:46 AM
                                            SVE alignment with non power-of-2 widths-.-2021/09/27 09:06 PM
                                              SVE alignment with non power-of-2 widthsJukka Larja2021/09/28 06:37 AM
                                                SVE alignment with non power-of-2 widthsAndrey2021/09/28 12:12 PM
                                                  SVE alignment with non power-of-2 widthsdmcq2021/09/28 02:29 PM
                                                SVE alignment with non power-of-2 widths-.-2021/09/28 06:37 PM
                                                  SVE alignment with non power-of-2 widthsJukka Larja2021/09/29 06:50 AM
                    microbenchmark results---2021/09/20 07:11 AM
                    microbenchmark resultsJörn Engel2021/09/23 05:10 AM
                      microbenchmark resultsMichael S2021/09/23 05:55 AM
                        microbenchmark resultsJörn Engel2021/09/23 09:24 AM
                          microbenchmark resultsRoyi2021/09/26 04:25 PM
                      microbenchmark resultsdmcq2021/09/23 10:42 AM
                        microbenchmark results---2021/09/23 11:53 AM
                      microbenchmark resultsanon22021/09/23 02:40 PM
                microbenchmark results: Zen 3Adrian2021/09/22 01:57 AM
                  microbenchmark results: Zen 3Adrian2021/09/22 02:08 AM
                    microbenchmark results: Zen 3Michael S2021/09/22 05:48 AM
                      microbenchmark results: Zen 3Adrian2021/09/22 06:05 AM
        Armv8.8-A and Armv9.3-AKonrad Schwarz2021/09/28 05:45 AM
    Armv8.8-A and Armv9.3-ALinus Torvalds2021/09/17 08:59 AM
      Armv8.8-A and Armv9.3-ADoug S2021/09/17 11:35 AM
        Armv8.8-A and Armv9.3-Anksingh2021/09/17 12:23 PM
          Armv8.8-A and Armv9.3-ADoug S2021/09/17 02:35 PM
            Armv8.8-A and Armv9.3-AKonrad Schwarz2021/10/15 06:23 AM
              Armv8.8-A and Armv9.3-Arwessel2021/10/15 06:49 AM
        Armv8.8-A and Armv9.3-AAdrian2021/09/17 11:07 PM
          Armv8.8-A and Armv9.3-ADoug S2021/09/18 07:34 AM
            Armv8.8-A and Armv9.3-AAdrian2021/09/18 07:38 AM
      Armv8.8-A and Armv9.3-Ablaine2021/09/18 10:37 AM
      Armv8.8-A and Armv9.3-ABrett2021/09/19 01:06 PM
        Armv8.8-A and Armv9.3-Admcq2021/09/19 01:36 PM
        Armv8.8-A and Armv9.3-ADoug S2021/09/19 06:07 PM
          Armv8.8-A and Armv9.3-A - movesdmcq2021/09/28 09:54 AM
            Armv8.8-A and Armv9.3-A - movesDoug S2021/09/28 01:57 PM
              Armv8.8-A and Armv9.3-A - movesdmcq2021/09/28 02:21 PM
                Armv8.8-A and Armv9.3-A - movesNoSpammer2021/09/29 03:53 AM
                  Armv8.8-A and Armv9.3-A - movesrwessel2021/09/29 06:55 AM
                    Armv8.8-A and Armv9.3-A - movesdmcq2021/09/29 07:53 AM
                      Armv8.8-A and Armv9.3-A - movesrwessel2021/09/29 11:35 AM
                        Armv8.8-A and Armv9.3-A - movesdmcq2021/09/29 01:44 PM
                          Armv8.8-A and Armv9.3-A - movesrwessel2021/09/29 01:58 PM
                            Armv8.8-A and Armv9.3-A - movesdmcq2021/09/29 03:52 PM
                              Armv8.8-A and Armv9.3-A - movesrwessel2021/09/29 06:36 PM
                              Armv8.8-A and Armv9.3-A - movesAndrey2021/09/29 07:58 PM
                    Armv8.8-A and Armv9.3-A - movesDoug S2021/09/29 10:10 AM
                      Armv8.8-A and Armv9.3-A - movesrwessel2021/09/29 11:30 AM
                        Armv8.8-A and Armv9.3-A - movesDoug S2021/09/29 10:02 PM
                          Armv8.8-A and Armv9.3-A - movesrwessel2021/09/29 11:22 PM
                            Armv8.8-A and Armv9.3-A - movesMark Roulo2021/09/30 07:37 AM
                              Armv8.8-A and Armv9.3-A - movesrwessel2021/09/30 08:02 AM
                                Did they publish a full description? (NT)Michael S2021/09/30 08:12 AM
                                  Did they publish a full description?rwessel2021/09/30 09:18 AM
                                    Did they publish a full description?Michael S2021/09/30 10:24 AM
                                      Did they publish a full description?rwessel2021/09/30 10:42 AM
                                    Did they publish a full description?Adrian2021/10/01 12:22 AM
                                  Do we even okiw it's three instructions per move?Carson2021/09/30 10:28 PM
                                    Do we even okiw it's three instructions per move?Adrian2021/10/01 12:27 AM
                                    Do we even okiw it's three instructions per move?rwessel2021/10/01 04:19 AM
                            Armv8.8-A and Armv9.3-A - movesDoug S2021/09/30 09:48 AM
                              Armv8.8-A and Armv9.3-A - movesrwessel2021/09/30 10:39 AM
                                Armv8.8-A and Armv9.3-A - movesDoug S2021/09/30 02:56 PM
                                  Armv8.8-A and Armv9.3-A - movesrwessel2021/09/30 05:20 PM
                                    Armv8.8-A and Armv9.3-A - movesdmcq2021/10/01 04:38 AM
                                      Armv8.8-A and Armv9.3-A - movesMichael S2021/10/01 05:04 AM
                                        Armv8.8-A and Armv9.3-A - movesLinus Torvalds2021/10/01 11:01 AM
                                          memcpy - instruction cracking vs DMArpg2021/10/02 02:51 AM
                                            memcpy - instruction cracking vs DMAAdrian2021/10/02 03:45 AM
                                              memcpy - instruction cracking vs DMADoug S2021/10/02 09:47 AM
                                                memcpy - instruction cracking vs DMAAdrian2021/10/02 10:15 AM
                                                memcpy - instruction cracking vs DMArwessel2021/10/02 11:37 AM
                                                  memcpy - instruction cracking vs DMADoug S2021/10/02 06:49 PM
                                            memcpy - instruction cracking vs DMALinus Torvalds2021/10/02 10:43 AM
                                              memcpy - instruction cracking vs DMAdmcq2021/10/02 11:32 AM
                                              memcpy - instruction cracking vs DMABrett2021/10/02 11:45 AM
                                              memcpy - instruction cracking vs DMA---2021/10/02 03:03 PM
                                                memcpy - instruction cracking vs DMA---2021/10/02 03:12 PM
                                                  Moving copy to DRAM doesn't help for small copiesMark Roulo2021/10/02 03:59 PM
                                                    Moving copy to DRAM doesn't help for small copies---2021/10/02 07:32 PM
                                                      Moving copy to DRAM doesn't help for small copiesMichael S2021/10/03 01:40 AM
                                                        Moving copy to DRAM doesn't help for small copiesDoug S2021/10/03 10:09 AM
                                                          Moving copy to DRAM doesn't help for small copiesrwessel2021/10/03 10:51 AM
                                                          Moving copy to DRAM doesn't help for small copiesLinus Torvalds2021/10/03 11:09 AM
                                                            How about environments such as Java?Mark Roulo2021/10/03 12:41 PM
                                                              How about environments such as Java?rwessel2021/10/03 12:49 PM
                                                                How about environments such as Java?Mark Roulo2021/10/03 01:22 PM
                                                              How about environments such as Java?anon22021/10/03 07:58 PM
                                                                How about environments such as Java?Etienne Lorrain2021/10/04 05:08 AM
                                                                  Apart from "It depends" there is no short answer. (NT)Michael S2021/10/04 05:30 AM
                                                                  How about environments such as Java?Andrey2021/10/04 06:04 AM
                                                                  How about environments such as Java?anon22021/10/04 06:32 AM
                                                                How about environments such as Java?Mark Roulo2021/10/04 07:31 AM
                                                                How about environments such as Java?---2021/10/04 09:41 AM
                                                                  How about environments such as Java?Doug S2021/10/04 10:23 AM
                                                                    How about environments such as Java?Andrey2021/10/04 12:14 PM
                                                                      How about environments such as Java?Doug S2021/10/04 01:20 PM
                                                                  How about environments such as Java?anon22021/10/04 02:23 PM
                                                                  How about environments such as Java?rwessel2021/10/04 04:54 PM
                                                            Moving copy to DRAM doesn't help for small copiesJörn Engel2021/10/04 05:52 AM
                                                            Early software zeroing !=== early hardware zeroingPaul A. Clayton2021/10/05 11:19 AM
                                                              Early software zeroing !=== early hardware zeroingDoug S2021/10/05 12:21 PM
                                                memcpy - instruction cracking vs DMABrendan2021/10/02 04:53 PM
                                                  memcpy - instruction cracking vs DMALinus Torvalds2021/10/03 10:48 AM
                                                    memcpy - instruction cracking vs DMAdmcq2021/10/03 01:54 PM
                                              memcpy - instruction cracking vs DMAYuhong Bao2021/10/03 01:30 AM
                                                memcpy - instruction cracking vs DMADavid Hess2021/10/05 05:19 PM
                                                  memcpy - instruction cracking vs DMAAdrian2021/10/05 11:28 PM
                                                    memcpy - instruction cracking vs DMAEtienne Lorrain2021/10/06 02:24 AM
                                                    memcpy - instruction cracking vs DMArwessel2021/10/06 03:38 AM
                                                      memcpy - instruction cracking vs DMAAdrian2021/10/06 04:04 AM
                                                        memcpy - instruction cracking vs DMArwessel2021/10/06 05:59 AM
                                                    memcpy - instruction cracking vs DMA---2021/10/06 09:07 AM
                                                      memcpy - instruction cracking vs DMAAndrey2021/10/06 02:59 PM
                                                    memcpy - instruction cracking vs DMAgallier22021/10/06 11:06 PM
                                                      memcpy - instruction cracking vs DMAAdrian2021/10/06 11:59 PM
                                              memcpy - instruction cracking vs DMAMichael S2021/10/03 01:51 AM
                                                memcpy - instruction cracking vs DMArwessel2021/10/03 05:06 AM
                                                  memcpy - instruction cracking vs DMAMichael S2021/10/03 05:24 AM
                                                    memcpy - instruction cracking vs DMAMatt Sayler2021/10/03 08:02 AM
                                                    memcpy - instruction cracking vs DMADoug S2021/10/03 10:14 AM
                                      Armv8.8-A and Armv9.3-A - movesrwessel2021/10/01 05:10 AM
                                        Armv8.8-A and Armv9.3-A - movesEtienne Lorrain2021/10/01 07:55 AM
                                          Armv8.8-A and Armv9.3-A - movesrwessel2021/10/01 08:14 AM
                                            Armv8.8-A and Armv9.3-A - movesDoug S2021/10/01 11:17 AM
                                              Armv8.8-A and Armv9.3-A - movesrwessel2021/10/02 04:57 AM
  Armv8.8-A and Armv9.3-Anone2021/10/13 06:06 AM
    Armv8.8-A and Armv9.3-AAdrian2021/10/13 06:22 AM
      Armv8.8-A and Armv9.3-ADoug S2021/10/13 09:01 AM
        Armv8.8-A and Armv9.3-Admcq2021/10/13 10:17 AM
          Armv8.8-A and Armv9.3-Anone2021/10/13 10:26 PM
            Armv8.8-A and Armv9.3-Admcq2021/10/14 08:22 AM
    Armv8.8-A and Armv9.3-Arwessel2021/10/14 09:01 AM
      Armv8.8-A and Armv9.3-AAnon2021/10/14 11:08 AM
        Armv8.8-A and Armv9.3-AMichael S2021/10/14 01:25 PM
      Armv8.8-A and Armv9.3-ADoug S2021/10/14 11:18 AM
        Armv8.8-A and Armv9.3-Arwessel2021/10/14 07:07 PM
          Armv8.8-A and Armv9.3-ADoug S2021/10/14 10:23 PM
            Armv8.8-A and Armv9.3-Admcq2021/10/15 01:41 AM
              Armv8.8-A and Armv9.3-AGabriele Svelto2021/10/15 05:07 AM
            Armv8.8-A and Armv9.3-Arwessel2021/10/15 04:49 AM
              Armv8.8-A and Armv9.3-ADoug S2021/10/15 10:44 AM
                Armv8.8-A and Armv9.3-Ame2021/10/15 06:34 PM
                  Armv8.8-A and Armv9.3-ADoug S2021/10/16 09:47 AM
                    Armv8.8-A and Armv9.3-Ame2021/10/17 05:19 AM
                      Armv8.8-A and Armv9.3-ADoug S2021/10/17 10:17 AM
                        Armv8.8-A and Armv9.3-Ame2021/10/17 12:31 PM
                          Armv8.8-A and Armv9.3-ADoug S2021/10/17 01:33 PM
                            Armv8.8-A and Armv9.3-AzArchJon2021/10/18 10:35 AM
                              Armv8.8-A and Armv9.3-ADoug S2021/10/18 02:35 PM
Reply to this Topic
Name:
Email:
Topic:
Body: No Text
How do you spell tangerine? 🍊